Goals

The general goals of the project are as follows:

  • To create a platform and a collaborative process for distributed development of multilingual special domain terminologies suitable for both human and machine use.
  • To develop a wide range of methods to leverage information aggregated across multiple dimensions to boost the quality of the analysis of text materials.
  • To develop tools for statistical extraction of ontological structures or conceptual hierarchies based on the automatically generated keyphrase collections.
  • To develop and apply distributed ontology techniques to create a common semantics in a given business case.

More concrete goals of each workpackage are described below.

TermFactory workpackage (TF WP)

The vision of the TermFactory WP is that access to multilingual resources and the collaborative management of such resources, by the community itself, becomes part of the working day of everybody active in global organizations. This vision is made possible by the increasingly unobtrusive integration of intelligent language management software into the everyday working environment.

The immediate mission of the TermFactory WP is to create a platform and a collaborative process for distributed development of multilingual special domain terminologies suitable for both human and machine use.

The main scientific innovation of the TermFactory WP in multilingual terminology management is to bring together the following desiderata:

  • global scale multilinguality
  • automatic reasoning and natural language processing
  • multi-source, multimedia, multi-channel content
  • collaborative distributed terminology management

Information to Intelligence workpackage (I2I WP)

A central goal of the I2I WP is to develop a wide range of methods that leverage information aggregated across multiple dimensions to boost the quality of the analysis of text materials. This includes aggregation:

  • across extended periods of time,
  • across multiple sources, and
  • across multiple languages.

Each of these tasks poses research challenges of its own. Aggregation is essential for building a model of “background knowledge” in the domain of inquiry. Background knowledge, in turn, enables a variety of robust methods for analyzing a new incoming piece of text, methods that are not possible in the traditional approach to information extraction, where each piece of text is treated as a separate, independent entity.

These methods include identifying new vs. old information, where the system must know whether something has already been encountered at an earlier time or in a different source. Another essential function is linking multiple mentions of the same fact across time, sources and languages into a coherent whole. Background knowledge is coded in ontologies, in accordance with principles of best practice developed in the TF WP.

The main scientific contributions are expected to be novel developments in automatic information extraction and aggregation, and exploitation of the synergistic approach.

Multilingual terminology and ontology learning (MuTOL WP)

The goal of the MuTOL workpackage is to develop a methodology for automatic, language-independent terminology extraction and for ontology learning based on the extracted terms. We have previously developed Likey, a keyphrase extraction method that produces keyphrases for documents. In keyphrase extraction, the keyphrases are assumed to occur in the processed documents themselves, and the aim is to extract the most meaningful words and phrases from those documents. By contrast, in keyphrase assignment, all potential keyphrases come from a predefined vocabulary, and the task is to classify documents into different keyphrase classes.

Likey extracts keyphrases using relative ranks of n-gram frequencies. It is a language-independent method: the only language-specific component is a reference corpus in the corresponding language. Preliminary results with the Likey method have been good, and the method was presented at the COLING 2008 conference.

The basic objectives in this WP are:

  • to conduct further evaluation studies with corpora from the medical and ICT domains, and
  • to statistically extract ontological structures or conceptual hierarchies based on the automatically generated keyphrase collections.

Semantic interoperability support (SIS WP)

Collaborative use of electronic business services is important for enterprises because of competition between business networks and value nets. Networked business is no longer limited to large enterprises: it now also affects SMEs that collaborate to compete in fields dominated by larger companies, in the form of loosely coupled service federations of autonomous actors.

Infrastructure for the federated management of such collaborations requires interlinked ontologies for business network models, service types, service offers, and terms used to describe their properties. The ontologies need to be designed by collaborating experts, who need distributed system support.

The lack of shared meaning prevents businesses from adopting electronic B2B collaboration. The objective of the SIS WP is to describe a business case where industrial partners engage in B2B collaboration, and to develop and apply ContentFactory (CF) distributed ontology techniques to create a common semantics in the given business case.

Latest news

25.5.09 - The ContentFactory website is up and running. It consists of a public WordPress site for general information on the project and public news, maintained by the University of Helsinki Palmenia unit in Kouvola, and a project-internal TWiki site for day-to-day project business and internal reporting, maintained by the University of Helsinki Department of Computer Science in Kumpula.
