Generation of multilingual ontology lexica with M-ATOLL: a corpus-based approach for the induction of ontology lexica

Walter, Sebastian

There is increasing interest in providing ordinary web users with access to structured knowledge bases such as DBpedia, for example by means of question answering systems.
All such question answering systems have in common that they have to map a natural language input, be it spoken or written, to a formal representation in order to extract the correct answer from the target knowledge base.
Systems that generate natural language text from a given knowledge base face the same mapping problem in the reverse direction.
The main challenge is therefore how to map natural language to structured data and vice versa.
To this end, question answering systems require knowledge about how the vocabulary elements used in the available datasets are verbalized in natural language, covering different verbalization variants.
Multilinguality further increases the complexity of this challenge.
In this thesis we introduce M-ATOLL, a framework that finds such verbalization variants by automatically inducing ontology lexica in multiple languages.
We have instantiated the system for three languages (English, German and Spanish) by exploiting a set of language-specific dependency patterns for finding lexicalizations in text corpora. Additionally, we extended our framework to extract complex adjective lexicalizations with a machine-learning-based approach.
M-ATOLL is the first open-source and multilingual approach for the generation of ontology lexica. In this thesis we present grammatical patterns for three different languages, on which the extraction of lexicalizations relies.
We provide an analysis of these patterns as well as a comparison with those proposed by other state-of-the-art systems. Additionally, we present a detailed evaluation comparing the different approaches under different settings on a publicly available gold standard, and discuss their potential and limitations.
