The task of answering natural language questions over RDF data has
received wide interest in recent years, in particular in the context of the series
of QALD benchmarks. The task consists of mapping a natural language question
to an executable form, e.g. SPARQL, so that answers from a given KB can
be extracted. So far, most proposed systems i) are monolingual and ii) rely on
a set of hard-coded rules to interpret questions and map them into a SPARQL
query. We present the first multilingual QALD pipeline that induces a model
from training data and frames the mapping of a natural language question into
logical form as probabilistic inference. In particular, our approach learns to map
universal syntactic
dependency representations to a language-independent logical form based
on DUDES (Dependency-based Underspecified Discourse Representation Structures),
which is then mapped to a SPARQL query in a deterministic second step.
Our model builds on factor graphs that rely on features extracted from the dependency
graph and corresponding semantic representations. We rely on approximate
inference techniques, Markov Chain Monte Carlo methods in particular, as well
as SampleRank to update parameters using a ranking objective. Our focus lies on
developing methods that overcome the lexical gap, and we present a novel combination
of machine translation and word embedding approaches for this purpose. As
a proof of concept, we evaluate our approach on the QALD-6
datasets for English, German, and Spanish.
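
To make the learning setup concrete, the following is a minimal sketch of a SampleRank-style update driven by Metropolis-Hastings proposals. It is not the implementation used in this work: the toy task (assigning one label per token), the label names, the indicator features, and all function names are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter

def features(tokens, labels):
    """Indicator features pairing each token with its assigned label (toy choice)."""
    return Counter(zip(tokens, labels))

def score(weights, feats):
    """Linear model score of a state under the current weights."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def objective(labels, gold):
    """Training objective: number of positions that agree with the gold labels."""
    return sum(a == b for a, b in zip(labels, gold))

def samplerank_train(tokens, gold, label_set, steps=200, lr=1.0, seed=0):
    rng = random.Random(seed)
    weights = {}
    state = [rng.choice(label_set) for _ in tokens]
    for _ in range(steps):
        # Symmetric proposal: relabel one randomly chosen position.
        i = rng.randrange(len(tokens))
        proposal = list(state)
        proposal[i] = rng.choice(label_set)
        f_cur, f_new = features(tokens, state), features(tokens, proposal)
        o_cur, o_new = objective(state, gold), objective(proposal, gold)
        if o_cur != o_new:
            better, worse = (f_new, f_cur) if o_new > o_cur else (f_cur, f_new)
            # Ranking update: only when the model does not already
            # score the better state strictly higher than the worse one.
            if score(weights, better) - score(weights, worse) <= 0:
                for f, v in better.items():
                    weights[f] = weights.get(f, 0.0) + lr * v
                for f, v in worse.items():
                    weights[f] = weights.get(f, 0.0) - lr * v
        # Metropolis-Hastings acceptance under the model distribution.
        diff = score(weights, f_new) - score(weights, f_cur)
        if rng.random() < math.exp(min(0.0, diff)):
            state = proposal
    return weights

# Hypothetical toy example: label each question token with a semantic role.
tokens = ["who", "wrote", "Dune"]
gold = ["WH", "PROPERTY", "RESOURCE"]
weights = samplerank_train(tokens, gold, ["WH", "PROPERTY", "RESOURCE"])
```

In this sketch, parameters are updated on pairs of neighbouring states visited by the sampler rather than only after full inference has finished, which is the property that makes a ranking objective convenient to combine with MCMC.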