TY  - THES
AB  - The extraction of disease-gene associations from biomedical publications is a widely
investigated field of research. In previous work, a common approach has been to implement
natural language processing tools that use semantic information to find such associations.
However, most of these approaches are restricted to single documents. Retrieval systems that
predict novel associations across multiple documents often lack the ability to deal with the
large number of resulting candidates. In this work, we present a system that aggregates
information from a large corpus of scientific abstracts. This information is used to build a
comprehensive gene-interaction network, which is then used to predict novel disease-gene
associations. We tackle the problem of candidate reduction by integrating two separate
machine learning methods. We train a support vector machine to classify genes as
disease-related or not, and a support vector regression model to rank gene candidates
according to their importance to a specific disease. To this end, we build on established
methods and extend them with a novel investigation of the gene-interaction network. A model
evaluation on two gold standards, together with a case study conducted in cooperation with
biomedical experts, shows that the proposed methods are able to extract disease-gene
associations from single documents and to discover disease-related candidates across
multiple documents.
DA  - 2015
KW  - machine learning
KW  - text mining
KW  - biomedical literature
KW  - graph-based features
KW  - disease-gene associations
LA  - eng
PY  - 2015
TI  - Ranking of disease gene associations from large corpora of scientific publications
UR  - https://nbn-resolving.org/urn:nbn:de:hbz:361-27767490
Y2  - 2024-11-23T12:46:14
ER  -