TY  - THES
AB  - The extraction of disease-gene associations from biomedical publications is a widely investigated field of research. Previous work has frequently relied on natural language processing tools that use semantic information to find such associations. However, most of these approaches are restricted to single documents. Retrieval systems that predict novel associations across multiple documents often cannot cope with the large number of resulting candidates. In this work, we present a system that aggregates information from a large corpus of scientific abstracts. This information is used to build a comprehensive gene-interaction network, which is then used to predict novel disease-gene associations. We tackle the problem of candidate reduction by integrating two separate machine learning methods: a support vector machine that classifies genes as disease-related or not, and a support vector regression model that ranks gene candidates according to their importance to a specific disease. To this end, we use established methods and extend them with a novel analysis of the gene-interaction network. A model evaluation on two gold standards and a case study conducted in cooperation with biomedical experts show that the proposed methods are able to extract disease-gene associations from single documents and to discover disease-related candidates across multiple documents.
DA  - 2015
KW  - machine learning
KW  - text mining
KW  - biomedical literature
KW  - graph-based features
KW  - disease-gene associations
LA  - eng
PY  - 2015
TI  - Ranking of disease gene associations from large corpora of scientific publications
UR  - https://nbn-resolving.org/urn:nbn:de:hbz:361-27767490
Y2  - 2024-11-23T12:46:14
ER  - 