Information extraction from text for deep domain knowledge graph population. Extracting pre-clinical outcomes in the domain of spinal cord injury

ter Horst, Hendrik Roman

Title record

Title
Information extraction from text for deep domain knowledge graph population. Extracting pre-clinical outcomes in the domain of spinal cord injury
Author
ter Horst, Hendrik Roman
Reviewer
Cimiano, Philipp
Published
2021
Language
English
Document type
Dissertation
URN
urn:nbn:de:0070-pub-29598139
DOI
10.4119/unibi/2959813

Access restriction

The document is freely available

Files

Information extraction from text for deep domain knowledge graph population. Extracting pre-clinical outcomes in the domain of spinal cord injury [PDF, 3.13 MB]

Classification

Classification (DDC) → Computer science, information science, general works → Computer science, knowledge, systems → Computer science, information science, general works

Abstract

Every year, a vast amount of unstructured medical knowledge is described in thousands
of pre-clinical studies published on publicly available websites such as PubMed. The
aggregation of such knowledge plays an important role in various medical applications
such as therapy development in evidence-based medicine, where decisions are made on
the basis of the best available evidence published in the literature so far. However, due
to the natural language format of these studies, the manual aggregation of available
information is tedious and time-consuming and can hardly be performed by researchers.
To address this issue, we are concerned with the automatic extraction of structured
knowledge at a level of detail that supports evidence-based decision making. Specifically,
we focus on automatically populating a deep domain knowledge graph with information
from pre-clinical studies that describe experimental results in the area of spinal cord
injury. An important challenge is that a single study contains multiple outcomes
described by a total of up to 7,816 (dependent) study parameters. Since the problem of
extracting all these parameters jointly is so far intractable, we propose a hierarchical
architecture that incrementally predicts feasible substructures in a bottom-up fashion,
relying on statistical inference and conditional random fields at the heart of our system.
The main contribution of this work is the development of machine learning methods
integrated into a holistic, domain-adapted information extraction system that is capable
of predicting the full details of experimental outcomes as described in pre-clinical
studies written in natural language. We present a general methodology for the extraction
of deeply nested structures rooted in the paradigm of structure prediction and
model-complete text comprehension. We further identify domain-specific challenges and
provide adapted solutions. We show how to efficiently evaluate complex nested structures
predicted by our system and present a comprehensive evaluation to understand
the extent to which it can be used with the depth required to support aggregation of
evidence. We show that the information extraction results are satisfactory for many
classes of our domain ontology and identify those which require further research.

Statistics

The PDF document has been downloaded 4 times.

License / rights notice

Creative Commons Attribution-ShareAlike 4.0 International License
