Knowledge of molecular interactions accumulates gradually and is assembled into biological networks that give a global picture of a biological system. Such networks are usually constructed from data in biological databases or the literature. When the focus is a specialized or little-investigated biological network, however, data scarcity in both the databases and the literature becomes a problem.
The redox regulatory network sustains redox homeostasis in the cell, and its capacity affects the functionality of its target proteins. A critical step in extending the redox regulatory network is the identification of the target proteins of thioredoxin (Trx) and glutaredoxin (Grx). However, the redox regulatory network has been explored more thoroughly in plants than in animals. For a specialized topic such as the construction of the redox regulatory network in the human mitochondrion, which this thesis addresses, conventional methods of network construction, such as querying biological databases or mining the literature, yield little information.
To overcome the data deficiency of this specialized topic, a bottom-up strategy is adopted that first identifies oxidation-susceptible cysteines, an important feature of the chemical reaction mechanism between Trx/Grx and their target proteins. In the first part of the thesis, a pre-selection tool for Trx/Grx target proteins, termed ROCD, is implemented following a computational decision tree derived from a study of physicochemical properties. ROCD pre-selects a group of proteins that contains the potential candidates and requires further validation. One way to validate the computational predictions is to search for relevant literature. Here too, because of the same information deficiency, directly relevant literature is missing most of the time. The second part of the thesis therefore introduces a network-contexted document retrieval system, termed ncDocReSy, which assists the retrieval of indirectly relevant literature based on the topology of the biological network. ROCD is applied to the pre-selection of Trx/Grx target proteins in the mitochondrion of the human liver, using physicochemical values suggested by previous work, and yields 309 potential candidates. After this pre-selection step, ncDocReSy can be used during manual curation of the pre-selection result by providing indirectly relevant literature.
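The idea of retrieving indirectly relevant literature from network topology can be illustrated with a minimal Python sketch. It is not the ncDocReSy implementation: the function names, the toy network, the placeholder protein identifiers, and the query format are all assumptions made for illustration. The sketch assumes that, when no document mentions a candidate protein together with Trx/Grx directly, queries can be built from the candidate's network neighbours instead, nearest neighbours first.

```python
from collections import deque


def neighbors_within(network, start, max_hops):
    """Return {protein: hop distance} for every node reachable from
    `start` in at most `max_hops` steps (the start node is excluded)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in network.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances


def expanded_queries(network, candidate, redoxin, max_hops=1):
    """Build hypothetical literature queries that pair the redoxin with
    the network neighbours of a candidate, closest neighbours first."""
    hops = neighbors_within(network, candidate, max_hops)
    ordered = sorted(hops, key=lambda protein: (hops[protein], protein))
    return [f'"{redoxin}" AND "{protein}"' for protein in ordered]


# Toy undirected network as an adjacency list; all identifiers are placeholders.
toy_network = {
    "CAND1": ["NBR_A", "NBR_B"],
    "NBR_A": ["CAND1", "NBR_C"],
    "NBR_B": ["CAND1"],
    "NBR_C": ["NBR_A"],
}

if __name__ == "__main__":
    for query in expanded_queries(toy_network, "CAND1", "thioredoxin", max_hops=2):
        print(query)
```

Ordering the queries by hop distance reflects the assumption that literature about closer neighbours is more likely to be informative for the candidate; a real system would need its own relevance ranking.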
In this thesis, several bioinformatics facilities that assist resource integration were used, such as ID mapping services and standard data exchange formats. These facilities help different resources communicate with and understand one another, and they are essential for the integrated use of bioinformatics resources.