Feature selection is a widely used strategy in machine learning to reduce feature sets to their relevant core, improving both predictions and performance. It is also employed for knowledge discovery in applied disciplines such as biology and medicine to find potentially causal factors. But machine learning models often do not represent a unique solution to a given problem, especially in high-dimensional settings where redundant factors are likely and spurious correlations exist.

Basing decisions about causal factors on feature selection is therefore inaccurate, or outright wrong, when the presence of redundant but equally relevant features is not taken into account. Most existing selection algorithms specifically remove redundancies and are thus not suitable for all-relevant feature selection, or they require careful parametrization and are hard to interpret, which makes them difficult to use.

This thesis focuses on feature selection methods for the analytical use case of facilitating the understanding of potential causal factors, in both linear and non-linear problems. We propose several new algorithms and methods for all-relevant feature selection to improve knowledge discovery; they are enabled by statistical methods that increase the accuracy of existing solutions and allow differentiation between types of relevance. Furthermore, we offer a new heuristic to automatically group related features, and we analyse the definition of relevance in the context of privileged information, where some data is available only during training.

We also introduce software implementations, which were specifically designed to be modular, efficient, and parallelizable for applications to high-dimensional problems. The methods and implementations were evaluated on a wide range of synthetic and real-world datasets to demonstrate their performance in comparison with existing algorithms.