Recent developments in motion capture technologies such as Microsoft Kinect and Vicon systems have facilitated motion data acquisition in diverse areas such as entertainment, sports, medical applications, and security systems. This type of data typically consists of body-part movements recorded over time, which describe actions that are semantically meaningful to domain experts. Coinciding with the notable growth in the size of available motion datasets, it has become necessary to design machine learning methods that systematically analyze motions with respect to their underlying characteristics.
Although many approaches have been suggested for motion analysis, ranging from component analysis methods to deep learning algorithms, the majority of state-of-the-art designs do not provide semantically interpretable models. For motion data, such models contain building blocks connected to commonalities and particularities that are semantically understandable by domain experts. In
this dissertation, I propose efficient algorithms that address the interpretable analysis of motion data from several significant perspectives. These algorithms contribute to the state of the art by introducing interpretable models in four specific categories: metric learning, sparse embedding, feature selection, and deep learning for motion data analysis.
I propose a novel metric learning algorithm for motion data that benefits from a flexible time-series alignment. The algorithm transfers motion data to another space in which semantically similar motions lie in tighter neighborhoods, while semantically different motions are pushed further apart. A post-processing regularization of the learned metric reduces the correlations that typically exist between the dimensions of the motion data. As a result, the proposed model is interpretable: it provides a small subset of dimensions (joints) that are closely relevant to the given discriminative task.
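The abstract does not spell out the metric or alignment formulation; purely as a minimal illustrative sketch (not the dissertation's actual algorithm), a dynamic-time-warping alignment combined with a learned diagonal metric, whose non-negative per-joint weights `w` would provide the interpretable joint subset, could look like this:

```python
import numpy as np

def dtw_metric_distance(X, Y, w):
    """DTW distance between motion sequences X (T1 x d) and Y (T2 x d),
    where the frame-to-frame cost is a weighted (diagonal-metric) squared
    Euclidean distance; w holds one non-negative weight per joint dimension.
    Hypothetical sketch: names and formulation are illustrative only."""
    T1, T2 = len(X), len(Y)
    # Pairwise frame costs under the diagonal metric diag(w)
    cost = np.array([[np.sum(w * (x - y) ** 2) for y in Y] for x in X])
    # Standard DTW dynamic program over the cost matrix
    D = np.full((T1 + 1, T2 + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]
            )
    return D[T1, T2]
```

A near-zero weight in `w` effectively removes a joint from the distance, which is what makes a learned diagonal metric readable as a joint-selection result.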
Furthermore, I present novel embedding frameworks that transfer the raw motion representation to a vector space. The resulting embeddings are non-negative vectorial representations that are sparse and semantically interpretable. They specifically carry understandable information about the encoded motions, such as the particularities of motion classes or the commonalities shared by different motions. Additionally, I extend my proposed metric learning and embedding algorithms to different feature selection frameworks. In each framework, a sparse subset of motion dimensions is selected that is semantically connected to the given overarching objective.
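The specific embedding algorithm is not detailed in this abstract; as a stand-in illustration of how non-negative, sparse, parts-based vectorial representations arise (not the proposed method), classical non-negative matrix factorization with multiplicative updates can be sketched as:

```python
import numpy as np

def nmf_embed(V, k, iters=200, eps=1e-9):
    """Factor non-negative data V (n x d) into H (n x k) @ W (k x d) using
    Lee-Seung multiplicative updates. Rows of H are non-negative, typically
    sparse embeddings of the n motions; rows of W act as shared prototypes
    that can be read as commonalities across motions. Illustrative sketch."""
    rng = np.random.default_rng(0)
    H = rng.random((V.shape[0], k))
    W = rng.random((k, V.shape[1]))
    for _ in range(iters):
        # Multiplicative updates preserve non-negativity of H and W
        H *= (V @ W.T) / (H @ W @ W.T + eps)
        W *= (H.T @ V) / (H.T @ H @ W + eps)
    return H, W
```

Interpretability here comes from the non-negativity constraint: each motion is expressed as an additive combination of a few prototypes, so individual embedding coordinates can be inspected and named.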
The last framework designed in my Ph.D. project focuses on using convolutional neural networks (CNNs) to perform sequence-based labeling of motion data. More specifically, my deep learning algorithm introduces a novel CNN-based architecture that incorporates the concept of time-series alignment into its filters. This framework learns local patterns in the temporal dimensions of the data. These temporal patterns are interpreted as significant parts of the motion sequences, leading to better discrimination between them.
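The alignment-based filters themselves belong to the proposed architecture and are not reproduced here; as a simplified sketch, a plain temporal filter sliding over a motion sequence shows how a learned local pattern yields position-wise activations that peak where the pattern occurs (the proposed filters would replace the inner product with an alignment-based matching cost):

```python
import numpy as np

def temporal_pattern_activation(X, filt):
    """Slide a short temporal filter (L x d) over a motion sequence (T x d)
    and return one similarity activation per position, i.e. a 1-D
    cross-correlation over the temporal axis. Illustrative sketch only:
    the activation peaks where the filter's local pattern appears."""
    T, L = len(X), len(filt)
    acts = np.array([np.sum(X[t:t + L] * filt) for t in range(T - L + 1)])
    return acts
```

Because each filter responds to one local temporal pattern, the position and identity of strong activations can be inspected, which is what makes such temporal patterns readable as significant parts of a motion sequence.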
I evaluate the above frameworks on several real-world motion benchmarks and analyze their performance from the above-discussed perspectives by comparing them to relevant state-of-the-art baselines.