The question addressed in this thesis is how to make use of gestures in multimodal human-machine interaction. To interact with current computer systems, humans still have to adapt themselves to the system's interfaces. The underlying vision of this work is to enable computer systems to interpret the typical human forms of interaction.
Based on this question and vision, the goal of this work was formulated: to build robust recognition of human gestures for a multimodal system. In this way, the interactive competences of the system can be broadened and improved.
Pursuing this goal, this work examines the theoretical basis of human-machine interaction as well as the practical implementation and evaluation of automatic gesture recognition. The guiding principle for the implementation is a flexible and modular architecture. This work presents novel approaches to taking the symbolic and situational context of gestures into account. Ongoing work on deriving the human intent behind gestures is outlined.
This work concentrates on the interactional behaviour and gestures of humans; consequently, mostly deictic and manipulative gestures are considered. Context is the central aspect of these gestures, as it allows reasoning about their meaning and intention.
As mentioned above, the gesture recognition system is intended to be part of a multimodal system. This integration is demonstrated by the successful use of the developed system within a mobile, social, and multimodal robot and within a wearable assistant system.
Human-human communication and the field of gestures have been the focus of research for many years. Hence, this work gives an overview of communication theory and psychological research relevant to human-machine interaction. A young and challenging aspect is research on the question: "How do children learn the manipulation of objects from their parents?" The theory developed by Brand et al. [2002] is the basis for the automatic analysis of parental behaviour presented in this work.
This work is a contribution to the vision of natural human-machine interaction. The emphasis lies on the video-based recognition of human actions and gestures.