Many people expect an age of personal robots, much like the one that unfolded in the evolution of personal computers. Against this background, research on human-robot interaction receives considerable attention in the robotics community. This dissertation focuses on the vision-based recognition of human manipulative gestures: the manipulation of objects draws the attention of the communication partner to the objects that are relevant for the performed task, and recognizing these gestures enables a more pro-active robot behavior in passive, primarily observational situations.
Compared to communicative gestures, which can be recognized purely from trajectory information, the understanding of manipulative gestures relies much more on object context. In contrast to previous approaches, the one proposed here is object-oriented with respect to three different aspects: it is object-centered in that trajectory features are defined relative to an object, it uses object-specific models for action primitives, and it employs an object-attention mechanism that is based on task models.
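To make the first aspect concrete, the following is a minimal sketch of how trajectory features could be expressed relative to a detected object; the function name, the purely 2-D representation, and the choice of relative position plus velocity as features are illustrative assumptions, not the feature set defined in the thesis.

```python
import numpy as np

def object_centered_features(hand_traj, obj_center):
    """Hypothetical object-centered trajectory features.

    hand_traj  : (T, 2) array of image-plane hand positions
    obj_center : (2,)   array, centroid of the detected object
    Returns a (T - 1, 4) matrix of relative positions and velocities.
    """
    rel_pos = hand_traj - obj_center       # hand position relative to the object
    rel_vel = np.diff(rel_pos, axis=0)     # frame-to-frame relative motion
    return np.hstack([rel_pos[1:], rel_vel])
```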
While most related work in gesture recognition assumes a fixed, static camera view, such constraints do not hold for mobile robot companions. After an analysis of the observational scenario, a 2-D approach was chosen. The manipulative primitive recognition scheme is able to generalize a primitive model, learned from data observed from a single camera view, to varying viewpoints and different settings. The view dependence of the 2-D motion models is compensated on three levels. First, the trajectories are pre-segmented based on an object vicinity that depends on the camera tilt and the object detections. Second, an interactive feature vector represents the relative movement between the human hand and the objects. Third, a particle-filter-based matching method adaptively estimates a scaling parameter that fits the HMM-based models to different view angles.
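The sketch below illustrates the third level only in spirit: it assumes a single scalar scale factor, a generic `hmm_loglik` callable that scores a feature sequence under a primitive HMM, and a simple resample-and-diffuse particle update; none of these details are prescribed by the thesis.

```python
import numpy as np

def estimate_view_scale(features, hmm_loglik, n_particles=100, n_iters=10,
                        scale_range=(0.5, 2.0), noise=0.05, seed=None):
    """Particle-filter style search for a scale factor that fits observed
    2-D motion features to a primitive HMM learned from a single view.

    features   : (T, D) feature sequence from the current camera view
    hmm_loglik : callable(features) -> log-likelihood under the primitive HMM
    """
    rng = np.random.default_rng(seed)
    scales = rng.uniform(*scale_range, size=n_particles)   # candidate scale factors
    estimate = float(scales.mean())
    for _ in range(n_iters):
        # Weight each candidate scale by how well the rescaled features
        # match the single-view primitive model.
        logw = np.array([hmm_loglik(features * s) for s in scales])
        w = np.exp(logw - logw.max())
        w /= w.sum()
        estimate = float(np.average(scales, weights=w))
        # Resample proportionally to the weights and diffuse the particles.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        scales = np.clip(scales[idx] + rng.normal(0.0, noise, n_particles),
                         *scale_range)
    return estimate
```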
To cope with the different layers of intention in manipulative gestures, a unified graphical model with a two-layered recognition structure is proposed. Object-specific manipulative primitives on the lower layer are coupled with task-specific Markovian models on the upper layer. The combined bottom-up and top-down processing loop in this structure realizes a dynamic attention mechanism: the task-level prediction of possible primitives restricts both the object types to be detected and the action primitives to be recognized. The thesis also proposes an online task-learning strategy based on pre-learned object-specific manipulative primitives. The task model can be initialized with few labeled examples and updated incrementally as new unlabeled data becomes available. Experiments in an office environment demonstrate the applicability of the proposed approaches for vision-based manipulative gesture recognition.
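As a rough illustration of the bottom-up/top-down loop described above, the sketch below couples a task-level Markov prediction with a restricted primitive-recognition step; the matrices, callables, and thresholding rule are assumptions made for the example, not the graphical model formulated in the thesis.

```python
import numpy as np

def recognition_cycle(task_belief, task_transition, primitive_given_task,
                      detect_objects, recognize_primitives, threshold=0.01):
    """One hypothetical top-down/bottom-up cycle of a two-layer recognizer.

    task_belief          : (K,)   current belief over task states
    task_transition      : (K, K) Markov transition matrix of the task model
    primitive_given_task : (K, P) probability of each primitive given a task state
    detect_objects       : callable(candidate_primitives) -> detected objects
    recognize_primitives : callable(objects, candidates) -> likelihoods, one per candidate
    """
    # Top-down: predict the next task states and the primitives they make likely.
    predicted = task_belief @ task_transition                 # (K,)
    primitive_prior = predicted @ primitive_given_task        # (P,)
    candidates = np.flatnonzero(primitive_prior > threshold)  # attention: restrict the search

    # Bottom-up: detect only the objects tied to the candidate primitives and
    # evaluate the primitive models of the restricted set on the observation.
    objects = detect_objects(candidates)
    likelihoods = recognize_primitives(objects, candidates)   # (len(candidates),)

    # Fold the primitive evidence back into the task-level belief.
    evidence = primitive_given_task[:, candidates] @ likelihoods
    posterior = predicted * evidence
    return posterior / posterior.sum(), candidates, likelihoods
```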