This thesis addresses the analysis of the position, orientation and motion of rigid and articulated objects from stereo image sequences. In a first step, a spatio-temporal scene reconstruction is performed, based on stereo image analysis and optical flow computation on the binocular or trinocular image sequences. Subsequently, a segmentation stage separates the objects from each other and from the background.
Several newly developed pose estimation techniques are employed, e.g. problem-specific versions of the iterative closest point (ICP) algorithm, which determine the object pose (position and orientation) from the 3D point cloud using model information. Alternatively, a novel model-based stereo analysis or a novel fusion of contour-based and point-based pose estimation is applied.
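For orientation, the basic idea behind the iterative closest point (ICP) algorithm can be sketched as follows. This is the textbook point-to-point variant, not one of the problem-specific versions developed in the thesis. It assumes NumPy, a brute-force nearest-neighbour search, and a reasonable initial alignment; the function names `best_rigid_transform` and `icp` are hypothetical.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t such that R @ src[i] + t ~ dst[i].

    Uses the SVD-based (Kabsch) solution for paired 3D points.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(model, scene, iterations=20):
    """Align the model points (N x 3) to the scene points (M x 3).

    Returns the accumulated rotation and translation that map the
    original model points onto the scene.
    """
    R_total = np.eye(3)
    t_total = np.zeros(3)
    moved = model.copy()
    for _ in range(iterations):
        # Assumption: brute-force nearest neighbour; fine for small clouds only.
        d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
        nearest = scene[d2.argmin(axis=1)]
        R, t = best_rigid_transform(moved, nearest)
        moved = moved @ R.T + t
        # Compose the incremental transform with the accumulated one.
        R_total = R @ R_total
        t_total = R @ t_total + t
    return R_total, t_total
```

The returned pair describes the pose of the model relative to the scene point cloud. Plain ICP of this kind converges only to a local optimum, so it depends on the initial pose.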
Several new methods for motion analysis determine the temporal derivative of the pose, i.e. the object motion, instantaneously. The complete three-dimensional motion is determined from optical flow data, with a contour-based approach, or with a model-based scene flow technique.
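As a rough illustration of how a complete rigid-body motion can be obtained from per-point motion data, the sketch below fits a translational velocity v and an angular velocity ω to 3D point velocities by linear least squares, using the rigid-motion model v_i = v + ω × (p_i − c). It is not the method of the thesis: it assumes that 3D point velocities are already available (for example from stereo combined with optical flow), it assumes NumPy, and the names `skew` and `rigid_motion_from_point_velocities` are hypothetical.

```python
import numpy as np


def skew(r):
    """Cross-product matrix of r, so that skew(r) @ w == np.cross(r, w)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])


def rigid_motion_from_point_velocities(points, velocities):
    """Fit v and omega in  v_i = v + omega x (p_i - c)  by least squares.

    points, velocities: arrays of shape (N, 3); c is the centroid of the points.
    Returns (v, omega, c).
    """
    centre = points.mean(axis=0)
    # omega x r == -skew(r) @ omega, so each point contributes a 3 x 6 block.
    A = np.vstack([np.hstack([np.eye(3), -skew(p - centre)]) for p in points])
    b = velocities.reshape(-1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:], centre
```

At least three non-collinear points are needed for the angular velocity to be determined uniquely.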
Several system setups are presented which robustly estimate the pose and the motion of rigid or articulated objects with high accuracy. To demonstrate the usability of the approach in different settings, the system is evaluated in an industrial production scenario and in a traffic scenario at road intersections. In the production scenario the human hand-forearm limb is regarded as an articulated object, whereas in the traffic scenario vehicles are regarded as rigid objects.
The evaluation is based on real-world sequences with independently determined ground-truth data and shows high accuracy for the newly developed determination of position and orientation as well as for the novel motion analysis. Furthermore, it is shown that the developed fusion approach is useful for small objects in the production scenario, because combining contour-based and point-based algorithms increases the robustness of the system significantly. In the traffic scenario, using 3D points alone already yields high accuracy, and the model-based stereo technique increases the accuracy further. For the complete motion analysis, the model-based scene flow method reaches high accuracy in the traffic scenario, whereas in the production scenario a contour-based approach or an extended method for the analysis of optical flow fields is preferable.
One main focus of the system is the integration of decision feedback into the processing hierarchy, which increases the robustness of the system. Decision feedback makes it possible to avoid stereo matching errors and to classify articulated objects, which in turn allows the object detection to be verified. Another advantage of the system is the instantaneous determination of the object motion, which uses only two or three time steps. Temporal filtering of the object pose, which would delay the reaction of the system, is thus avoided.