Today, industrial production processes in car manufacturing worldwide are characterised either by fully automated production sequences carried out solely by industrial robots or by fully manual assembly steps where only humans work together on the same task. Up to now, close collaboration between humans and machines, especially industrial robots, has been very limited and is usually not possible due to safety concerns. Industrial production processes could become considerably more efficient through a close collaboration of humans and machines that exploits their unique capabilities.
This thesis describes computer vision and pattern recognition methods that allow a safe interaction between human workers and industrial robots in a production environment. Vision methods are required for marker-less 3D pose estimation and tracking of the motion of human body parts and robot parts. Based on a motion analysis, the vision-based safety system is able to slow down or stop the robot early enough to avoid potentially hazardous situations. To obtain reliable depth information about the investigated scene, we use a small-baseline trinocular camera system similar to the SafetyEYE protection system (www.safetyeye.com).
Using the examples of tracking the human hand-forearm limb and the head-shoulder area, it is shown that the developed 3D pose estimation and tracking algorithms achieve a robust, temporally stable, and metrically accurate system performance. The developed 3D pose estimation algorithms are either model-based top-down techniques, which directly use the three synchronously acquired images, or bottom-up approaches, which rely on motion-attributed 3D~point clouds computed from the observed scene.
The Multiocular Contracting Curve Density (MOCCD) algorithm is a top-down pose estimation technique based on pixel statistics around a contour model projected into the images of several cameras. The MOCCD algorithm is applied to track the 3D pose and some deformation parameters of the hand-forearm limb and the head-shoulder area within a traditional Kalman filter framework.
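The prediction-update cycle underlying such a Kalman filter framework can be sketched as follows (a minimal linear sketch in Python with a constant-velocity model; all state and noise matrices are illustrative and not those of the actual system):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : state estimate and its covariance
    z    : current measurement
    F, H : state transition and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Predict the state forward in time.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement z.
    y = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In the actual system, the filtered state comprises the 3D pose and deformation parameters estimated by the MOCCD algorithm rather than the scalar position-velocity state used here.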
In a second system, the newly developed Shape Flow algorithm, a temporal extension of the MOCCD approach, replaces the Kalman filter framework. The Shape Flow algorithm uses a spatio-temporal model of the tracked object and directly estimates the instantaneous motion properties, such that no temporal filtering is required. Obtaining instantaneous motion information is of essential importance in the context of safe human-robot interaction, since the latency of a temporal filtering stage would result in delays unacceptable in our application scenario. The developed system thus computes a top-down, model-based 3D scene flow directly from the three synchronously acquired images of the small-baseline trinocular camera.
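The Shape Flow algorithm itself is a contribution of this thesis; purely to illustrate the principle of obtaining instantaneous motion from a spatio-temporal model rather than from a temporal filter, one may consider a joint least-squares fit of position and velocity over a short temporal window (a hypothetical minimal sketch, not the actual algorithm):

```python
import numpy as np

def instantaneous_motion(observations, times):
    """Jointly fit the position at the most recent time step and a
    constant velocity to a short temporal window of observations.
    The motion estimate is instantaneous: no filter state is carried
    over between windows, hence no filter latency is introduced."""
    t = np.asarray(times, float)
    obs = np.asarray(observations, float)
    # Design matrix for the model obs(t) = p0 + v * (t - t_last).
    A = np.stack([np.ones_like(t), t - t[-1]], axis=1)
    sol, *_ = np.linalg.lstsq(A, obs, rcond=None)
    p0, v = sol[0], sol[1]
    return p0, v
```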
Another tracking system combines the MOCCD technique with a bottom-up approach that uses a motion-attributed 3D point cloud to estimate the object pose with the Iterative Closest Point (ICP) algorithm. Owing to the orthogonal properties of the two approaches, it is shown that a fusion of their pose estimates is favourable. Tracking is realised by a motion analysis of the motion-attributed 3D points belonging to the tracked object, using an extended constraint-line approach. The obtained motion properties are further refined with the Shape Flow algorithm; again, no temporal filtering is applied.
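The ICP step can be sketched as follows (a minimal point-to-point variant with brute-force nearest-neighbour matching and a Kabsch alignment; the actual system operates on motion-attributed point clouds and a hand-forearm model):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Kabsch algorithm: least-squares rotation R and translation t
    mapping the point set src onto dst (correspondences given by index)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cd - R @ cs
    return R, t

def icp(src, dst, iters=20):
    """Iteratively match each source point to its nearest destination
    point and re-estimate the rigid transform until alignment."""
    cur = src.copy()
    for _ in range(iters):
        dists = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matches = dst[dists.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matches)
        cur = cur @ R.T + t
    return cur
```

The brute-force matching is quadratic in the number of points; a k-d tree would typically be used for larger clouds.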
The idea behind another newly developed tracking approach is to extract the motion of all moving objects in the observed scene with a 3D mean-shift tracking algorithm and a simple ellipsoid model. A graph-based clustering stage extracts all moving objects from the motion-attributed 3D~point cloud of the observed scene. The proposed 3D mean-shift tracking approach relies on hue or grey value histograms and on motion-attributed 3D point cloud data. At each time step, the newly developed recognition stage determines the relevant object (e.g. the hand) which performs the working actions.
To understand the behaviour of the human worker, the temporally consistent 3D information provided by the tracking stage is used. The newly developed recognition system is able to single out the object performing a known action from a multitude of tracked objects. A sequence of working actions is recognised with a particle-filter-based non-stationary Hidden Markov Model framework, relying on the spatial context and on a classification of the observed 3D~trajectories using the Levenshtein Distance on Trajectories as a measure of the similarity between the observed trajectories and a set of reference trajectories. Based on an engine assembly scenario, it is shown that the system achieves action-specific recognition rates of more than 90%, and that it is able to detect disturbances, i.e. interruptions of the sequence of working actions, upon which it enters a safety mode and returns to the regular mode as soon as the working actions continue.
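The Levenshtein Distance on Trajectories generalises the classical string edit distance by counting two trajectory samples as matching when they are spatially close; a minimal sketch (the matching radius is an illustrative parameter, not a value used in the thesis):

```python
import numpy as np

def trajectory_levenshtein(a, b, match_radius=0.05):
    """Edit distance between two 3D trajectories: a substitution is
    free when the two samples lie within match_radius of each other,
    otherwise it costs 1; insertions and deletions also cost 1."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1)           # delete all remaining samples
    D[0, :] = np.arange(m + 1)           # insert all remaining samples
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if np.linalg.norm(a[i - 1] - b[j - 1]) < match_radius else 1
            D[i, j] = min(D[i - 1, j] + 1,      # deletion
                          D[i, j - 1] + 1,      # insertion
                          D[i - 1, j - 1] + sub)
    return D[n, m]
```

A low distance to a reference trajectory then indicates that the observed motion belongs to the corresponding working action.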
All experimental investigations in this thesis are performed on real-world image sequences showing several test persons performing different working actions typical of an industrial production scenario. In all example scenes the background is cluttered, and the test persons wear various kinds of clothing. For evaluation, independently obtained ground truth data is used.