Since computers have become an essential part of everyday life rather than an accessory, more sophisticated means of human-computer interaction, beyond the traditional keyboard, mouse, and monitor, are sought to enable users to interact with computers in a more social manner. Emotional interaction plays a major role in social life; consequently, affective human-robot interaction has evolved significantly over the last decades. The aim of this thesis is to equip a robot with the ability to understand emotions. Throughout the thesis, a discrete theory of emotions is used as a frame of reference, according to which emotions can be classified into a set of basic emotion classes.
The research is organized around two goals. The first goal is to enable a robot to infer the emotional state of its interaction partner by analyzing the displayed facial expressions under unconstrained conditions. To this end, a robust, fully automatic, non-invasive, real-time vision-based system is developed that can be deployed on the robot.
As the aim is to enable the robot to interact with its partner in real-world scenarios, situations in which the user is engaged in conversation pose a further challenge for such systems. The second goal of this work is therefore to combine facial expression and speech cues in such a way that the affective system of the robot can cope with these situations. En route to this goal, the possible effects of speech-related facial configurations on inferring emotions from facial expressions are investigated. The results suggest a degraded performance when facial expressions are displayed during speech compared with when they are displayed deliberately. To mitigate this effect, information from the audio signal is taken into account. The performance of the emotion recognition system is enhanced by fusing facial expression cues with speech cues into a bimodal system. Nevertheless, the performance of the bimodal system remains lower than that of the stand-alone facial expression analysis system on deliberately displayed facial expressions.
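To make the fusion step concrete, the following is a minimal sketch of decision-level (score-level) fusion of the two modalities. It is an illustrative assumption, not the implementation from the thesis: the emotion classes, the per-class posterior scores, and the simple averaging rule are all hypothetical.

```python
import numpy as np

# Hypothetical set of basic emotion classes (illustrative choice).
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def fuse_scores(face_scores: np.ndarray, speech_scores: np.ndarray) -> str:
    """Score-level fusion: average the per-class scores produced by the
    facial and speech classifiers and pick the highest fused score."""
    fused = (face_scores + speech_scores) / 2.0
    return EMOTIONS[int(np.argmax(fused))]

# Example: facial cues strongly favor "happiness", speech cues are ambiguous.
face = np.array([0.05, 0.05, 0.10, 0.60, 0.10, 0.10])
speech = np.array([0.10, 0.10, 0.15, 0.35, 0.15, 0.15])
print(fuse_scores(face, speech))  # -> "happiness"
```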
Finally, the extent to which each emotion can be recognized by each modality is investigated. The results indicate that the performance of each modality varies considerably with the respective emotion class; consequently, in the bimodal system each modality should be weighted according to its discriminative power for a specific emotion.
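One way such emotion-specific weighting could be realized is sketched below, again as an assumption rather than the method used in the thesis: each modality's score for a class is scaled by a weight reflecting that modality's discriminative power for that class, for example its per-class recognition rate measured on a validation set, with the weights normalized per class.

```python
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical per-emotion reliabilities of each modality, e.g. the
# per-class recognition rates observed on a validation set.
FACE_WEIGHT = np.array([0.8, 0.7, 0.5, 0.9, 0.6, 0.8])
SPEECH_WEIGHT = np.array([0.6, 0.4, 0.7, 0.5, 0.8, 0.5])

def weighted_fusion(face_scores: np.ndarray, speech_scores: np.ndarray) -> str:
    """Weight each modality's per-class score by its discriminative
    power for that class, then pick the highest fused score."""
    total = FACE_WEIGHT + SPEECH_WEIGHT
    w_face = FACE_WEIGHT / total      # per-class weights sum to 1
    w_speech = SPEECH_WEIGHT / total
    fused = w_face * face_scores + w_speech * speech_scores
    return EMOTIONS[int(np.argmax(fused))]
```

Under this scheme a modality that is weak for a given emotion, such as speech for "disgust" in the hypothetical weights above, contributes proportionally less to the fused decision for that class.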