Smart homes are a rapidly emerging research field and provide
fundamentally new means of interaction. So-called Smart Personal
Assistants (SPAs) have entered the household and assist us in our daily
activities. Currently, these agents do not react to the attention of
the smart home user. However, from Human-Human Interaction (HHI)
research we know that humans coordinate their speech and adapt their
behavior continuously, based on their interaction partner’s actions and
reactions. Therefore, the central question I ask in this dissertation is
how human attention can be incorporated into dialogue management,
to improve Human-Agent Interaction (HAI) in smart homes.
Research shows that speakers’ hesitations are often produced as a
reaction to the listener’s inattentiveness in HHI. Furthermore, they
can improve the listeners’ comprehension. Therefore, I investigate
whether it is possible to use system hesitations, based on the attention
of the human interaction partner, as a communicative act for dialogue
coordination in HAI within a smart-home environment. To this end,
I develop a theoretical model based on observations from HHI, implement it in an autonomous agent, and evaluate it in five interaction studies.
This document consists of three parts. In the first part, I develop a
model that allows the dialogue management to incorporate human attention: the Attention-Hesitation Model (AHM). The model uses
system hesitations as a non-intrusive intervention strategy to coordinate
human attention with system speech. This theoretical model is
based on interdisciplinary literature from HHI and HAI research.
In the second part, I elaborate on the technical requirements implied
by the integration of the AHM in an autonomous system. A technical
realization of an incremental dialogue system is presented. Two main
concepts for dialogue modeling are identified: (1) the use of interaction
patterns with system task descriptions for generalizability and (2) the
concept of the Incremental Unit (IU) model to deal with the incremental
nature of human dialogue. By combining the frameworks Pamini and inprotk,
my dialogue system incorporates both concepts. This allows
autonomous HAI and the investigation of the effects of my AHM in
interaction.
In the third part, I evaluate the effects of my AHM on the interaction
and the interaction partner in five Evaluation Cycles (ECs), consisting
of three pilot studies and two HAI studies in a smart-home environment.
In these cycles, I further enhance my model, its implementation, and the
experimental design. I investigate the effect of the AHM on task
performance and its side effects on the interaction: the subjective
ratings of the agent and the visual attention of the interlocutors.
With my investigations, I show that in short interactions without a
change of discourse, the participants interacting with an agent that
uses my AHM are significantly less inattentive than participants in
the baseline (EC1). Furthermore, I show that the AHM can work fully
autonomously (EC2, EC4). Regarding task performance, I demonstrate that participants interacting with an agent that uses my AHM
perform significantly better in some practical tasks than participants
in the baseline (EC3-EC5). This effect is, however, accompanied by
lower subjective ratings of the agent (EC2-EC4). The ratings show
that repetitions can be perceived as annoying (EC2) and that users may
struggle to distinguish unfilled pauses from turn ends in
more complex scenarios (EC2, EC3). However, the use of lengthening
may counteract this problem and enhance some subjective ratings
(EC4). The final model uses mutual gaze and task-related features
to distinguish inattentiveness caused by (1) missing engagement from
inattentiveness caused by (2) difficulties in understanding. To deal
with inattention caused by missing engagement, a cascade of lengthening,
unfilled pauses, and hesitation vowels is used. For difficulties in
understanding, the model uses repetitions with lengthening. This
combination improves the task performance without negative side effects
on the interaction (EC5).
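
The decision logic of the final model can be illustrated with a minimal sketch. It is not the implementation evaluated in this dissertation: the names Observation, classify, and select_action, the boolean features, and the rule that maps features to a cause of inattention are all hypothetical and chosen only for illustration. Only the two causes of inattention and the hesitation strategies assigned to them are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Cause(Enum):
    ENGAGEMENT = "missing engagement"
    UNDERSTANDING = "difficulties in understanding"


# Strategies in the order named in the text. Stepping through them one
# at a time while inattention persists is an assumption of this sketch.
CASCADE = ("lengthening", "unfilled pause", "hesitation vowel")


@dataclass
class Observation:
    mutual_gaze: bool    # hypothetical: user and agent currently share gaze
    task_on_track: bool  # hypothetical stand-in for the task-related features


def classify(obs: Observation) -> Optional[Cause]:
    """Assumed rule: missing mutual gaze signals missing engagement;
    mutual gaze with a stalled task signals difficulties in understanding."""
    if not obs.mutual_gaze:
        return Cause.ENGAGEMENT
    if not obs.task_on_track:
        return Cause.UNDERSTANDING
    return None  # user is attentive


def select_action(obs: Observation, step: int) -> str:
    """Pick the system behavior; step counts consecutive inattentive checks."""
    cause = classify(obs)
    if cause is None:
        return "continue"
    if cause is Cause.ENGAGEMENT:
        # Escalate through the cascade and stay at its last element.
        return CASCADE[min(step, len(CASCADE) - 1)]
    return "repetition with lengthening"
```

In this sketch, a dialogue manager would call select_action at each incremental update and reset step to zero once the user is attentive again; how the actual system schedules these checks is not specified here.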