Robots interacting with humans need to understand actions and to make use of language in social interactions. Research on infant development has shown that language helps the learner structure visual observations of action. This acoustic information, typically in the form of narration, overlaps with action sequences and provides infants with a bottom-up guide to finding structure within them. This concept was introduced as acoustic packaging by Hirsh-Pasek and Golinkoff. We developed and integrated a prominence detection module into our acoustic packaging system to detect semantically relevant information that is linguistically highlighted by the tutor. Evaluation results on speech data from adult-infant interactions show significant agreement with human raters. Furthermore, we present a first approach, based on acoustic packages, that uses the prominence detection results to generate acoustic feedback.
Index Terms: prominence, multimodal action segmentation, human-robot interaction, feedback
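To make the temporal-binding idea behind acoustic packaging concrete, the following Python sketch bundles action segments with the narration segments they overlap in time. The segment representation, the overlap criterion, and the example boundaries are illustrative assumptions, not the system or data described in this paper.

    # Minimal sketch of acoustic packaging as temporal overlap:
    # action segments that share a time span with a speech (narration)
    # segment are grouped into one "acoustic package".
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Segment:
        label: str
        start: float  # seconds
        end: float    # seconds

    def overlaps(a: Segment, b: Segment) -> bool:
        """True if the two segments share any part of their time span."""
        return a.start < b.end and b.start < a.end

    def acoustic_packages(speech: List[Segment],
                          actions: List[Segment]) -> List[Dict]:
        """Bundle each narration segment with the action segments it overlaps."""
        packages = []
        for s in speech:
            bundled = [a for a in actions if overlaps(s, a)]
            if bundled:
                packages.append({"speech": s, "actions": bundled})
        return packages

    if __name__ == "__main__":
        # Hypothetical tutoring episode: narration and manipulation actions.
        speech = [Segment("look, the cup!", 0.0, 1.8),
                  Segment("now we stack it", 2.5, 4.0)]
        actions = [Segment("reach", 0.2, 1.0),
                   Segment("grasp", 1.0, 1.7),
                   Segment("stack", 2.8, 3.9)]
        for p in acoustic_packages(speech, actions):
            print(p["speech"].label, "->", [a.label for a in p["actions"]])

In this toy run, the first narration segment packages the reach and grasp actions and the second packages the stack action, mirroring how narration provides a bottom-up cue for segmenting the action stream.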