TY - THES
AB - The focus in gesture research has long been on the production of speech-accompanying gestures and on how speech-gesture utterances contribute to communication. A largely neglected issue is the extent to which listeners even perceive the gestural part of a multimodal utterance. Gesture research has, for instance, concentrated on the lexico-semiotic connection between spontaneously coproduced gestures and speech (e.g., de Ruiter, 2007; Kita & Özyürek, 2003; Krauss, Chen & Gottesman, 2000). Owing to the rather precise timing between the prosodic peak in speech and the most prominent stroke of the gesture phrase in production, Schegloff (1984) and Krauss, Morrel-Samuels and Colasante (1991; also Rauscher, Krauss & Chen, 1996), among others, coined the term lexical affiliation for this phenomenon. Following Krauss et al. (1991), the first empirical study of this dissertation investigates the nature of the semiotic relation between speech and gestures, focusing on its applicability to temporal perception and comprehension. When speech and lip movements diverge too far from the original production synchrony, this can be highly irritating to the viewer, even when audio and video stem from the same original recording (e.g., Vatakis, Navarra, Soto-Faraco & Spence, 2008; Feyereisen, 2007): there is only a small temporal window of audiovisual integration (AVI) within which viewer-listeners can internally align discrepancies between lip movements and the speech supposedly produced by them (e.g., McGurk & MacDonald, 1976). Several studies in psychophysics (e.g., Nishida, 2006; Fujisaki & Nishida, 2005) found that a similar time window exists for the perceptual alignment of nonspeech visual and auditory signals. These and further studies on the AVI of speech-lip asynchronies have inspired research on the perception of speech-gesture utterances. McNeill, Cassell and McCullough (1994; Cassell, McNeill & McCullough, 1999), for instance, showed that listeners pick up information even from artificially combined speech and gestures. More recent studies on the AVI of speech and gestures have investigated the perception of multimodal utterances with methods such as eye tracking and event-related potentials (ERPs) (e.g., Gullberg & Holmqvist, 1999; 2006; Özyürek, Willems, Kita & Hagoort, 2007; Habets, Kita, Shao, Özyürek & Hagoort, 2011). While this work in psychophysics and in speech-only and speech-gesture research has contributed greatly to theories of how listeners perceive multimodal signals, natural data and dyadic situations have remained largely unexplored. This dissertation investigates the perception of naturally produced speech-gesture utterances by having participants rate the naturalness of synchronous and asynchronous versions of such utterances, using qualitative and quantitative methodologies including an online rating study and a preference task. Drawing on speech-gesture production models based on Levelt's (1989) model of speech production (e.g., de Ruiter, 1998; 2007; Krauss et al., 2000; Kita & Özyürek, 2003), and building on the results and analyses of the studies conducted for this dissertation, I propose a draft model of a possible transmission cycle between the Growth Point (e.g., McNeill, 1985; 1992) and the Shrink Point, its perceptual counterpart. The model covers the temporal and semantic alignment of speech and different gesture types as well as their audiovisual and conceptual integration during perception. The perceptual studies conducted within the scope of this dissertation reveal varying temporal ranges within which listeners can integrate an asynchrony in speech-gesture utterances, especially for iconic gestures.
DA - 2017
LA - eng
PY - 2017
TI - The shrink point: audiovisual integration of speech-gesture synchrony
UR - https://nbn-resolving.org/urn:nbn:de:0070-pub-29087622
Y2 - 2024-11-23T09:40:22
ER - 