TY - THES
AB - This research asks how humans connect spatial language to physical space. To investigate this question, the present dissertation focuses on the task of verifying sentences containing a projective spatial preposition (e.g., *above, below*) against a depicted spatial relation (e.g., a circle above a rectangle). Linguistically, the two components of a spatial relation are distinguished from each other: “The [located object (LO)] is above the [reference object (RO)].” That is, a spatial preposition specifies the location of an LO with respect to an RO. Typically, semantics do not permit interchanging RO and LO (although syntactically this is not a problem). For instance, compare the sentence “The bike (LO) is in front of the house (RO)” with “The house (LO) is behind the bike (RO)” (cf. Talmy, 2000, p. 183).

For the processing of spatial relations, shifts of visual attention have been identified as an important mechanism (Franconeri, Scimeca, Roth, Helseth, & Kahn, 2012; Logan & Sadler, 1996; see Chapters 1 and 2). While Logan (1995) and Logan and Sadler (1996) claimed that attention should shift from the RO to the LO during the processing of spatial relations, recent empirical evidence suggests that the shift of attention might also take place in the same order as the sentence unfolds – from the LO to the RO (Burigo & Knoeferle, 2015; Roth & Franconeri, 2012).

A computational cognitive model of spatial language verification is the ‘Attentional Vector Sum’ (AVS) model proposed by Regier and Carlson (2001). This model (implicitly) implements a shift of attention from the RO to the LO (see Chapter 1). It accommodates empirical data from a range of different spatial RO-LO configurations (Regier & Carlson, 2001). To what extent does this good model performance originate from the directionality of the implemented shift (from the RO to the LO)? Considering the recent empirical evidence that attention might move in the reversed direction (from the LO to the RO) – would a model implementing such a reversed shift perform better or worse on the empirical data? These are the main questions that motivated the present thesis.
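To make the implemented mechanism concrete, the following is a minimal sketch of the core AVS computation. The specific functional forms (exponential attention decay, a linear drop-off of the rating with angular deviation) and the placement of the attentional focus are simplifications assumed for illustration; the published model (Regier & Carlson, 2001) additionally includes a height function and fitted gain and intercept parameters.

```python
import numpy as np

def avs_sketch(ro_points, lo, lam=1.0):
    """Simplified sketch of the Attentional Vector Sum (AVS) mechanism.

    ro_points: (N, 2) array of points making up the reference object (RO)
    lo:        (2,) location of the located object (LO)
    lam:       assumed width parameter of the attentional beam
    """
    ro_points = np.asarray(ro_points, dtype=float)
    lo = np.asarray(lo, dtype=float)

    # Attentional focus: here simply the RO point closest to the LO
    # (a simplification; the published model projects the LO onto the
    # RO side facing it).
    focus = ro_points[np.argmin(np.linalg.norm(ro_points - lo, axis=1))]

    # Attention decays exponentially with distance from the focus.
    attention = np.exp(-np.linalg.norm(ro_points - focus, axis=1) / lam)

    # Vector sum: attention-weighted vectors from each RO point to the LO.
    direction = (attention[:, None] * (lo - ro_points)).sum(axis=0)

    # Angular deviation (degrees) of the summed vector from upright vertical.
    deviation = np.degrees(np.arctan2(direction[0], direction[1]))

    # Map deviation to a 0..1 acceptability rating (linear drop-off assumed).
    return max(0.0, 1.0 - abs(deviation) / 90.0)

# Example: a 5x3 grid of points as the RO, LO placed above its right edge.
ys, xs = np.mgrid[0:3, 0:5]
ro = np.column_stack([xs.ravel(), ys.ravel()])
print(avs_sketch(ro, lo=(4.5, 6.0)))  # fairly high rating: LO is roughly above
```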

To answer these questions, I developed several variations of the AVS model (taking into account the two important geometric properties ‘proximal orientation’ and ‘center-of-mass orientation’; Regier, 1996; Regier & Carlson, 2001). In all these variations, the shift of attention goes from the LO to the RO (instead of from the RO to the LO). This is why they are called ‘reversed AVS’ (rAVS) models. In Chapter 3, I assess the rAVS variations using empirical data (acceptability ratings for spatial prepositions) from Hayward and Tarr (1995), Logan and Sadler (1996), and Regier and Carlson (2001). More specifically, I fitted the models to the empirical data (separately for each experiment and for the whole data set from Regier & Carlson, 2001). That is, I minimized the ‘normalized Root Mean Square Error’ (nRMSE) and thus obtained a ‘goodness-of-fit’ (GOF) measure. Moreover, I evaluated the ability of the models to generalize to unseen data (cf. Pitt & Myung, 2002) by applying the ‘simple hold-out’ method (SHO; Schultheis, Singhaniya, & Chaplot, 2013), a cross-fitting method that accounts for potential over-fitting of empirical data. Considering these model benchmarks, one rAVS variation – the rAVSw-comb model – performs as well as the AVS model on the tested empirical data. The rAVSw-comb model implements a mechanism in which ‘relative distance’ (roughly: the absolute distance from LO to RO divided by the dimensions of the RO) weights the influence of the two important geometric features, proximal orientation and center-of-mass orientation. Based on these results, neither implementation of the directionality of attention accommodates the empirical findings better than the other.
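The two benchmarks can be illustrated with a short sketch. It assumes a hypothetical interface `model(stimuli, theta)` mapping stimuli and a parameter vector to predicted ratings; nRMSE is taken here as the RMSE normalized by the range of the observed ratings, and the SHO is reduced to repeatedly fitting on one random half of the data and scoring on the held-out half. The exact normalization, split scheme, and optimizer used in the thesis may differ.

```python
import numpy as np
from scipy.optimize import minimize

def nrmse(predicted, observed):
    """RMSE normalized by the range of the observed data (assumed scheme)."""
    observed = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((np.asarray(predicted) - observed) ** 2))
    return rmse / (observed.max() - observed.min())

def simple_hold_out(model, stimuli, ratings, theta0, n_splits=100, seed=0):
    """Sketch of the simple hold-out (SHO) idea: fit on a random half of
    the data, evaluate on the other half, and average over many splits."""
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(ratings))
        fit_idx, test_idx = idx[: len(idx) // 2], idx[len(idx) // 2:]
        # Fit parameters by minimizing nRMSE on the fitting half.
        res = minimize(
            lambda th: nrmse(model(stimuli[fit_idx], th), ratings[fit_idx]),
            theta0,
            method="Nelder-Mead",
        )
        # Score generalization on the held-out half.
        scores.append(nrmse(model(stimuli[test_idx], res.x), ratings[test_idx]))
    return float(np.mean(scores))
```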

This is why I analyzed the AVS and rAVSw-comb models in terms of their predictions (Chapter 4). The idea was to identify stimuli for which the two contrasting shift implementations (i.e., the two models) predict different outcomes. Data collected with these stimuli could then potentially tell the two models apart (e.g., if humans follow the predictions of one model but not of the other). I created two types of test cases for which the two models seemed to generate somewhat different outcomes: a relative distance test case and an asymmetrical ROs test case.

In the relative distance test case, the critical manipulation is the height of the rectangular ROs; the absolute placements of the LOs remain the same across these stimuli. This test case is the first to investigate a potential influence of relative distance on human spatial language acceptability ratings. The prediction for the relative distance test case was that acceptability ratings should differ across different RO heights (despite identical absolute LO placements). This prediction was clear for the rAVSw-comb model. However, due to the averaging vector sum mechanism in the AVS model, the prediction of the AVS model remained unclear.
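As a rough illustration of the manipulated quantity, relative distance can be computed as below. Normalizing the absolute LO-RO distance by the sum of the RO’s width and height is an assumption made for this sketch; the thesis defines the exact scaling.

```python
import numpy as np

def relative_distance(lo, ro_center, ro_width, ro_height):
    """Illustrative relative distance: absolute LO-RO distance scaled by
    the RO's size (normalization by width + height is assumed here)."""
    abs_dist = np.linalg.norm(np.asarray(lo) - np.asarray(ro_center))
    return abs_dist / (ro_width + ro_height)

# The same absolute LO placement yields a smaller relative distance
# for a taller RO:
print(relative_distance((0, 3), (0, 0), ro_width=4, ro_height=1))  # 0.6
print(relative_distance((0, 3), (0, 0), ro_width=4, ro_height=2))  # 0.5
```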

The second test case (asymmetrical ROs) challenges the role of the vector sum in the AVS model. For this test case, I designed asymmetrical ROs. LOs are placed either above the cavity of these ROs or above their mass (the RO side that faces the LO is flat). For these ROs, the center-of-mass does not coincide with the center-of-object (the center of the bounding box of the RO). Based on intuitive reasoning, the AVS model predicts different acceptability ratings for LOs placed (i) at equal distances to the center-of-mass but (ii) either above the cavity or above the mass of the RO: the AVS model seems to predict higher ratings for LOs placed above the mass than for LOs above the cavity. The rAVSw-comb model predicts no difference for this test case.
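The distinction between the two reference points can be made concrete for a hypothetical L-shaped RO composed of two axis-aligned rectangles (the shape and dimensions below are illustrative, not actual stimuli from the study):

```python
import numpy as np

# Hypothetical L-shaped RO made of two axis-aligned rectangles,
# each given as (x, y, width, height).
parts = [(0, 0, 6, 1),   # horizontal base
         (0, 1, 1, 3)]   # vertical arm on the left

# Center-of-mass: area-weighted average of the parts' centers.
areas = np.array([w * h for (_, _, w, h) in parts])
centers = np.array([(x + w / 2, y + h / 2) for (x, y, w, h) in parts])
center_of_mass = (areas[:, None] * centers).sum(axis=0) / areas.sum()

# Center-of-object: center of the shape's bounding box.
xs = [x for (x, _, _, _) in parts] + [x + w for (x, _, w, _) in parts]
ys = [y for (_, y, _, _) in parts] + [y + h for (_, y, _, h) in parts]
center_of_object = np.array([(min(xs) + max(xs)) / 2,
                             (min(ys) + max(ys)) / 2])

print(center_of_mass)    # [2.167 1.167] -- pulled toward the mass
print(center_of_object)  # [3. 2.]       -- the bounding-box center
```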

I systematically simulated the models on the created stimuli using the ‘Parameter Space Partitioning’ method (PSP; Pitt, Kim, Navarro, & Myung, 2006). This method enumerates all qualitatively different data patterns a model is able to generate – based on evaluating the whole parameter space of the model. Surprisingly, the PSP analysis revealed that both models share some of their predictions (although they do not generate identical outcomes for all stimuli and parameter settings). Empirical data collected with these stimuli might still help to distinguish between the two models in terms of performance (e.g., based on different quantitative model fits).
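The following sketch approximates the idea behind PSP. The published method (Pitt et al., 2006) searches the parameter space with MCMC; plain random sampling is used here instead, and a ‘qualitative data pattern’ is operationalized as the signs of all pairwise rating differences. Both choices are simplifying assumptions.

```python
import numpy as np

def psp_sketch(model, stimuli, param_bounds, n_samples=10_000, seed=0):
    """Approximate Parameter Space Partitioning via random sampling.

    param_bounds: list of (low, high) tuples, one per model parameter.
    Returns the set of qualitatively distinct rating patterns found.
    """
    rng = np.random.default_rng(seed)
    low = np.array([b[0] for b in param_bounds])
    high = np.array([b[1] for b in param_bounds])
    patterns = set()
    for _ in range(n_samples):
        theta = low + rng.random(len(param_bounds)) * (high - low)
        ratings = np.asarray(model(stimuli, theta))
        # Qualitative pattern: sign of every pairwise rating difference.
        pattern = tuple(
            int(np.sign(ratings[i] - ratings[j]))
            for i in range(len(ratings))
            for j in range(i + 1, len(ratings))
        )
        patterns.add(pattern)
    return patterns  # one entry per qualitatively distinct pattern
```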

This is why I conducted an empirical study that tested the model predictions for both developed test cases (relative distance and asymmetrical ROs). The empirical study was designed to be as close as possible to the experimental setup reported in Regier and Carlson (2001). That is, 34 participants read the German sentence “Der Punkt ist über dem Objekt” (“The dot is above the object”) and afterwards rated its acceptability given a depicted spatial relation (e.g., an image of a dot and a rectangle) on a scale from 1 to 9. In addition to *über (above)*, I also tested the German preposition *unter (below)*. In total, the study tested 448 RO-LO configurations. Moreover, I tracked participants’ eye movements during inspection of the depicted spatial relation. These data serve as a measure of overt attention during spatial relation processing.

The empirical study generalized effects on spatial language verification from English to German (the ‘grazing line’ effect and lower ratings for *unter, below,* compared to *über, above*). Furthermore, the empirical study revealed an effect of relative distance on spatial language acceptability ratings, although a different one than the rAVSw-comb model predicted. The empirical data from the rectangular ROs suggest that lower relative distance weakens (i) the effect of proximal orientation and (ii), for high values of proximal orientation, a reversed effect of center-of-mass orientation. Neither the rAVSw-comb model nor the AVS model can fully accommodate this finding. Future research should investigate the effect of relative distance more closely.

For the asymmetrical ROs, analyses of the empirical data suggest that people rely on the center-of-object instead of on the center-of-mass for their acceptability ratings. This challenges earlier findings about the importance of the center-of-mass orientation. However, given that in earlier studies, the center-of-mass and the center-of-object most often coincided, the data presented in this dissertation provide additional information on how humans process geometry in the context of spatial language verification.

In terms of eye movements, the empirical data provide evidence for the horizontal component of the attentional focus as defined in the AVS model. This focus is also an important point in the rAVSw-comb model. The empirical results do not contradict the vertical component of the hypothesized attentional focus. However, due to the design of the study, it remains unclear whether the vertical fixation locations were caused by the preposition used or by the vertical location of the LO. In addition, people inspected the two types of asymmetrical ROs slightly differently. For the more open asymmetrical shapes (L-shaped), fixations were influenced by the asymmetrical distribution of mass. In contrast, for the less open but still asymmetrical shapes (C-shaped), fixation patterns could not be distinguished from fixation patterns for rectangular ROs. Note that for all asymmetrical ROs, the center-of-object orientation predicted the rating data better than the center-of-mass orientation – despite the distinct fixation patterns.

To further analyze the claim that people might use the center-of-object instead of the center-of-mass for their ratings, I developed modifications of the two cognitive models. While the AVS and rAVSw-comb models rely on the center-of-mass, the two new models ‘AVS bounding box’ (AVS-BB) and ‘rAVS center-of-object’ (rAVS-CoO) consider the center-of-object instead (the rest of the models remains unchanged). To thoroughly analyze all four cognitive models, I applied several model comparison techniques (Chapter 5). Based on the stimuli and data from the empirical study, the goal of the model simulations was to distinguish between models that implement a shift from the RO to the LO (AVS, AVS-BB) and models that implement a shift from the LO to the RO (rAVSw-comb, rAVS-CoO). Apart from fitting the models to the data (per GOF and SHO), I analyzed them using the ‘Model Flexibility Analysis’ (MFA; Veksler, Myers, & Gluck, 2015) and the ‘landscaping’ method (Navarro, Pitt, & Myung, 2004). These two methods provide information on how flexible the models are: a highly flexible model can generate a large number of distinct data patterns, whereas a model with low flexibility generates only a few. When comparing model performance, one should take model flexibility into account (Roberts & Pashler, 2000), because a highly flexible model might fit even empirically implausible data well. This renders a close fit to empirical data a necessary but not sufficient criterion for a “good” model. In addition to providing a different perspective on model flexibility, landscaping measures to what extent two models mimic each other (in which case it is more difficult to distinguish between them).
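The core idea of the MFA can be sketched as estimating how large a share of randomly generated data patterns a model can fit to criterion: a highly flexible model fits almost any pattern. The uniform sampling of fake data, the RMSE-based fit measure, and the threshold below are illustrative assumptions, not the exact procedure of Veksler et al. (2015).

```python
import numpy as np
from scipy.optimize import minimize

def mfa_sketch(model, stimuli, theta0, n_patterns=1000, threshold=0.1, seed=0):
    """Sketch of Model Flexibility Analysis: the fraction of random data
    patterns (on a 0..1 rating scale) the model can fit to criterion."""
    rng = np.random.default_rng(seed)
    fit_count = 0
    for _ in range(n_patterns):
        fake_data = rng.random(len(stimuli))  # random candidate data pattern
        res = minimize(
            lambda th: np.sqrt(np.mean((model(stimuli, th) - fake_data) ** 2)),
            theta0,
            method="Nelder-Mead",
        )
        if res.fun <= threshold:
            fit_count += 1
    # Values near 1 mean the model can mimic nearly any pattern.
    return fit_count / n_patterns
```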

Considering all model simulations, the two newly proposed models rAVS-CoO and AVS-BB (accounting for the center-of-object instead of the center-of-mass) perform substantially better than their predecessors rAVSw-comb and AVS. In contrast to the center-of-mass models, the two center-of-object models fit the empirical data better (GOF, SHO) while being less flexible (MFA, landscaping), and they generate rating patterns closer to the empirical patterns (PSP). This supports the hypothesis that people rely on the center-of-object orientation instead of the center-of-mass orientation. In terms of the main research question, however, the model simulations do not favor either of the two implemented directionalities of attention over the other. That is, based on the existing empirical data and the cognitive models, both directionalities of attention are equally likely. The thesis closes with a model extension that allows cognitive modelers to analyze the models in a more fine-grained way in the future. More specifically, the extended models generate full rating distributions instead of mean ratings. This makes it possible to use all information available in the empirical data for future model assessments.
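One simple way to realize such an extension is sketched below: the model’s mean rating is turned into a full probability distribution over the 1-9 rating scale via a discretized Gaussian. This particular mechanism is an assumption for illustration; the extension developed in the thesis may use a different one.

```python
import numpy as np

def rating_distribution(mean_rating, sd=1.0):
    """Turn a model's mean rating into a distribution over ratings 1..9
    using a discretized Gaussian (an illustrative choice)."""
    scale = np.arange(1, 10)
    weights = np.exp(-((scale - mean_rating) ** 2) / (2 * sd ** 2))
    return weights / weights.sum()  # probability of each rating 1..9

print(rating_distribution(7.3))  # mass concentrated around ratings 7 and 8
```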

Finally, Chapter 6 summarizes the results of this Ph.D. project. Following the seminal three-level framework proposed by Marr (1982), I discuss the findings and relate them to other relevant research. I sketch several promising possibilities to enhance the models in order to create a more comprehensive model of spatial language processing. Such a model would allow cognitive scientists to further investigate how humans ground their spatial language in the visual world.
DA - 2019
DO - 10.4119/unibi/2935686
LA - eng
PY - 2019
TI - Modeling the Contribution of Visual Attention to Spatial Language Verification
UR - https://nbn-resolving.org/urn:nbn:de:0070-pub-29356864
Y2 - 2024-11-24T03:10:46
ER -