TY - THES
AB - This research asks how humans connect spatial language to physical
space. To investigate this question, the present dissertation focuses on
the task of verifying sentences containing a projective spatial preposition (e.g., *above, below*) against a depicted spatial relation (e.g., a circle
above a rectangle). Linguistically, the two components of a spatial relation are distinguished from each other: “The [located object (LO)] is
above the [reference object (RO)].” That is, a spatial preposition specifies the location of an LO with respect to an RO. Typically, semantics
do not allow RO and LO to be interchanged (although syntactically this is
not a problem). For instance, compare the sentence “The bike (LO) is
in front of the house (RO)” with “The house (LO) is behind the bike
(RO)” (cf. Talmy, 2000, p. 183).
For the processing of spatial relations, shifts of visual attention have
been identified as an important mechanism (Franconeri, Scimeca, Roth,
Helseth, & Kahn, 2012; Logan & Sadler, 1996; see Chapters 1 and 2).
While Logan (1995) and Logan and Sadler (1996) claimed that attention
should shift from the RO to the LO during the processing of spatial
relations, recent empirical evidence suggests that the shift of attention
might also take place in the same order as the sentence unfolds – from
the LO to the RO (Burigo & Knoeferle, 2015; Roth & Franconeri, 2012).
A computational cognitive model of spatial language verification
is the ‘Attentional Vector Sum’ (AVS) model proposed by Regier and
Carlson (2001). This model (implicitly) implements a shift of attention
from the RO to the LO (see Chapter 1). It accommodates empirical
data from a range of different spatial RO-LO configurations (Regier
& Carlson, 2001). To what extent does this good model performance
stem from the directionality of the implemented shift (from the RO
to the LO)? Given recent empirical evidence that attention
might move in the reverse direction (from the LO to the RO), would
a model implementing such a reversed shift perform better or worse
on the empirical data? These are the main questions that motivated the
present thesis.
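The attention-weighted vector sum behind the AVS model can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not Regier and Carlson's (2001) implementation: the attentional focus is assumed to be the RO point closest to the LO, attention is assumed to decay exponentially with distance from that focus (scaled by a hypothetical parameter `lam` and the LO-to-focus distance), and the later mapping of the resulting angle to an acceptability rating is omitted. Vectors point from each RO point to the LO, mirroring the RO-to-LO direction of the AVS model; the function name `avs_style_angle` is hypothetical.

```python
import math

def avs_style_angle(ro_points, lo, lam=1.0):
    """Deviation (degrees) of an attention-weighted vector sum from upright.

    Simplified sketch: the focus is the RO point closest to the LO, and
    attention decays exponentially with distance from that focus.
    """
    focus = min(ro_points, key=lambda p: math.dist(p, lo))
    # LO-to-focus distance scales the attentional beam (guarded against zero)
    sigma = max(math.dist(lo, focus), 1e-9)
    sx = sy = 0.0
    for p in ro_points:
        weight = math.exp(-math.dist(p, focus) / (lam * sigma))
        # Vector from this RO point to the LO, weighted by attention
        sx += weight * (lo[0] - p[0])
        sy += weight * (lo[1] - p[1])
    # 0 degrees: the summed vector points straight up (y axis points up)
    return math.degrees(math.atan2(abs(sx), sy))

# Rectangular RO sampled on a grid (x = 0..4, y = 0..2)
rect = [(x, y) for x in range(5) for y in range(3)]
print(avs_style_angle(rect, (2, 5)))  # LO centered above: about 0 degrees
print(avs_style_angle(rect, (6, 5)))  # LO offset to the right: larger deviation
```

Because the vectors are averaged over the whole RO, an LO centered above a rectangle yields a deviation of about 0 degrees, while a horizontally offset LO yields a larger deviation.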
To answer these questions, I developed several variations of the AVS
model (taking into account the two important geometric properties
‘proximal orientation’ and ‘center-of-mass orientation’; Regier, 1996;
Regier & Carlson, 2001). In all these variations, the shift of attention
goes from the LO to the RO (instead of from the RO to the LO). This
is why they are called ‘reversed AVS’ (rAVS) models. In Chapter 3, I
assess the rAVS variations using empirical data (acceptability ratings for
spatial prepositions) from Hayward and Tarr (1995), Logan and Sadler
(1996), and Regier and Carlson (2001). More specifically, I fitted the
models to the empirical data (separately for each experiment and for
the whole data set from Regier & Carlson, 2001). That is, I minimized
the ‘normalized Root Mean Square Error’ (nRMSE) and thus obtained
a ‘goodness-of-fit’ (GOF) measure. Moreover, I evaluated the ability
of the models to generalize to unseen data (cf. Pitt & Myung, 2002) by
applying the ‘simple hold-out’ method (SHO; Schultheis, Singhaniya,
& Chaplot, 2013). The SHO is a cross-fitting method that accounts
for potential over-fitting of empirical data. Considering these model
benchmarks, one rAVS variation – the rAVSw-comb model – performs as
well as the AVS model on the tested empirical data. The rAVSw-comb
model implements a mechanism in which ‘relative distance’ (roughly:
absolute distance from LO to RO divided by the dimensions of the
RO) weights the influence of the two important geometric features
proximal orientation and center-of-mass orientation. Based on these
results, neither directionality of attention accommodates
the empirical findings better than the other.
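The fitting and cross-fitting procedure can be sketched in Python as follows. This is a minimal illustration under assumptions: nRMSE is taken as the RMSE divided by the range of the 1-to-9 rating scale, the one-parameter `toy_model` and its grid-search `fit` are hypothetical stand-ins for the AVS and rAVS models and for whatever optimizer the thesis used, and the simple hold-out is approximated by repeatedly fitting on a random half of the data and scoring the other half.

```python
import math
import random
import statistics

def nrmse(pred, obs, scale_range=8.0):
    """RMSE divided by a normalizing range (assumed: the 1-9 rating scale, i.e. 8)."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / scale_range

def toy_model(angle, slope):
    # Hypothetical one-parameter model: rating falls linearly with angular deviation
    return max(1.0, 9.0 - slope * angle)

def fit(data, grid):
    """Grid search for the slope minimizing nRMSE on (angle, rating) pairs."""
    return min(grid, key=lambda s: nrmse([toy_model(a, s) for a, _ in data],
                                         [r for _, r in data]))

def simple_hold_out(data, grid, n_splits=100, train_frac=0.5, seed=0):
    """Fit on a random subset, score nRMSE on the held-out rest; return the median."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        slope = fit(train, grid)
        scores.append(nrmse([toy_model(a, slope) for a, _ in test],
                            [r for _, r in test]))
    return statistics.median(scores)

grid = [i / 100 for i in range(21)]  # candidate slopes 0.00 ... 0.20
data = [(a, 9.0 - 0.08 * a) for a in range(0, 100, 10)]  # synthetic, noise-free ratings
print(fit(data, grid))              # 0.08
print(simple_hold_out(data, grid))  # 0.0
```

With noise-free synthetic data the held-out error is zero; with real ratings, a model that over-fits its training subset would show a larger hold-out nRMSE than its GOF nRMSE.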
This is why I analyzed the AVS and rAVSw-comb models in terms
of their predictions (Chapter 4). The idea was to identify stimuli for
which the two contrasting shift-implementations (i.e., the two models)
predict different outcomes. Data collected with these stimuli could then
potentially tell apart the two models (e.g., if humans follow predictions
from one model but not from the other). I created two types of test
cases for which the two models seemed to generate somewhat different
outcomes: a relative distance test case and an asymmetrical ROs test
case.
In the relative distance test case, the critical manipulation is the height
of the rectangular ROs. The absolute placements of the LOs remain
equal in these stimuli. This test case is the first to investigate a potential
influence of relative distance on human spatial language acceptability
ratings. For this test case, the prediction was that acceptability ratings
should differ across RO heights (despite equal absolute LO placements). This prediction was clear for the
rAVSw-comb model. However, due to the averaging vector sum mechanism in the AVS model, the AVS model's prediction remained
unclear.
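The relative distance manipulation can be illustrated with a short Python sketch. The abstract characterizes relative distance only roughly; here, as an assumption, it is the Euclidean distance from the LO to the closest point of a rectangular RO divided by the RO's width plus height. The sketch keeps the LO position and the RO's top edge fixed while varying RO height, so the absolute distance stays the same but the relative distance changes. Function names are hypothetical.

```python
import math

def distance_to_rect(lo, left, bottom, width, height):
    """Euclidean distance from point LO to the closest point of an axis-aligned rectangle."""
    dx = max(left - lo[0], 0.0, lo[0] - (left + width))
    dy = max(bottom - lo[1], 0.0, lo[1] - (bottom + height))
    return math.hypot(dx, dy)

def relative_distance(lo, left, bottom, width, height):
    # Assumed normalization: absolute distance divided by (width + height) of the RO
    return distance_to_rect(lo, left, bottom, width, height) / (width + height)

lo = (2.0, 6.0)  # same absolute LO placement in both stimuli
# Two ROs sharing the same top edge (y = 4) but differing in height
flat = relative_distance(lo, left=0.0, bottom=3.0, width=4.0, height=1.0)
tall = relative_distance(lo, left=0.0, bottom=0.0, width=4.0, height=4.0)
print(flat, tall)  # 0.4 0.25: equal absolute distance (2.0), different relative distance
```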
The second test case (asymmetrical ROs) challenges the role of the
vector sum in the AVS model. For this test case, I designed asymmetrical
ROs. LOs are placed either above the cavity of these ROs or above the
mass. (The RO-side that faces the LO is flat.) For these ROs, the center-of-mass does not coincide with the center-of-object (the center of the
bounding box of the RO). Based on intuitive reasoning, the AVS model
predicts different acceptability ratings for LOs placed (i) with equal
distance to the center-of-mass but (ii) either above the cavity or the
mass of the RO: the AVS model seems to predict higher ratings for
LOs placed above the mass compared to LOs above the cavity. The
rAVSw-comb model predicts no difference for this test case.
I systematically simulated the models on the created stimuli using
the ‘Parameter Space Partitioning’ method (PSP; Pitt, Kim, Navarro,
& Myung, 2006). This method enumerates all qualitatively different
data patterns a model is able to generate – based on evaluating the
whole parameter space of the model. Surprisingly, the PSP analysis
revealed that both models share some of their predictions (but the
models do not generate equal outcomes for all stimuli and parameter
settings). Empirical data collected with these stimuli might still help
to distinguish between the two models in terms of performance (e.g.,
based on different quantitative model fits).
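The idea behind PSP can be illustrated with a toy Python sketch. Actual PSP (Pitt et al., 2006) searches the parameter space with a Markov chain Monte Carlo procedure; this sketch swaps in dense grid sampling, and the two-parameter `toy_model` (ratings for an LO above the mass versus above the cavity of an asymmetrical RO) is hypothetical. Each parameter setting is reduced to a qualitative pattern (the ordinal relation between the two ratings), and the share of sampled parameter space producing each pattern is reported.

```python
import itertools
from collections import Counter

def toy_model(w_mass, w_cavity):
    """Hypothetical model: ratings for an LO above the mass and above the cavity."""
    return 5.0 + w_mass, 5.0 + w_cavity

def qualitative_pattern(ratings, tol=0.05):
    """Reduce two ratings to an ordinal pattern: '>', '<' or '=' (within tol)."""
    mass, cavity = ratings
    if abs(mass - cavity) <= tol:
        return "="
    return ">" if mass > cavity else "<"

def partition_by_grid(model, grid_a, grid_b):
    """Share of a sampled parameter grid that produces each qualitative pattern.

    Dense grid sampling stands in for the MCMC search used by actual PSP.
    """
    counts = Counter(qualitative_pattern(model(a, b))
                     for a, b in itertools.product(grid_a, grid_b))
    total = sum(counts.values())
    return {pat: n / total for pat, n in counts.items()}

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
shares = partition_by_grid(toy_model, grid, grid)
print(shares)  # {'=': 0.2, '<': 0.4, '>': 0.4}
```

Two models "share" a prediction in this sense when the same qualitative pattern occupies part of both models' parameter spaces.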
This is why I conducted an empirical study that tested the model predictions for both developed test cases (relative distance and asymmetrical ROs). The empirical study was designed to be as close as possible
to the experimental setup reported in Regier and Carlson (2001). That
is, 34 participants read the German sentence “Der Punkt ist über dem
Objekt” (“The dot is above the object”) and afterwards had to rate its
acceptability given a depicted spatial relation (e.g., an image of a dot
and a rectangle) on a scale from 1 to 9. In addition to *über (above)*, I also
tested the German preposition *unter (below)*. In total, the study tested
448 RO-LO configurations. Moreover, I tracked the eye-movements of
participants during inspection of the depicted spatial relation. These
data are a measure of overt attention during spatial relation processing.
The empirical study generalized two effects on spatial language verification from English to German (the ‘grazing line’ effect and lower ratings
for *unter (below)* compared to *über (above)*). Furthermore, the empirical
study revealed an effect of relative distance on spatial language acceptability ratings, although this effect differed from the one predicted by the rAVSw-comb
model. The empirical data from the rectangular ROs suggest that lower
relative distance weakens (i) the effect of proximal orientation and (ii),
for high values of proximal orientation, a reversed effect
of center-of-mass orientation. Neither the rAVSw-comb model nor the
AVS model can fully accommodate this finding. Future research should
investigate the effect of relative distance more closely.
For the asymmetrical ROs, analyses of the empirical data suggest that
people rely on the center-of-object instead of on the center-of-mass for
their acceptability ratings. This challenges earlier findings about the
importance of the center-of-mass orientation. However, because the
center-of-mass and the center-of-object coincided in most earlier
studies, those studies could not tell the two apart; the data presented in this dissertation therefore provide additional
information on how humans process geometry in the context of spatial
language verification.
In terms of eye movements, the empirical data provide evidence for
the horizontal component of the attentional focus as defined in the AVS
model. This focus is also a key element of the rAVSw-comb model.
The empirical results do not contradict the vertical component of the
hypothesized attentional focus. However, due to the design of the study,
it remains unclear whether the vertical fixation locations were caused
by the preposition used or by the vertical location of the LO. In addition,
people inspected the two types of asymmetrical ROs slightly differently.
For the more open asymmetrical shapes (L-shaped), fixations were
influenced by the asymmetrical distribution of mass. In contrast, for the
less open but still asymmetrical shapes (C-shaped), fixation patterns
could not be distinguished from fixation patterns on rectangular ROs.
Note that for all asymmetrical ROs, the center-of-object orientation
predicted the rating data better than the center-of-mass orientation,
despite the distinct fixation patterns.
To further analyze the claim that people might use the center-of-object
instead of the center-of-mass for their ratings, I developed modifications for the two cognitive models. While the AVS and rAVSw-comb
models rely on the center-of-mass, the two new models ‘AVS bounding
box’ (AVS-BB) and ‘rAVS center-of-object’ (rAVS-CoO) consider the
center-of-object instead (all other model components remain unchanged). To
thoroughly analyze all four cognitive models, I applied several model
comparison techniques (Chapter 5). Based on the stimuli and data
from the empirical study, the goal of the model simulations was to
distinguish between models that implement a shift from the RO to
the LO (AVS, AVS-BB) and models that implement a shift from the
LO to the RO (rAVSw-comb, rAVS-CoO). Apart from fitting the models
to the data (per GOF and SHO), I analyzed them using the ‘Model
Flexibility Analysis’ (MFA, Veksler, Myers, & Gluck, 2015) and the
‘landscaping’ method (Navarro, Pitt, & Myung, 2004). The latter two
methods provide information on how flexible the models are. A highly
flexible model can generate a large number of distinct data patterns. A
model with low flexibility generates only a few distinct data patterns.
In comparing model performances, one should consider the model
flexibility (Roberts & Pashler, 2000), because a highly flexible
model might fit even empirically implausible data well. A close
fit to empirical data is therefore a necessary
but not sufficient criterion for a “good” model. In addition to providing
a different perspective on model flexibility, landscaping measures to
what extent two models mimic each other (in which case it is
more difficult to distinguish between them).
Considering all model simulations, the two newly proposed models
rAVS-CoO and AVS-BB (accounting for the center-of-object instead of
for the center-of-mass) perform substantially better than their predecessors rAVSw-comb and AVS. Compared with the center-of-mass models, the
two center-of-object models fit the empirical data better (GOF, SHO)
while being less flexible (MFA, landscaping), and they generate rating
patterns closer to the empirical patterns (PSP). This supports the hypothesis that people rely on the center-of-object orientation rather
than on the center-of-mass orientation. In terms of the main research
question, however, the model simulations do not favor either of the two
implemented directionalities of attention over the other. That is, based
on the existing empirical data and the cognitive models, both directionalities of attention are equally plausible. The thesis closes with a model
extension that allows cognitive modelers to analyze the models in a more
fine-grained way in the future. More specifically, extended models generate
full rating distributions instead of mean ratings. This makes it possible
to use all information available in the empirical data for future model
assessments.
Finally, Chapter 6 summarizes the results of this Ph.D. project. Following the seminal three-level framework proposed by Marr (1982), I
discuss the findings and relate them to other relevant research. I sketch
several promising possibilities to enhance the models in order to create
a more comprehensive model of spatial language processing. Such
a model would allow cognitive scientists to further investigate how
humans ground their spatial language in the visual world.
DA - 2019
DO - 10.4119/unibi/2935686
LA - eng
PY - 2019
TI - Modeling the Contribution of Visual Attention to Spatial Language Verification
UR - https://nbn-resolving.org/urn:nbn:de:0070-pub-29356864
Y2 - 2024-11-24T03:10:46
ER -