Studies on situated language comprehension (i.e., comprehension in rich visual contexts) have shown that the comprehender makes use of different information sources in order to establish visual reference and to visually anticipate entities in a scene while understanding language (reflecting expectations about what might be mentioned next). Semantics and world-knowledge (i.e., experiential, long-term knowledge) are among these sources. For instance, when listening to a sentence like *The girl will ride...*, the comprehender will likely anticipate an object that a girl could ride, e.g., a carousel, rather than other objects, such as a motorbike (Kamide, Altmann, & Haywood, 2003). However, following the inspection of events (featuring agents acting upon objects or patients), comprehenders have so far shown a preference to visually anticipate the agents or objects that were seen as part of those prior events (i.e., the *recent-event preference*, or the preference for event-based representations; Abashidze, Carminati, & Knoeferle, 2014; Knoeferle, Carminati, Abashidze, & Essig, 2011). This preference emerged even when other plausible objects or stereotypically better-fitting agents were present. Although the preference for event-based information over other sources (e.g., plausibility or stereotypicality) seems to be strong and has been accommodated in accounts of situated language comprehension (Knoeferle & Crocker, 2006, 2007), its nature when comprehenders generate expectations is still unspecified. Crucially, the preference for recent events has not been generalized from action events to other types of information in the visual and linguistic contexts. <br />To further examine this issue, this thesis investigated the role of a particular type of information during situated language comprehension under the influence of prior events, namely, visual gender and action cues and knowledge about gender stereotypes. 
As many studies in the field of psycholinguistics have highlighted, gender (both a biological and a social feature of human beings) is relevant in language comprehension (e.g., grammatical gender can serve to track reference in discourse, and gender-stereotype knowledge can bias our interpretation of a sentence). However, little psycholinguistic research has examined the comprehension of gender information in a visual context. We argue that gender is worth exploring in a paradigm where prior event representations can be pitted against long-term knowledge. In addition, inspired by experiments using mismatch designs, we wanted to see how the visual attention of the comprehender might be affected as a function of referential incongruences (i.e., mismatches between visual events and linguistic information, e.g., Knoeferle, Urbach, & Kutas, 2014; Vissers, Kolk, Van de Meerendonk, & Chwilla, 2008; Wassenaar & Hagoort, 2007) and incongruences at the level of world-knowledge (i.e., gender stereotypes; e.g., Duffy & Keir, 2004; Kreiner, Sturt, & Garrod, 2008). By doing so, we could gain insight into how both types of sources (event-based information and gender-stereotype knowledge from language) are used, i.e., whether one is more important than the other or whether both are equally exploited in situated language comprehension. <br />We conducted three eye-tracking, visual-world experiments and one EEG experiment. In all of these experiments, participants saw events taking place prior to sentence comprehension, i.e., videos of (female or male) hands acting upon objects. In the eye-tracking experiments, following the videos, a visual scene appeared with the faces of two potential agents: one male and one female. While the agent matching the gender features from prior events (i.e., the hands) was considered the target agent, the other potential agent, whose gender was not cued in previous events, was the competitor agent. 
The visual scene in Experiment 3 further included the images of two objects; one was the target object (i.e., the object that appeared in prior events), while the other was a competitor object with opposite stereotypical valence. During the presentation of this scene, an OVS sentence was presented (e.g., translation from German: ‘The cake<sub>NP1/obj</sub> bakes<sub>V</sub> soon<sub>ADV</sub> Susanna<sub>NP2/subj</sub>’). We used the non-canonical OVS word order as opposed to SVO (more commonly used in prior research, e.g., Knoeferle, Carminati, et al., 2011) precisely to examine participants’ expectations towards the agent, who was mentioned in sentence-final position. We manipulated two factors. One factor was the match between prior visual events and language: there were action-verb(-phrase) mismatches in Experiments 1 and 3, and mismatches between the gender of the hands and the final subject (i.e., the proper name) in Experiments 2 and 4. The second manipulation, present in Experiments 1 to 3, was the match between the stereotypical valence of the described actions/events in the sentence and the target agent’s gender. In the eye-tracking experiments, we measured participants’ visual attention towards the agents’ faces during sentence comprehension. In the EEG experiment, we measured ERP responses time-locked to the final, proper name region (i.e., *Susanna*). Participants’ task was to verify via button press whether the sentence matched the events they had just seen. <br />In line with prior research, our results support the idea that the preference for event-based representations generalizes to another cue, i.e., gender features from the hands of an agent during prior events. Participants generally preferred to look at the target agent compared to the competitor. 
These results also suggest that the recent-event preference does not just rely on representations of full objects, agents and events, but also on subtler (gender) features that serve to identify feature-matching targets during comprehension (i.e., faces of agents are inspected based on the gender features from hands seen in prior events). This preference is, however, modulated by mismatches in language, i.e., whenever the actions described or the gender implied by the final noun in the sentence were at odds with prior events, attention towards the target agent was reduced. In addition, the scene configuration of Experiment 3 gave rise to gender stereotypicality effects, which had not yet been found in prior studies using a similar design. Participants looked at the target agent (vs. the competitor) to a greater extent when the action described by the sentence stereotypically matched (vs. mismatched) them. As for the electrophysiological response towards mismatches between event-based gender cues and language, we found a biphasic ERP response, which suggests that this type of verification requires two semantically-induced stages of processing. This response had commonalities both with some effects found in strictly linguistic/discourse contexts and with previously observed mismatch effects in picture-sentence verification studies (i.e., role relation and action mismatches; Knoeferle et al., 2014), which suggests that a similar (perhaps a single) processing mechanism might be involved in several visuolinguistic relations. <br />In sum, our results using gender and action cues from prior events and long-term knowledge call for a more refined consideration of the different aspects involved in (situated) language comprehension. On the one hand, existing accounts need to accommodate further reconciliations/verifications of visuolinguistic relations (e.g., roles, actions, gender features, etc.). 
On the other hand, when it comes to listeners generating expectations during comprehension while inspecting the visual world, we further suggest that a weighted system (i.e., a system indexing the strength of the expectation and how different information sources contribute to it; also suggested in Münster, 2016) applies to gender information. Not only event-based representations, but also different discrepancies between these representations and language and, depending on the concurrent visual scene configuration, long-term knowledge (e.g., pertaining to gender stereotypes) can affect weighted expectations. Biosocial aspects such as gender may be of particular interest for answering some of the open questions about how situated language comprehension works, as these aspects can be found and manipulated at different levels of communication (e.g., the comprehender, the speaker, the linguistic content, etc.).