The selection and processing of a spatial frame of reference (FOR) in interpreting verbal scene descriptions are of great interest to psycholinguistics. In this study, we focus on the choice between the relative and the intrinsic FOR, addressing two questions: (a) does the presence or absence of a background in the scene influence the selection of a FOR, and (b) how does a previously selected FOR affect the subsequent processing of a different FOR? Our results show that a realistic background in the scene makes the selection of the relative FOR more likely. We attribute this effect to the facilitation of mental simulation, which enhances the relation between the viewer and the objects. With respect to response accuracy, a previously selected FOR was associated with higher accuracy when the subsequent FOR was the same and with lower accuracy when it was different. For response latencies, we found only a delay when the subsequent FOR was different.