The way humans perceive and attend to visual scenes differs profoundly between individuals. This is most compellingly demonstrated in cross-cultural comparisons of context-sensitivity, the relative attentional focus on the focal objects versus the background elements of a scene. Differences in context-sensitivity have been reported both in verbal accounts (e.g., picture descriptions) and in visual attention (e.g., eye-tracking paradigms). The present study investigates (1) whether the way parents verbally guide their children's attention in visual scenes is associated with differences in children's context-sensitivity and (2) whether verbal descriptions of scenes are related to early visual attention (i.e., gaze behavior) in 5-year-old children and their parents. Importantly, the way parents verbally described visual scenes to their children was related to the children's context-sensitivity when the children described these scenes themselves. That is, we found correlations between parents' and children's descriptions in the number of references made to the focal object versus the background, as well as in the number of relations drawn between different elements of a scene. Furthermore, verbal descriptions were closely related to visual attention in adults but not in children. These findings support our hypotheses that context-sensitivity is socialized via a verbal route and that visual attention processes align with acquired narrative structures only later in development, after the preschool years.