In this paper we combine motion-capture data with linguistic notions (preliminary study) in a game-like tutoring system (study 1), in order to help elementary school students better differentiate literal from metaphorical uses of motion verbs on the basis of embodied information. Beyond this thematic goal, we aim to improve young students' attention and spatiotemporal memory by presenting sensorimotor data collected experimentally from thirty-two participants in our motion capture labs. Furthermore, we examine how well the tutor's goals are accomplished and compare them with the curriculum's approach (study 2). Sixty-nine elementary school students were randomly divided into two experimental groups (game-like and traditional) and one control group, which received no intervention. All groups completed pre- and post-tests. Although the diagnostic pre-tests present a uniform picture, a two-way analysis of variance suggests that the experimental groups progressed in the post-tests; more specifically, the game-like group gave fewer wrong answers in the linguistics task and showed higher learning achievement than the other two groups. In addition, participants in the game-like condition needed progressively less time to identify the avatar's actions. We regard this finding as a first indication of improved attention and spatiotemporal memory, while the tutor's assistance features cultivated students' metacognitive perception.