In order to understand and model the non-verbal communicative behavior of humans, qualitative techniques, such as Conversation Analysis, and quantitative techniques, such as 3D motion capture, need to be combined. Although there has been some recent progress in annotation tools such as ELAN or Anvil, there is still a lack of tool support that provides concise, simultaneous access to both types of data and that reveals the relationships between them. In this work, we present a pre-annotation tool that takes the results from off-the-shelf optical tracking systems, automatically fits an articulated skeleton model, and detects motion segments of individual joints. A sophisticated user interface allows the annotator to easily find correlations between different joints, analyze the corresponding 3D pose in a reconstructed virtual environment, and export combined qualitative and quantitative annotations to standard annotation tools. Using this technique, we are able to examine complex setups, such as three persons in tight conversation or largely unconstrained engagement situations between humans and robots.
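The abstract does not specify how motion segments are detected; as an illustration only, a minimal sketch of per-joint segmentation via velocity thresholding on tracked 3D joint positions might look as follows. The function name, the smoothing window, and the threshold values are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def motion_segments(positions, fps, vel_thresh=0.05, min_len=0.1):
    """Detect motion segments of one joint from tracked 3D positions.

    positions  : (N, 3) array of joint positions in meters, one row per frame.
    fps        : capture frame rate in Hz.
    vel_thresh : speed (m/s) above which the joint counts as moving
                 (assumed value, not from the paper).
    min_len    : minimum segment duration in seconds.
    Returns a list of (start_frame, end_frame) tuples.
    """
    # Frame-to-frame speed in m/s, smoothed to suppress tracking jitter.
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    speed = np.convolve(speed, np.ones(5) / 5.0, mode="same")

    moving = speed > vel_thresh
    segments, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i                      # segment begins
        elif not m and start is not None:
            if (i - start) / fps >= min_len:
                segments.append((start, i))  # segment ends, long enough
            start = None
    # Close a segment that runs to the end of the recording.
    if start is not None and (len(moving) - start) / fps >= min_len:
        segments.append((start, len(moving)))
    return segments
```

Segments detected in this way could then be exported as time-aligned tiers for standard annotation tools such as ELAN or Anvil, alongside the qualitative labels.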