Embodied conversational agents must be able to
express themselves convincingly and autonomously. Based
on an empirical study of spatial descriptions of landmarks
in direction-giving, we present a model that allows virtual
agents to automatically generate coordinated language and iconic gestures, that is, to select their content and derive their form. Our model simulates the interplay between these two modes of expressiveness on two levels. First, two kinds of knowledge representation (propositional and imagistic) are employed to capture the modality-specific content and the processes of content planning. Second, dedicated planners are integrated to formulate concrete verbal and gestural behavior. A probabilistic approach to gesture formulation is presented that incorporates multiple contextual factors as
well as idiosyncratic patterns in the mapping of visuo-spatial referent properties onto gesture morphology. Results from a prototype implementation are described.
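To give a concrete impression of the kind of probabilistic formulation described above, the following Python sketch samples gesture morphology features from conditional probability tables that link visuo-spatial referent properties and discourse context to morphological choices. All feature names, context labels, and probabilities here are hypothetical illustrations, not the model's actual networks or parameters.

```python
import random

# Hypothetical conditional probability tables (illustrative values only).
# P(handshape | dominant shape property of the referent)
HANDSHAPE_GIVEN_SHAPE = {
    "longish": {"flat_hand": 0.2, "stretched_index": 0.6, "fist": 0.2},
    "round":   {"flat_hand": 0.3, "stretched_index": 0.1, "cupped_hand": 0.6},
}

# P(representation technique | discourse context)
TECHNIQUE_GIVEN_CONTEXT = {
    "introducing_referent": {"drawing": 0.5, "shaping": 0.4, "indexing": 0.1},
    "re-referring":         {"drawing": 0.2, "shaping": 0.2, "indexing": 0.6},
}


def sample(distribution, rng):
    """Draw one value from a {value: probability} table."""
    values = list(distribution)
    weights = [distribution[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def formulate_gesture(shape_property, discourse_context, seed=None):
    """Map a referent's shape property and the discourse context onto a
    gesture morphology by sampling each feature from its conditional table."""
    rng = random.Random(seed)
    return {
        "handshape": sample(HANDSHAPE_GIVEN_SHAPE[shape_property], rng),
        "technique": sample(TECHNIQUE_GIVEN_CONTEXT[discourse_context], rng),
    }


if __name__ == "__main__":
    # e.g., describing a longish landmark for the first time
    print(formulate_gesture("longish", "introducing_referent", seed=1))
```

In a fuller treatment, the tables would be conditioned jointly on several contextual factors and could be fitted to speaker-specific data, which is one way the idiosyncratic mapping patterns mentioned above could be captured.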