This paper addresses the semantic coordination of speech and gesture, a major prerequisite for endowing virtual agents with convincing multimodal behavior. Previous research has focused on building rule- or data-based models specific to a particular language, culture, or individual speaker, but without considering the underlying cognitive processes. We present a flexible cognitive model in which both linguistic and cognitive constraints are considered in order to simulate natural semantic coordination across speech and gesture. An implementation of this model is presented, and first simulation results, compatible with empirical data from the literature, are reported.