In naturally occurring speech and gesture, meaning is organized and distributed across the modalities in different ways. The underlying cognitive processes are largely unexplored. We propose a model based on activation spreading within dynamically shaped multimodal memories, in which coordination arises from the interplay of visuo-spatial and linguistically shaped representations under given communicative and cognitive resources. An implementation of this model is presented and first simulation results are reported.