In this paper we describe ongoing research that aims at developing a generic demonstration platform for virtual prototype modeling by utilizing multimodal – speech and gesture – interactions in Virtual Reality. In particular, we concentrate on two aspects. First, a knowledge-based approach for assembling CAD-based parts in VR is introduced. This includes a system to generate meta-information from geometric models as well as accompanying task-level algorithms for virtual assembly. Second, a framework for modeling multimodal interaction using gesture and speech is presented that facilitates generic adaptation to scene-graph-based applications. The chosen decomposition of the required core modules is illustrated with a typical object rotation interaction as an example.
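To make the kind of interaction referred to above more concrete, the following is a minimal sketch of how a spoken command and a turning gesture could be fused to rotate a node in a scene graph. All names here (SceneNode, SpeechEvent, GestureEvent, fuse_rotation) and the 1-DOF rotation model are illustrative assumptions, not the platform's actual API or module decomposition.

```python
# Hypothetical sketch: fusing a speech command ("what") with a turning
# gesture ("how much") to rotate a named node in a simple scene graph.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneNode:
    name: str
    rotation_deg: float = 0.0              # simplified 1-DOF rotation about a fixed axis
    children: List["SceneNode"] = field(default_factory=list)


@dataclass
class SpeechEvent:
    verb: str                               # e.g. "rotate"
    object_ref: str                         # resolved object reference, e.g. "gearbox"


@dataclass
class GestureEvent:
    angle_deg: float                        # turning angle derived from hand tracking


def find_node(root: SceneNode, name: str) -> Optional[SceneNode]:
    """Depth-first lookup of a named node in the scene graph."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find_node(child, name)
        if hit is not None:
            return hit
    return None


def fuse_rotation(root: SceneNode, speech: SpeechEvent, gesture: GestureEvent) -> None:
    """Apply the gesture-derived angle to the object named in the speech command."""
    if speech.verb != "rotate":
        return
    node = find_node(root, speech.object_ref)
    if node is not None:
        node.rotation_deg = (node.rotation_deg + gesture.angle_deg) % 360.0


# Usage: "rotate the gearbox" combined with a 45-degree turning gesture.
scene = SceneNode("root", children=[SceneNode("gearbox")])
fuse_rotation(scene, SpeechEvent("rotate", "gearbox"), GestureEvent(45.0))
print(find_node(scene, "gearbox").rotation_deg)  # 45.0
```

In this reading, the speech channel supplies the command and the object reference, the gesture channel supplies the continuous rotation parameter, and the fusion step only touches the transform of the addressed scene-graph node; how the actual platform distributes these responsibilities across its core modules is described later in the paper.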