This paper studies the natural-language description of 3D indoor scenes in terms of the intrinsic and relative locations of the objects they contain. The proposed approach uses an Xbox 360 Kinect together with ROS and PCL to acquire 3D data from the scene. Object features computed from these data are used to train an SVM model that classifies the objects in the scene. Once the objects have been detected, their orientations are obtained and qualitative spatial relations between them are computed to generate a natural-language description of the scene.
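
As an illustration of the final step only, the following minimal Python sketch shows one way that qualitative relations between two detected objects could be turned into a sentence. It is not the method of this paper: the `DetectedObject` type, the axis convention (x to the right, y up, z away from the sensor), the tolerance `tol`, and the relation vocabulary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """Hypothetical container for a classified object and its centroid."""
    label: str
    x: float  # metres, positive to the right of the sensor (assumed axis)
    y: float  # metres, positive upwards (assumed axis)
    z: float  # metres, positive away from the sensor (assumed axis)


def relative_relations(a, b, tol=0.10):
    """Return qualitative relations of `a` with respect to `b`.

    Differences smaller than `tol` (an assumed threshold, in metres)
    are treated as "no relation" along that axis.
    """
    relations = []
    if a.x < b.x - tol:
        relations.append("to the left of")
    elif a.x > b.x + tol:
        relations.append("to the right of")
    if a.z < b.z - tol:
        relations.append("in front of")
    elif a.z > b.z + tol:
        relations.append("behind")
    if a.y > b.y + tol:
        relations.append("above")
    elif a.y < b.y - tol:
        relations.append("below")
    return relations


def describe(a, b, tol=0.10):
    """Build one English sentence from the qualitative relations."""
    relations = relative_relations(a, b, tol)
    if not relations:
        return f"The {a.label} is at roughly the same position as the {b.label}."
    return f"The {a.label} is {' and '.join(relations)} the {b.label}."


# Made-up centroids, for illustration only.
chair = DetectedObject("chair", x=-0.5, y=0.0, z=1.5)
table = DetectedObject("table", x=0.3, y=0.0, z=2.0)
print(describe(chair, table))
# prints: The chair is to the left of and in front of the table.
```

The sketch compares centroids only and describes relations from the sensor's point of view; relations based on an object's own orientation would additionally require expressing the second object's position in the first object's reference frame.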