Developing a system that generates a 3D representation of an entire scene is a difficult task. Several 3D time-of-flight (ToF) imaging technologies have been developed in recent years that overcome various limitations of other 3D imaging systems, such as laser/radar/sonar scanners, structured light, and stereo rigs. However, little work on computer vision applications based on such ToF sensors has been published. In this paper we present a complete system for 3D modeling from a sequence of range images acquired during an arbitrary flight of a 3D ToF sensor. First, comprehensive preprocessing steps are performed to improve the quality of the range images. An initial estimate of the transformation between two 3D point clouds, computed from two consecutive range images, is then obtained through feature extraction and tracking based on the three kinds of images delivered by the 3D sensor; during this initial estimation, a RANSAC sampling algorithm filters out outlier correspondences. Finally, the transformation is refined by registering the two 3D point clouds with a robust variant of the Iterative Closest Point (ICP) algorithm, the so-called Picky ICP. Extensive experimental results demonstrate the efficiency and robustness of the proposed system.
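To illustrate the RANSAC-based rejection of outlier correspondences mentioned above, the following is a minimal sketch, not the authors' implementation: it assumes putative 3D point correspondences are already given, uses a Kabsch/SVD solver on random minimal (3-point) samples, and the function names, iteration count, and inlier threshold are all illustrative choices.

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q (Kabsch/SVD)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                  # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def ransac_rigid(P, Q, iters=200, thresh=0.05, rng=None):
    """Estimate (R, t) from putative correspondences P[i] <-> Q[i],
    rejecting outliers by random minimal (3-point) sampling."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)
        R, t = rigid_transform(P[idx], Q[idx])
        err = np.linalg.norm((P @ R.T + t) - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # refit on all inliers of the best hypothesis
    R, t = rigid_transform(P[best_inliers], Q[best_inliers])
    return R, t, best_inliers
```

In such a pipeline the RANSAC estimate serves only as the initialization; the subsequent ICP stage refines it.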
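The Picky ICP refinement can likewise be sketched. The distinctive "picky" step is that when several source points select the same closest model point, only the pair with the smallest distance is retained. The code below is our own rough illustration under that assumption, not the paper's implementation; it uses brute-force nearest neighbours (adequate only for small clouds) and omits the paper's other robustness measures.

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping P onto Q."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def picky_icp(src, dst, iters=30):
    """Picky-ICP-style loop: pair each source point with its nearest
    destination point, but keep only the closest pair whenever several
    source points compete for the same destination point."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        # brute-force nearest neighbours (fine for small clouds)
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)
        nd = d2[np.arange(len(src)), nn]
        # picky step: at most one source point per destination point
        keep = np.zeros(len(src), dtype=bool)
        for j in np.unique(nn):
            cand = np.where(nn == j)[0]
            keep[cand[nd[cand].argmin()]] = True
        R, t = kabsch(src[keep], dst[nn[keep]])
    return R, t
```

Starting this loop from the RANSAC estimate rather than the identity is what makes the two-stage scheme robust to large inter-frame motion.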