This thesis is concerned with the perception of the human body and its environment by means of computer vision, and with the analysis of this information for applications in the field of human-robot interaction. The discussion will mostly address real-world scenarios involving the observation of real humans; this means we have to deal with an ever-changing, dynamic environment and with possibly large variations in the appearance of the objects to be observed. This poses a considerable challenge for automated vision techniques. Additional constraints can ease the problem, but they also make the resulting system less flexible. The presented work combines various techniques from computer vision and optimization theory. The scenarios addressed are widespread: worker safety in an industrial environment, interaction with a mobile robot, and even understanding the relevance of gestures for learning in children. The common ground of all these scenarios is that methods from computer vision are applied to enable or to understand interaction among humans and between humans and machines.