Human communication is complex, dynamic, and implicit. People know when others want to interact with them. They know when they are addressed, whether they need to react, and to whom. This understanding is learnt early and refined throughout life. Artificial agents, in contrast, do not grow up. They are not exposed to the vast amount of high-quality interactional experience that humans are. Nevertheless, if we want to interact with artificial agents as we do with humans, we need them to understand our communication. They need to recognize the states we are in, the intentions we pursue, and the behaviours we display to achieve them. In this thesis, I investigate which human behaviours can be observed to infer the conversational state and intentions of humans in interactions with artificial agents in a smart environment. After a detailed review of the literature on the principles of human interaction and the efforts to transfer them to artificial agents and smart environments, I investigate human conversational cues in interactions with different kinds of agents. With these investigations I show that (1) although addressing behaviour in unconstrained interactions of single users with devices and agents is diverse, the addressed entity can be recognized with high accuracy from audio-visual cues, (2) a robot in a conversational group with humans can use facial information from its interlocutors to decide whether it is being addressed, and (3) the conversational group and role of a virtual agent can be recognized by observing the motion and facial expressions of the people in its vicinity. The insights from these investigations and the corresponding models enable the automatic interpretation of human conversational behaviour in interactions with artificial agents. This can be used to create agents that better understand and utilize human communication, making interaction more natural and effective.