Attentive speaker agents – artificial conversational agents that can attend and adapt to listener feedback – need to attribute a mental ‘listener state’ to the user and keep track of the grounding status of their own utterances. We propose a joint model of listener state and information state, represented as a dynamic Bayesian network, that can capture the influences between dialogue context, user feedback, the mental listener state and the information state, and thereby provides an estimate of grounding.
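To show how such a network can turn observed feedback into a grounding estimate, here is a minimal forward-filtering sketch over a two-slice dynamic Bayesian network. It is not the authors' model. The variable names, their domains (a binary listener state, "simple"/"complex" context, "positive"/"none"/"negative" feedback) and every probability value are hypothetical. The sketch also simplifies the information state to a single grounding node that depends only on the current listener state.

```python
from typing import Dict

Belief = Dict[str, float]

# Hypothetical domain of the hidden listener state U_t.
STATES = ("understanding", "non_understanding")

# Hypothetical P(U_t | U_{t-1}, context_t); context is an observed parent.
TRANSITION = {
    "simple": {
        "understanding": {"understanding": 0.9, "non_understanding": 0.1},
        "non_understanding": {"understanding": 0.5, "non_understanding": 0.5},
    },
    "complex": {
        "understanding": {"understanding": 0.7, "non_understanding": 0.3},
        "non_understanding": {"understanding": 0.3, "non_understanding": 0.7},
    },
}

# Hypothetical P(F_t | U_t): how likely each kind of feedback is per state.
EMISSION = {
    "understanding": {"positive": 0.6, "none": 0.35, "negative": 0.05},
    "non_understanding": {"positive": 0.1, "none": 0.4, "negative": 0.5},
}

# Hypothetical P(G_t = grounded | U_t): a simplified information-state node.
GROUNDED_GIVEN_U = {"understanding": 0.95, "non_understanding": 0.1}


def filter_step(belief: Belief, context: str, feedback: str) -> Belief:
    """One predict-update step of exact forward filtering."""
    # Predict: propagate the previous belief through the context-dependent transition.
    predicted = {
        s: sum(belief[p] * TRANSITION[context][p][s] for p in STATES)
        for s in STATES
    }
    # Update: weight each state by the likelihood of the observed feedback.
    unnormalised = {s: predicted[s] * EMISSION[s][feedback] for s in STATES}
    z = sum(unnormalised.values())
    return {s: v / z for s, v in unnormalised.items()}


def grounding_estimate(belief: Belief) -> float:
    """Marginal P(G_t = grounded) under the current listener-state belief."""
    return sum(belief[s] * GROUNDED_GIVEN_U[s] for s in STATES)


if __name__ == "__main__":
    prior = {"understanding": 0.5, "non_understanding": 0.5}
    after_nod = filter_step(prior, "simple", "positive")
    print(after_nod, grounding_estimate(after_nod))
```

Starting from a uniform prior, positive feedback after a simple utterance pushes the belief in understanding to about 0.93 and the grounding estimate to about 0.89. Negative feedback after a complex utterance drops the grounding estimate to about 0.18. Exact enumeration is enough here because the hidden state has only two values. A richer joint model of listener state and information state would have more nodes, but it would follow the same predict-update pattern.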