Detecting hand-object interactions is a challenging
problem with many applications in the human-computer interaction
domain. We present a real-time method that automatically
detects hand-object interactions in RGBD sensor
data and tracks the object’s rigid pose over time. The
detection is performed using a fully convolutional neural
network, which is purposefully trained to discern the relationship
between hands and objects and which predicts
pixel-wise class probabilities. This output is used in a probabilistic
pixel labeling strategy that explicitly accounts for
the uncertainty of the prediction. Based on the labeling of
object pixels, the object is tracked over time using modelbased
registration. We evaluate the accuracy and generalizability
of our approach and make our annotated RGBD
dataset as well as our trained models publicly available.
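As a rough illustration of the pipeline summarized above, the sketch below shows how an uncertainty-aware pixel labeling step and the hand-off to rigid registration might look. It is a minimal sketch only: the class set, the confidence threshold, the helper names, and the camera-intrinsics format are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative class indices; the actual label set used by the method
# is not specified in the abstract.
BACKGROUND, HAND, OBJECT, UNCERTAIN = 0, 1, 2, 3

def label_pixels(class_probs: np.ndarray, conf_thresh: float = 0.6) -> np.ndarray:
    """Assign a per-pixel label from FCN class probabilities.

    class_probs: (H, W, C) array of per-pixel class probabilities
                 (e.g. the softmax output of the network).
    Pixels whose most likely class falls below `conf_thresh` are marked
    UNCERTAIN instead of receiving a hard label, so downstream steps can
    account for the prediction's uncertainty.
    """
    best_class = class_probs.argmax(axis=-1)   # most probable class per pixel
    best_prob = class_probs.max(axis=-1)       # its probability
    return np.where(best_prob >= conf_thresh, best_class, UNCERTAIN)

def object_points(labels: np.ndarray, depth: np.ndarray, intrinsics) -> np.ndarray:
    """Back-project OBJECT-labeled pixels to a 3D point cloud."""
    fx, fy, cx, cy = intrinsics
    v, u = np.nonzero(labels == OBJECT)        # pixel coordinates of object pixels
    z = depth[v, u]
    valid = z > 0                              # drop pixels with missing depth
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)        # (N, 3) points in camera space

# The resulting point cloud would then be aligned to the object model
# (for instance with ICP) to update the rigid pose in each frame.
```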