Abstract
We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods either ignore how the camera wearer interacts with objects, or simply consider body motion as a separate modality. In contrast, we observe that intentional hand movement reveals critical information about the future activity. Motivated by this observation, we adopt intentional hand movement as a feature representation, and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots, and future action. Specifically, we consider the future hand motion as motor attention, and model this attention using probabilistic variables in our deep model. The predicted motor attention is further used to select discriminative spatiotemporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both the EGTEA Gaze+ and EPIC-Kitchens datasets.