EgoBody Dataset

Human Body Shape and Motion of Interacting People
from Head-Mounted Devices

1ETH Zürich     2Microsoft   
* Now at Meta Reality Labs Research
European Conference on Computer Vision (ECCV) 2022

Paper      Dataset      Code   

🔥 Submit to the EgoBody Challenge 🔥  

at the ⚡ ECCV 2022 Workshop

Participate in the EgoBody challenge here and submit an abstract to our ECCV 2022 workshop
Human Body, Hands, and Activities from Egocentric and Multi-view Cameras!

The first phase of the challenge closes on October 8th, 2022 (extended from October 1st).

Prizes will be awarded to the top 3 teams on the leaderboard with a valid submission:
1st Place: 1500 CHF
2nd Place: 1000 CHF
3rd Place: 500 CHF
We sincerely thank the ETH AI Center for sponsoring the prizes.


The task for the first phase of the EgoBody challenge is 3D human pose and shape estimation from an egocentric monocular RGB image. Participants are encouraged to submit a 2-4 page abstract to our ECCV workshop.
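For 3D human pose and shape estimation, submissions are typically scored by joint-position error against the ground truth. As a minimal illustration (not the challenge's official evaluation code; the exact metrics are specified on the challenge page), the widely used MPJPE metric averages the Euclidean distance between predicted and ground-truth 3D joints:

```python
import math

def mpjpe(pred_joints, gt_joints):
    """Mean Per-Joint Position Error: average Euclidean distance
    between corresponding predicted and ground-truth 3D joints.
    Both arguments are lists of (x, y, z) coordinates of equal length."""
    assert len(pred_joints) == len(gt_joints)
    errors = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(errors) / len(errors)

# Toy example: two joints, one predicted perfectly, one off by 5 units.
print(mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)]))  # → 2.5
```

In practice this is computed in millimeters on pelvis-aligned joints, often alongside a Procrustes-aligned variant (PA-MPJPE) that factors out global rotation and scale.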

Dataset Overview


EgoBody is a large-scale dataset capturing ground-truth 3D human motions during social interactions in 3D scenes. Given two interacting subjects, we leverage a lightweight multi-camera rig to reconstruct their 3D shape and pose over time (1st and 2nd rows). One of the subjects (blue) wears a head-mounted device, synchronized with the rig, capturing egocentric multi-modal data such as eye gaze tracking (red circles in the 3rd row) and RGB images (bottom). The EgoBody dataset contains:

 • 125 sequences
 • 36 subjects
 • 15 indoor 3D scenes
 • 219,731 synchronized third-person-view RGBD frames from 3-5 Azure Kinects
 • 199,111 egocentric RGB frames from HoloLens2, synchronized with the Kinect frames
 • Eye gaze, hand/head tracking, and depth from HoloLens2
 • SMPL-X/SMPL annotations of 3D body pose, shape, and motion for both the interactee and the camera wearer

Capture Setup


EgoBody collects sequences of subjects performing diverse social interactions in various indoor scenes. For each sequence, two subjects are involved in one or more predefined interactions. Multiple Azure Kinects capture the interactions from different views (A, B, C) with RGBD streams, and a synchronized HoloLens2 worn by one subject captures the first-person view image (D), together with depth, head, hand and eye gaze tracking streams.
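Aligning the HoloLens2 stream with the Kinect streams amounts to matching frames by timestamp. The sketch below is purely illustrative (the function and its inputs are hypothetical, not part of the released toolkit): for each egocentric timestamp it finds the closest third-person timestamp by binary search over a sorted list.

```python
import bisect

def match_frames(ego_timestamps, kinect_timestamps):
    """For each egocentric (HoloLens2) timestamp, return the index of the
    nearest third-person (Kinect) timestamp. Both lists must be sorted."""
    matches = []
    for t in ego_timestamps:
        i = bisect.bisect_left(kinect_timestamps, t)
        # The nearest neighbor is on one side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(kinect_timestamps)]
        matches.append(min(candidates, key=lambda j: abs(kinect_timestamps[j] - t)))
    return matches

# Toy example: three egocentric frames matched against four Kinect frames.
print(match_frames([0.0, 1.1, 2.6], [0.0, 1.0, 2.0, 3.0]))  # → [0, 1, 3]
```

A real pipeline would additionally account for per-device clock offsets estimated during calibration before nearest-neighbor matching.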

Dataset Download

To download the EgoBody dataset, please sign the license and download the data here.
For detailed information on the data format, please see here.
Currently only the EgoSet (the egocentric RGB subset of EgoBody) is released; the other modalities (third-person-view RGBD, 3D scenes, eye gaze, etc.) will be released soon.



EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang

@inproceedings{zhang2022egobody,
   title = {EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices},
   author = {Zhang, Siwei and Ma, Qianli and Zhang, Yan and Qian, Zhiyin and Kwon, Taein and Pollefeys, Marc and Bogo, Federica and Tang, Siyu},
   booktitle = {European Conference on Computer Vision (ECCV)},
   month = oct,
   year = {2022}
}


For questions, please contact Siwei Zhang: