Siwei Zhang (张四维)

I am a PhD student at the Computer Vision and Learning Group (VLG), ETH Zürich, supervised by Siyu Tang. Prior to this, I obtained my Master's degree (2020) in Electrical Engineering and Information Technology from ETH Zürich, and my Bachelor's degree (2017) in Automation from Tsinghua University.

My research focuses on human motion modelling and human-scene interaction learning.

Email  /  Google Scholar  /  Twitter  /  Github

I am actively looking for full-time and postdoc positions. Feel free to reach out to me!



RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo
CVPR, 2024 Oral Presentation

Conditioned on noisy and occluded input data, RoHM reconstructs complete, plausible motions in consistent global coordinates via diffusion models.

EgoGen: An Egocentric Synthetic Data Generator
Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang
CVPR, 2024 Oral Presentation

EgoGen is a new synthetic data generator that produces accurate and rich ground-truth training data for egocentric perception tasks.

Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
Siwei Zhang, Qianli Ma, Yan Zhang, Sadegh Aliakbarian, Darren Cosker, Siyu Tang
ICCV, 2023 Oral Presentation

Generative human mesh recovery for images with body occlusions and truncations: a scene-conditioned diffusion model combined with collision-guided sampling yields accurate pose estimation for observed body parts and plausible generation of unobserved parts.

POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities
Rui Wang*, Sophokles Ktistakis*, Siwei Zhang, Mirko Meboldt, Quentin Lohmeyer
MICCAI, 2023 Oral presentation

POV-Surgery is a synthetic egocentric dataset for hand pose estimation with different surgical gloves and orthopedic surgical instruments, featuring RGB-D videos with annotations for activities, 3D/2D hand-object poses, and 2D hand-object segmentation masks.

EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang
ECCV, 2022

A large-scale dataset of accurate 3D body shape, pose, and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens 2.

SAGA: Stochastic Whole-Body Grasping with Contact
Yan Wu*, Jiahao Wang*, Yan Zhang, Siwei Zhang, Otmar Hilliges, Fisher Yu, Siyu Tang
(* denotes equal contribution)
ECCV, 2022

Starting from an arbitrary initial pose, SAGA generates diverse and natural whole-body human motions to approach and grasp a target object in 3D space.

Learning Motion Priors for 4D Human Body Capture in 3D Scenes
Siwei Zhang, Yan Zhang, Federica Bogo, Marc Pollefeys, Siyu Tang
ICCV, 2021 Oral Presentation

LEMO learns motion priors from a large-scale mocap dataset and introduces a multi-stage optimization pipeline that enables 3D motion reconstruction in complex 3D scenes.

PLACE: Proximity Learning of Articulation and Contact in 3D Environments
Siwei Zhang, Yan Zhang, Qianli Ma, Michael J. Black, Siyu Tang
3DV, 2020

An explicit representation for 3D person-scene contact relations that enables automated synthesis of realistic humans posed naturally in a given scene.

Facial Emotion Recognition with Noisy Multi-task Annotations
Siwei Zhang, Zhiwu Huang, Danda Pani Paudel, Luc Van Gool
WACV, 2021

To reduce the human labelling effort for multi-task labels, we introduce a new problem: facial emotion recognition with noisy multi-task annotations.

Neural Architecture Search as Sparse Supernet
Yan Wu*, Aoming Liu*, Zhiwu Huang, Siwei Zhang, Luc Van Gool
(* denotes equal contribution)
AAAI, 2021

We model the NAS problem as a sparse supernet using a new continuous architecture representation with a mixture of sparsity constraints.

One-shot Face Reenactment
Yunxuan Zhang, Siwei Zhang, Yue He, Cheng Li, Chen Change Loy, Ziwei Liu
BMVC, 2019 Spotlight Presentation

We propose a novel one-shot face reenactment learning system that disentangles and composes appearance and shape information for effective modeling.

Academic Services


Template adapted from Jon Barron's and Qianli Ma's websites.