Welcome to the Computer Vision and Learning Group.

Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance the algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

NaP-Control: Navigating Diffusion Prior for Versatile and Fast Character Control

Conference: European Conference on Computer Vision (ECCV 2026)

Authors: Chia-Wen Chen, Yan Wu, Korrawe Karunratanakul, Siyu Tang

NaP-Control uses reinforcement learning to navigate the latent noise of a task-agnostic diffusion policy prior for fast, robust, and versatile physics-based character control.

Multi4D: High-Fidelity Dynamic Gaussian Splatting via Multi-Level Competitive Allocation

Conference: European Conference on Computer Vision (ECCV 2026)

Authors: Rui Wang, Quentin Lohmeyer, Siyu Tang, Mirko Meboldt

Multi4D enables high-quality, efficient dynamic scene reconstruction via competitive multi-level specialization, and compact, high-accuracy 4D segmentation with fast inference.

MATCH: Feed-forward Gaussian Registration for Head Avatar Creation and Editing

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2026)

Authors: Malte Prinzler, Paulo Gotardo, Siyu Tang, Timo Bolkart

Given calibrated multi-view images of human heads, MATCH infers static Gaussian splat textures in dense semantic correspondence.

BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2026)

Authors: Yiming Wang, Qihang Zhang, Shengqu Cai, Tong Wu, Jan Ackermann, Zhengfei Kuang, Yang Zheng, Frano Rajič, Siyu Tang, Gordon Wetzstein

Time- and camera-controlled 4D video generation that enables decoupled control over world time and camera pose from a single input video.

GGPT: Geometry Grounded Point Transformer

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2026)

Authors: Yutong Chen, Yiming Wang, Xucong Zhang, Sergey Prokudin, Siyu Tang

GGPT can use reliable geometric guidance to augment various feed-forward methods for 3D reconstruction.

Masked Modeling for Human Motion Recovery Under Occlusions

Conference: International Conference on 3D Vision (3DV 2026)

Authors: Zhiyin Qian, Siwei Zhang, Bharat Lal Bhatnagar, Federica Bogo, Siyu Tang

Given a monocular video captured from a static camera, MoRo robustly reconstructs accurate and physically plausible human motion, even under challenging occlusion scenarios.

Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction

Conference: SIGGRAPH Asia 2025 Conference Track

Authors: Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang

Neural Texture Splatting is an expressive extension of 3D Gaussian Splatting that introduces a local neural RGBA field for each primitive.

Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting

Conference: NeurIPS 2025

Authors: Yiming Wang, Lucy Chai, Xuan Luo, Michael Niemeyer, Manuel Lagunas, Stephen Lombardi, Siyu Tang, Tiancheng Sun

SplatVoxel is a hybrid Splat-Voxel representation that fuses and refines Gaussian Splatting, improving static scene reconstruction and enabling history-aware streaming reconstruction in a zero-shot manner.

Latest News

Here's what we've been up to recently.