I’m a researcher on Apple’s research team led by Vladlen Koltun. Before joining Apple, I completed my PhD in Computer Science at ETH Zurich, advised by Siyu Tang and Marc Pollefeys, and spent over a year at Meta in Zurich and Pittsburgh working on immersive digital representations. My research focuses on developing systems that reconstruct the world’s structure and dynamics from visual inputs to enable precise digital replicas, with the broader goal of enhancing machines’ ability to perceive and interact with their surroundings using minimal inputs.
ICCV (Oral), 2025
MVTracker is the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera views. Unlike monocular trackers, which struggle with depth ambiguities and occlusion, our feed-forward model directly predicts 3D correspondences using a practical number of cameras, enabling robust and accurate online tracking.
ICCV (Highlight), 2025
VolumetricSMPL is a lightweight, plug-and-play extension for SMPL(-X) body models that adds volumetric functionality via signed distance fields (SDFs). With minimal integration (a single line of code), users gain access to fast, differentiable SDF queries, collision detection, and self-intersection resolution, as sketched below.
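A minimal usage sketch, assuming the project’s Python package: the `attach_volume` entry point and the `model.volume.query` call follow the one-line integration described above, but the exact names and signatures are assumptions, not a verified API.

```python
import torch
import smplx  # standard SMPL(-X) body-model package
from VolumetricSMPL import attach_volume  # assumed entry point

# Build a standard SMPL-X model, then attach the volumetric extension.
model = smplx.create(model_path="models", model_type="smplx")
attach_volume(model)  # the advertised single line of code

output = model(return_verts=True)  # a posed body

# Differentiable SDF queries at arbitrary 3D points (hypothetical signature).
points = torch.rand(1, 1024, 3, requires_grad=True)
sdf = model.volume.query(points, output)
```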
SIGGRAPH, 2025
A spline-based trajectory representation that enables efficient analytical derivation of velocities and accelerations, preserving spatial coherence while mitigating temporal fluctuations. The method demonstrates superior temporal interpolation when fitting continuous fields from sparse inputs.
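To illustrate the general principle (not the paper’s exact formulation): with a cubic Hermite spline, velocity comes from the exact derivative of the polynomial basis, so no finite differencing over noisy samples is needed.

```python
import numpy as np

def hermite_pos_vel(p0, p1, m0, m1, t):
    """Cubic Hermite segment on t in [0, 1]: position and analytic velocity."""
    h00, h10 = 2*t**3 - 3*t**2 + 1, t**3 - 2*t**2 + t
    h01, h11 = -2*t**3 + 3*t**2, t**3 - t**2
    pos = h00*p0 + h10*m0 + h01*p1 + h11*m1
    # Exact derivatives of the basis; differentiating once more gives accelerations.
    d00, d10 = 6*t**2 - 6*t, 3*t**2 - 4*t + 1
    d01, d11 = -6*t**2 + 6*t, 3*t**2 - 2*t
    vel = d00*p0 + d10*m0 + d01*p1 + d11*m1
    return pos, vel

p, v = hermite_pos_vel(np.zeros(3), np.ones(3), np.ones(3), np.ones(3), 0.5)
```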
ICLR (Spotlight), 2025
SplatFormer is a data-driven 3D transformer that refines 3D Gaussian splats to improve the quality of novel views rendered from extreme camera viewpoints.
ICLR, 2025
FreSh aligns the frequencies of an implicit neural representation with those of its target signal to speed up convergence.
3DV, 2025
RISE-SDF reconstructs the geometry and material of glossy objects while achieving high-quality relighting.
ECCV, 2024
SplatFields regularizes 3D Gaussian splats for sparse 3D and 4D reconstruction.
CVPR, 2024
Morphable Diffusion enables consistent and controllable novel view synthesis of humans from a single image.
CVPR, 2024
Given a monocular video, 3DGS-Avatar learns a clothed human avatar with a short training time and an interactive rendering frame rate.
CVPR, 2024
How can scene dynamics be inferred from sparse point-trajectory observations? We show a simple yet effective solution using a spatiotemporal MLP with carefully designed regularizations, without any scene-specific priors; see the sketch below.
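A generic sketch of the idea, assuming a displacement field f(x, t) fit with an L2 data term and a simple temporal-smoothness penalty standing in for the paper’s regularizers; the architecture and loss weights here are illustrative.

```python
import torch
import torch.nn as nn

class MotionMLP(nn.Module):
    """Spatiotemporal MLP: (x, t) -> 3D displacement."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

model = MotionMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x0, t, target = torch.rand(256, 3), torch.rand(256, 1), torch.rand(256, 3)

pred = x0 + model(x0, t)                   # warped point positions
data_loss = (pred - target).pow(2).mean()  # fit the sparse trajectories
eps = 1e-2                                 # temporal-smoothness regularizer
smooth = (model(x0, t + eps) - model(x0, t)).pow(2).mean()
(data_loss + 0.1 * smooth).backward()
opt.step()
```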
ICLR (Spotlight), 2024
ResField layers incorporate time-dependent weights into MLPs to effectively represent complex temporal signals.
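A minimal sketch of one ResField-style layer, assuming a low-rank factorization of the time-dependent residual: the effective weight at discrete time step t is W + ΔW(t) with ΔW(t) = Σ_r v[t, r]·M_r. The rank and number of time steps here are arbitrary choices, not the paper’s settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResFieldLinear(nn.Module):
    """Linear layer whose weight varies over discrete time steps."""
    def __init__(self, d_in, d_out, rank=8, num_steps=30):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.01)
        self.bias = nn.Parameter(torch.zeros(d_out))
        self.coeffs = nn.Parameter(torch.zeros(num_steps, rank))          # v[t, r]
        self.basis = nn.Parameter(torch.randn(rank, d_out, d_in) * 0.01)  # M_r

    def forward(self, x, t_idx):
        # Time-dependent residual: dW(t) = sum_r v[t, r] * M_r.
        dW = torch.einsum("r,roi->oi", self.coeffs[t_idx], self.basis)
        return F.linear(x, self.weight + dW, self.bias)

layer = ResFieldLinear(3, 64)
y = layer(torch.rand(256, 3), t_idx=5)
```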
ECCV, 2022
KeypointNeRF is a generalizable neural radiance field for virtual avatars. Given 2-3 input images, it generates a volumetric radiance representation that can be rendered from novel views.
CVPR, 2022
COAP is a novel neural implicit representation for articulated human bodies that provides an efficient mechanism for modeling self-contact and interactions with the environment.
NeurIPS, 2021
Generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations.
CVPR, 2021
LEAP is a neural network architecture for representing animatable volumetric human bodies. It follows traditional human body modeling techniques and leverages a statistical human prior to generalize to unseen humans.
CVPR, 2021
DeepSurfels is a novel 3D representation for geometry and appearance that combines planar surface primitives with a voxel grid for improved scalability and rendering quality.