Marko Mihajlovic
I am a final-year PhD student at ETH Zurich, specializing in developing digital models from everyday visual sensors such as cameras. During my PhD, I spent over a year at Meta (in Zurich and Pittsburgh), where I contributed to projects focused on immersive digital representations. I am passionate about advancing systems that not only create precise digital replicas but also enhance machines’ ability to perceive and understand our world.
Email  /  Google Scholar  /  Twitter  /  CV  /  LinkedIn  /  Github
SplatFormer is a data-driven 3D transformer that refines 3D Gaussian splats to improve the quality of novel views rendered from extreme camera viewpoints.
FreSh aligns the frequencies of an implicit neural representation with those of its target signal to speed up convergence.
RISE-SDF reconstructs the geometry and material of glossy objects while achieving high-quality relighting.
SplatFields regularizes 3D Gaussian splats for sparse 3D and 4D reconstruction.
Morphable diffusion enables consistent controllable novel view synthesis of humans from a single image.
Given a monocular video, 3DGS-Avatar learns clothed human avatars with short training times and interactive rendering frame rates.
How to infer scene dynamics from sparse point trajectory observations? We show a simple yet effective solution using a spatiotemporal MLP with carefully designed regularizations. No need for scene-specific priors.
ResField layers incorporate time-dependent weights into MLPs to effectively represent complex temporal signals.
KeypointNeRF is a generalizable neural radiance field for virtual avatars. Given 2-3 input images, KeypointNeRF generates a volumetric radiance representation that can be rendered from novel views.
COAP is a novel neural implicit representation for articulated human bodies that provides an efficient mechanism for modeling self-contact and interactions with the environment.
Generalizable and controllable neural signed distance fields (SDFs) that represent clothed humans from monocular depth observations.
LEAP is a neural network architecture for representing volumetric animatable human bodies. It follows traditional human body modeling techniques and leverages a statistical human prior to generalize to unseen humans.
DeepSurfels is a novel 3D representation for geometry and appearance information that combines planar surface primitives with a voxel grid representation for improved scalability and rendering quality.