Monocular Dynamic 3D Reconstruction
When the input is only ordinary RGB video—no depth sensor, no rig—can we recover dynamic 3D scene geometry well enough to compete with depth-sensor-supervised methods?
Monocular dynamic 3D reconstruction takes video from a single moving camera observing a deforming scene and tries to recover a complete 4D representation (geometry, appearance, and motion) over the captured time window. The problem is fundamentally under-constrained: at any one instant the scene is seen from only one viewpoint, so multi-view triangulation does not directly apply to the parts that move. Progress depends on how well the chosen scene representation and supervision signals work together.
Yiqing Liang's PhD drove this arc of our work. We started with semantic attention flow fields (ICCV 2023), built atop a dynamic NeRF, then moved to a forward-warping Gaussian deformation formulation (GauFRe, with Meta colleagues) for real-time rendering, and then to a TMLR benchmark (MonoDyGauBench) that puts the recent flood of monocular dynamic Gaussian methods on a like-for-like footing. Our latest paper (Zero-MSF, with NVIDIA) abandons per-scene optimisation entirely and instead trains a feed-forward scene flow predictor that generalises zero-shot to in-the-wild video.
Authors
Abhishek Badki · Orazio Gallo · Leonidas J. Guibas · Adam Harley · Numair Khan · Eliot Laidlaw · Douglas Lanman · Yiqing Liang · Runfeng Li · Zhengqin Li · Alexander Meyerowitz · Thu Nguyen-Phuoc · Mikhail Okunev · Srinath Sridhar · Hang Su · Mikaela Angelina Uy · Lei Xiao