James Tompkin

Associate Professor

Visual Computing

 BlueSky @brownvc.bsky.social
 Github @brownvc

Brown student researcher?
Group Onboarding Process

Contact


 BlueSky @jamestompkin.bsky.social
Google Scholar

Office hours: Weds 1300 EST
Book appointment

Brown folks: Save an email,
use GCal 'Find a Time'
and include an agenda. Instructions

Center for Information Technology
Room 547
115 Waterman Street
Providence, RI, 02912


Acknowledgements

My intrepid collaborators and co-authors.

Funding:

  • US: NSF, DARPA, NASA
  • UK: EPSRC, BBC
  • Industry: Activision, Adobe, Amazon, Cognex, Google, Intel, Meta, Snap, AI Foundation

The open source Web community: HTML5 Boilerplate, Ryan Johnston, Joshua N. Hibbert, PracticalTypography.com, EB Garamond.

Hosted on GitHub Pages using Jekyll — basic theme by orderedlist.

Monocular Dynamic 3D Reconstruction

When the input is only ordinary RGB video — no depth sensor, no rig — can we recover dynamic 3D scene geometry well enough to compete with depth-sensor-supervised methods?

Monocular dynamic 3D reconstruction takes a single moving camera observing a deforming scene and tries to recover a complete 4D representation — geometry, appearance, motion — over the captured time window. The problem is fundamentally under-constrained at any one instant, and progress depends on how well the chosen scene representation and the supervision signals work together.
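To see why a single instant is under-constrained, consider a pinhole camera. Every point along a pixel's viewing ray projects to that same pixel, so one image cannot say how far away the surface is. The short Python sketch that follows checks this numerically. The pinhole model, the intrinsics (`focal`, `cx`, `cy`), and the test point are hypothetical choices for illustration, not the camera model of any particular method.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def project(point: Point3, focal: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-space point (z > 0) to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


def along_ray(point: Point3, scale: float) -> Point3:
    """Slide a point along its viewing ray by a positive scale factor."""
    return (point[0] * scale, point[1] * scale, point[2] * scale)


# Hypothetical intrinsics and surface point, for illustration only.
focal, cx, cy = 500.0, 320.0, 240.0
p = (0.2, -0.1, 2.0)

# Each scaled copy sits at a different depth but lands on the same pixel.
for s in (0.5, 1.0, 3.0):
    print(s, project(along_ray(p, s), focal, cx, cy))
```

With a moving camera, triangulation across frames could normally break this ambiguity, but in a deforming scene the point may itself have moved between frames, which is why the choice of representation and supervision matters so much.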

Yiqing Liang's PhD has driven this arc. Starting from semantic attention flow fields built atop a dynamic NeRF at ICCV 2023, the work moved to a forward-warping Gaussian deformation formulation (GauFRe, with Meta colleagues) for real-time rendering, then to a TMLR benchmark (MonoDyGauBench) that puts the recent flood of monocular dynamic Gaussian methods on a like-for-like footing. The latest paper (Zero-MSF, with NVIDIA) abandons per-scene optimization entirely and trains a feed-forward predictor for scene flow that generalizes zero-shot to in-the-wild video.
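The forward-warping idea can be sketched in a few lines. Gaussians live in a canonical template, a deformation field maps each canonical center and a time t to an offset, and the moved Gaussians are rendered together with a separate static set that is never deformed (the static/dynamic split described in the WACV entry below). This is a minimal Python illustration under stated assumptions: `toy_deform` is a hypothetical analytic stand-in for a learned deformation network, each Gaussian is reduced to a center, an isotropic scale, and an opacity, and only centers move, whereas a full method may also deform rotation and scale. It is not GauFRe's implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    mean: Vec3      # center in canonical space
    scale: float    # isotropic extent (simplification; real splats are anisotropic)
    opacity: float


# A deformation maps (canonical center, time) -> 3D offset.
Deformation = Callable[[Vec3, float], Vec3]


def warp_forward(g: Gaussian, t: float, deform: Deformation) -> Gaussian:
    """Push one canonical Gaussian forward to time t by adding the predicted offset."""
    dx, dy, dz = deform(g.mean, t)
    x, y, z = g.mean
    return replace(g, mean=(x + dx, y + dy, z + dz))


def scene_at(t: float, dynamic: List[Gaussian], static: List[Gaussian],
             deform: Deformation) -> List[Gaussian]:
    """Gaussians to splat at time t: the warped dynamic set plus the untouched static set."""
    return [warp_forward(g, t, deform) for g in dynamic] + list(static)


def toy_deform(mean: Vec3, t: float) -> Vec3:
    """Hypothetical stand-in for a learned field: slide along x, faster for larger z."""
    return (0.5 * t * mean[2], 0.0, 0.0)


# Usage: one dynamic and one static Gaussian, rendered at two times.
dynamic = [Gaussian((0.0, 0.0, 1.0), 0.1, 0.8)]
static = [Gaussian((2.0, 0.0, 3.0), 0.2, 1.0)]
for t in (0.0, 1.0):
    print(t, [g.mean for g in scene_at(t, dynamic, static, toy_deform)])
```

In this sketch the deformation is evaluated once per dynamic Gaussian rather than per ray sample, and routing non-moving content to its own set keeps those evaluations focused on what actually moves. The canonical template itself is never modified; each time step produces a fresh warped copy.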

Authors

Abhishek Badki · Orazio Gallo · Leonidas J. Guibas · Adam Harley · Numair Khan · Eliot Laidlaw · Douglas Lanman · Yiqing Liang · Runfeng Li · Zhengqin Li · Alexander Meyerowitz · Thu Nguyen-Phuoc · Mikhail Okunev · Srinath Sridhar · Hang Su · Mikaela Angelina Uy · Lei Xiao

Papers in this thread

International Conference on Computer Vision (ICCV), 2023
Reconstructs a 4D neural volume carrying not just color and density but also scene flow, semantics, and attention, then uses the latter two to decompose foreground objects from background across spacetime without supervision.
arXiv (Dec. 2023) + WACV, 2025
Casts monocular dynamic reconstruction as a canonical Gaussian template plus a forward-warping deformation field, with a separate static component initialized to absorb non-moving regions so the deformation focuses on what actually moves. Trains in roughly twenty minutes and renders in real time.
Transactions on Machine Learning Research, 2025
An apples-to-apples benchmark of monocular dynamic Gaussian splatting methods, categorized by motion representation. Method differences are resolvable on synthetic data but get swamped by real-world scene complexity, and the optimization is uniformly brittle.
Computer Vision and Pattern Recognition (CVPR), 2025
A feed-forward model that jointly predicts geometry and scene flow, trained on a one-million-sample synthetic recipe. Generalizes zero-shot to casual DAVIS video and RoboTAP manipulation scenes — no per-scene optimization required.
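A model that jointly predicts geometry and scene flow gives each pixel a 3D point and a 3D displacement. The Python sketch that follows shows how those two outputs relate to ordinary 2D image motion: move the point by its scene flow, reproject it, and difference the pixel positions. It assumes the point and its displacement are both expressed in one shared pinhole camera frame with hypothetical intrinsics; it illustrates the geometry only and is not Zero-MSF's interface or training loss.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(p: Vec3, focal: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of a camera-space point with positive depth."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


def induced_optical_flow(point: Vec3, scene_flow: Vec3,
                         focal: float, cx: float, cy: float) -> Vec2:
    """2D pixel displacement implied by a 3D point and its 3D scene flow."""
    moved = (point[0] + scene_flow[0],
             point[1] + scene_flow[1],
             point[2] + scene_flow[2])
    u0, v0 = project(point, focal, cx, cy)
    u1, v1 = project(moved, focal, cx, cy)
    return (u1 - u0, v1 - v0)


# Hypothetical intrinsics, for illustration only.
focal, cx, cy = 500.0, 320.0, 240.0

# Sideways motion shifts the pixel horizontally.
print(induced_optical_flow((0.0, 0.0, 2.0), (0.2, 0.0, 0.0), focal, cx, cy))
# Motion straight away from the camera pulls the pixel toward the principal point.
print(induced_optical_flow((0.2, 0.0, 2.0), (0.0, 0.0, 2.0), focal, cx, cy))
```

The mapping is many-to-one: a point moving exactly along its own viewing ray produces no image motion at all, so 2D flow alone cannot recover motion in depth. Predicting geometry and scene flow together is what makes that component observable in the model's output.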