Physical modelling of active illumination from raw sensor measurements can improve scene estimation and avoid errors from derived depth.
Time-of-flight and structured-light cameras are typically used as depth sensors: their raw measurements are processed into a per-pixel depth map, and downstream reconstruction methods treat that depth as input. However, depth processing often relies on simplifying assumptions about the scene. This produces noise in low-reflectance regions, flying pixels from multi-path interference, and motion artifacts in fast-moving scenes, because each depth estimate requires multiple illumination readings. Further, it is difficult to integrate these raw measurements with other sensor modalities, such as colour cameras.
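To make the multi-reading problem concrete, here is a minimal sketch of the textbook four-phase depth computation for a continuous-wave time-of-flight pixel. It is not taken from this work: the sinusoidal single-path model, the 30 MHz modulation frequency, and the helper names `raw_measurements`, `capture_moving`, and `depth_from_four_phases` are all assumptions chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s
OFFSETS = (0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi)  # assumed phase offsets


def raw_measurements(depth_m, amplitude, f_mod):
    """Idealised raw readings, one per phase offset (single path, no noise)."""
    phase = 4.0 * math.pi * f_mod * depth_m / C
    return [amplitude * math.cos(phase + off) for off in OFFSETS]


def capture_moving(depths_m, amplitude, f_mod):
    """Hypothetical capture where reading i sees the scene at depths_m[i]."""
    return [raw_measurements(d, amplitude, f_mod)[i] for i, d in enumerate(depths_m)]


def depth_from_four_phases(q, f_mod):
    """Standard four-phase estimate; it assumes all readings saw the same depth."""
    q0, q90, q180, q270 = q
    phase = math.atan2(q270 - q90, q0 - q180) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * f_mod)


F_MOD = 30e6  # assumed modulation frequency in Hz

# Static scene: all four readings observe a surface at 2.0 m.
static_depth = depth_from_four_phases(raw_measurements(2.0, 1.0, F_MOD), F_MOD)

# Moving scene: the surface jumps from 2.0 m to 2.5 m between readings 2 and 3.
moving_depth = depth_from_four_phases(
    capture_moving([2.0, 2.0, 2.5, 2.5], 1.0, F_MOD), F_MOD
)
```

In the static case the estimate recovers the true depth. In the moving case the four readings no longer describe one consistent phase, so the computed depth matches neither position exactly. This is the kind of motion artifact described above.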
This line of work rethinks reconstruction for heterogeneous multi-shot imaging processes. Built upon a differentiable forward model of how the active illumination produces the raw sensor output for a given scene, these methods optimise a 4D volumetric scene representation (such as NeRF or 3DGS) so that rendered measurements match what the sensor captured. This lets us integrate sensor measurements over spacetime in a principled way, including across modalities, to reduce noise, resolve ambiguities in multi-shot sensing, and improve robustness to multi-path interference. And because we model motion over time, we can estimate fast motion, such as a swinging baseball bat, and resample it in slow motion.
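The idea of fitting a scene so that rendered raw measurements match captured ones can be sketched on a single pixel. The toy below is only an illustration under strong assumptions: the scene is reduced to one depth and one amplitude instead of a 4D volumetric representation, the forward model is an idealised four-phase sinusoid at an assumed 30 MHz, the gradients are written by hand where a real system would use automatic differentiation, and the names `render` and `fit_pixel` are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s
F_MOD = 30e6       # assumed modulation frequency in Hz
K = 4.0 * math.pi * F_MOD / C  # phase in radians per metre of depth
OFFSETS = (0.0, 0.5 * math.pi, math.pi, 1.5 * math.pi)  # assumed phase offsets


def render(depth_m, amplitude):
    """Toy forward model: scene parameters -> four raw sensor readings."""
    return [amplitude * math.cos(K * depth_m + off) for off in OFFSETS]


def fit_pixel(captured, depth_m, amplitude, lr=0.1, steps=500):
    """Gradient descent on the squared error between rendered and captured readings."""
    for _ in range(steps):
        grad_d = 0.0
        grad_a = 0.0
        for off, target in zip(OFFSETS, captured):
            theta = K * depth_m + off
            residual = amplitude * math.cos(theta) - target
            # Hand-derived partial derivatives of residual**2.
            grad_a += 2.0 * residual * math.cos(theta)
            grad_d += 2.0 * residual * (-amplitude * K * math.sin(theta))
        depth_m -= lr * grad_d
        amplitude -= lr * grad_a
    return depth_m, amplitude


# Synthetic "captured" readings from a surface at 2.0 m with unit amplitude.
captured = render(2.0, 1.0)

# Start from a wrong guess and let the raw-measurement loss pull it into place.
depth_est, amp_est = fit_pixel(captured, depth_m=1.5, amplitude=0.5)
```

The loss is defined on the raw readings and no intermediate depth map is computed. That is why the same recipe can, in principle, absorb readings taken at different times or from different sensors once the forward model accounts for them. The phase wraps, so the loss is periodic in depth and the starting guess must lie reasonably close to the true value for this toy to converge to it.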
Benjamin Attal · Eliot Laidlaw · Aaron Gokaslan · Changil Kim · Christian Richardt · Matthew O'Toole · Mikhail Okunev · Marc Mapeke · Runfeng Li · Zixuan Guo · Anh Duong · Aarrushi Shandilya · Andreas Meuleman · Hakyeong Kim · Min H. Kim