How can we let users edit captured video meaningfully by first recovering the scene structure (moving objects, lighting vs. reflectance, cross-frame consistency) that makes plausible modifications possible?
Editing video is harder than editing a photograph: a change to one frame must propagate consistently to every other frame, and many edits (removing a person, separating lighting from material, stabilising flicker) require understanding the underlying scene rather than just manipulating pixels. The papers in this thread approach editing as inverse reconstruction: decompose the video into scene structure first, then edit.
A postdoc-era thread spanning UCL, MPI-Inf, Harvard, and LIRIS-CNRS. The earliest piece (2011, UCL) is the cinemagraphs authoring tool, which isolates a moment image from a stabilised clip. Miguel Granados led the video-inpainting work at MPI-Inf (2012): removing dynamic objects from crowded scenes, and the harder case of background recovery under a free-moving camera. Nicolas Bonneel led the consistency and decomposition line (2014–2017): interactive intrinsic decomposition, blind temporal consistency that stabilises any per-frame filter, and the spatio-temporal extension to camera arrays. The 2016 multicut paper takes a different angle on the same theme: cut the video into the right regions before editing.
Bjoern Andres · Nicolas Bonneel · Miguel Granados · Oliver Grau · Jan Kautz · Kwang In Kim · Steffen Kirchhoff · Evgeny Levinkov · Sylvain Paris · Fabrizio Pece · Hanspeter Pfister · Kartic Subr · Kalyan Sunkavalli · Deqing Sun · Christian Theobalt · Oliver Wang