Editing Video by Recovering Scene Structure

How can we edit captured video? By recovering the scene structure (geometry, dynamics, lighting, reflectance, cross-frame consistency) that makes plausible modifications possible.

Editing video is harder than editing a photograph: changes to one frame must propagate consistently to every other, and many edits (removing a person, separating lighting from material, stabilising flicker) require understanding the underlying scene rather than just manipulating pixels. We approach editing as inverse reconstruction: decompose video into scene structure first, then edit.

This thread spans my doctoral and postdoctoral years across UCL, MPI-Inf, Harvard, and LIRIS-CNRS. My earliest piece (2011, UCL) is the cinemagraphs authoring tool, which isolates a "moment image" from a stabilised clip. Miguel Granados led the video-inpainting work at MPI-Inf (2012): removing dynamic objects from crowded scenes, and the harder case of background recovery under a free-moving camera. Nicolas Bonneel led the consistency and decomposition line (2014–2017): interactive intrinsic decomposition, blind temporal consistency that stabilises any per-frame filter, and the spatio-temporal extension to camera arrays. Our 2016 multicut paper takes a different angle on the same theme: cut the video into the right regions before editing.

Authors

Bjoern Andres · Nicolas Bonneel · Miguel Granados · Oliver Grau · Jan Kautz · Kwang In Kim · Steffen Kirchhoff · Evgeny Levinkov · Sylvain Paris · Fabrizio Pece · Hanspeter Pfister · Kartic Subr · Kalyan Sunkavalli · Deqing Sun · Christian Theobalt · Oliver Wang

Papers in this thread

Towards Moment Imagery: Automatic Cinemagraphs

European Conference on Visual Media Production (CVMP), 2011

An authoring tool that pipelines stabilisation, segmentation, motion selection, and loop detection to produce cinemagraphs—short looping clips where only a chosen region moves.

How Not to Be Seen — Object Removal from Videos of Crowded Scenes

Computer Graphics Forum (Eurographics), 2012

Object removal from crowded scenes by filling the spatio-temporal hole from other regions of the video where the occluded background was visible, posed as a graph-cut optimisation. It targets occlusions harder than those previous work had attempted.

Background Inpainting for Videos with Dynamic Objects and a Free-moving Camera

European Conference on Computer Vision (ECCV), 2012

Inpaints background revealed by removing dynamic objects from a free-moving-camera video by aligning candidate frames with piecewise planar homographies—sidestepping the full per-frame depth and pose recovery that earlier free-camera methods required.

Interactive Intrinsic Video Editing

ACM Transactions on Graphics (SIGGRAPH Asia), 2014

Decomposes video into reflectance and illumination via a hybrid L2-Lp gradient split. It runs two orders of magnitude faster than prior tools, fast enough to support interactive refinement and lighting-aware compositing.
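To make the idea of an L2-Lp gradient split concrete, here is a minimal 1-D toy in Python. It is a sketch under stated assumptions, not the paper's solver: it assumes the common Retinex-style model in which reflectance gradients are sparse (penalised with an Lp norm, p < 1) and illumination gradients are smooth (penalised with an L2 norm), it works on a single scanline in the log domain, and it uses iteratively reweighted least squares. The function name `split_log_gradients` and the parameters `p`, `lam`, `iters`, and `eps` are hypothetical.

```python
import numpy as np

def split_log_gradients(log_image, p=0.8, lam=10.0, iters=50, eps=1e-6):
    """Toy 1-D split of a log-intensity scanline into log-reflectance and
    log-illumination (hypothetical helper, not the published algorithm).

    Per gradient sample it minimises  |g|^p + lam * (d - g)^2,  where d is the
    observed log-gradient and g is the part assigned to reflectance. The
    remainder d - g goes to illumination.
    """
    d = np.diff(log_image)   # observed log-gradients
    g = d.copy()             # initial guess: everything is reflectance
    for _ in range(iters):
        # IRLS: replace |g|^p by the quadratic u * g^2 around the current g.
        u = 0.5 * p * (np.abs(g) + eps) ** (p - 2.0)
        # Closed-form minimiser of u * g^2 + lam * (d - g)^2.
        g = lam * d / (lam + u)
    # Integrate the reflectance gradients; the constant offset is fixed at 0.
    log_reflectance = np.concatenate([[0.0], np.cumsum(g)])
    log_illumination = log_image - log_reflectance
    return log_reflectance, log_illumination

# Usage: a smooth illumination ramp plus one sharp reflectance edge.
x = np.arange(100)
log_image = 0.01 * x + np.where(x >= 50, 1.0, 0.0)
log_r, log_s = split_log_gradients(log_image)
```

In this toy the sparse term pushes small gradients (the ramp) into illumination and leaves the large gradient (the edge) in reflectance. In 1-D every gradient field is integrable, which is why the problem separates per sample; a 2-D or video version would need a coupled solve.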

Blind Video Temporal Consistency

ACM Transactions on Graphics (SIGGRAPH Asia), 2015

A gradient-domain post-process that stabilises any per-frame filter against flicker by borrowing temporal regularity from the unprocessed video—agnostic to what the filter actually is. Demonstrated across stylisation, intrinsic decomposition, and depth.
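The sketch below illustrates the general shape of such a gradient-domain post-process on 1-D "frames". It is a simplified illustration, not the published method: it assumes a static camera (so there is no optical-flow warp between frames), uses a dense linear solve, and picks a Gaussian weight on the change in the unprocessed video. The names `stabilise_frame` and `stabilise_sequence` and the parameters `sigma` and `w_max` are hypothetical.

```python
import numpy as np

def stabilise_frame(processed, prev_output, original, prev_original,
                    sigma=0.1, w_max=10.0):
    """Solve a small screened-Poisson problem for one 1-D frame.

    Minimises  ||D o - D processed||^2 + sum_i w_i * (o_i - prev_output_i)^2,
    i.e. keep the gradients of the filtered frame, but stay close to the
    previous output wherever the unprocessed video itself did not change.
    """
    n = processed.size
    D = np.diff(np.eye(n), axis=0)   # forward-difference operator, (n-1) x n
    L = D.T @ D                      # 1-D Laplacian (normal equations)
    # Temporal weight: large where the original video is temporally stable.
    w = w_max * np.exp(-((original - prev_original) ** 2) / (2.0 * sigma ** 2))
    A = L + np.diag(w)
    b = L @ processed + w * prev_output
    return np.linalg.solve(A, b)

def stabilise_sequence(processed_frames, original_frames, **kwargs):
    """Run the per-frame solve causally, seeding with the first filtered frame."""
    outputs = [np.asarray(processed_frames[0], dtype=float)]
    for t in range(1, len(processed_frames)):
        outputs.append(stabilise_frame(
            np.asarray(processed_frames[t], dtype=float),
            outputs[-1],
            np.asarray(original_frames[t], dtype=float),
            np.asarray(original_frames[t - 1], dtype=float),
            **kwargs))
    return outputs

# Usage: a static scene whose filtered version flickers in brightness.
rng = np.random.default_rng(0)
scene = rng.random(32)
original = [scene.copy() for _ in range(5)]
processed = [scene ** 2 + offset for offset in (0.0, 0.2, -0.1, 0.15, -0.05)]
stable = stabilise_sequence(processed, original)
```

Nothing in the solve depends on what produced `processed`, which is the sense in which the approach is blind to the filter. A practical version would warp the previous output along optical flow and use a sparse or multigrid solver on 2-D frames.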

Interactive Multicut Video Segmentation

Pacific Graphics (Short Paper), 2016

Interactive multi-label video segmentation from multi-coloured scribbles, posed as a multicut on a supervoxel graph and solved fast enough to feel responsive. Multiple objects are cut at once with consistent spatio-temporal boundaries, rather than through chained binary segmentations.

Consistent Video Filtering for Camera Arrays

Computer Graphics Forum (Eurographics), 2017

Extends the blind-consistency idea from time alone to both time and space (across views) for stereo, light-field, and wide-baseline rigs. It also adds a filter-transfer scheme that runs the expensive filter on a small subset of frames and propagates the effect to the rest, an order-of-magnitude saving for camera-array data.
