I am a visual computing researcher working across computer vision, computer graphics, and human-computer interaction. My lab develops techniques for image and video creation, editing, analysis, and interaction. This requires image and scene reconstruction techniques, especially from multi-camera systems and for complex dynamic scenes, with applications on 2D, multi-view, and VR/AR displays.
How do we push neural reconstruction toward production-quality at the scales and fidelities real graphics applications need — large scenes, unstructured camera arrays, physically-faithful shading?
Neural scene representations (NeRFs, Gaussian splats) capture stunning visual fidelity, but typically work on small bounded scenes captured under controlled conditions. Pushing the same machinery toward production graphics requires solving practical bottlenecks: scaling to large indoor and outdoor scenes, handling unstructured (non-rig) capture, integrating physically-correct light transport, and stabilising the optimisation when geometry is otherwise ambiguous.
A long collaboration with Weiwei Xu and Hujun Bao at Zhejiang has tackled one neural reconstruction bottleneck per year: distributed tile-MLPs for large indoor scenes (SNISR, SIGGRAPH 2022), bundle-adjusting NeRFs with ADMM consensus over tiles at large scale (ScaNeRF, SIGGRAPH Asia 2023), local Gaussian density mixtures for unstructured capture with curved-surface reflections (LGDM, SIGGRAPH Asia 2024), and differentiable area-light shading for material recovery (EOR, SIGGRAPH Asia 2025). Shape from Tracing (3DV 2020) sits at the head of the line: an early step that used differentiable path tracing, with full global illumination rather than shading alone, as the forward model for joint geometry and SVBRDF recovery.
Hujun Bao · Bach-Thuan Bui · Dongyoung Choi · Jaemin Cho · Loudon Cohen · Zheng Dong · Michael Fairley · Yaoan Gao · Purvi Goel · James Guesman · Hyunho Ha · Qixing Huang · Hyeonjoong Jang · Woohyun Kang · Hakyeong Kim · Min H. Kim · Andreas Meuleman · Minh-Hieu Nguyen · Yifan Peng · Daniel Ritchie · Belal Shaheen · Yujun Shen · Shubham · Vikas Thamizharasan · Chi Wang · Huamin Wang · Qi Wang · Michael Wu · Tim Wu · Xiuchao Wu · Jiamin Xu · Weiwei Xu · Matthew David Zane · Xin Zhang · Zihan Zhu · Changqing Zou
When the input is only ordinary RGB video — no depth sensor, no rig — can we recover dynamic 3D scene geometry well enough to compete with depth-sensor-supervised methods?
Monocular dynamic 3D reconstruction takes a single moving camera observing a deforming scene and tries to recover a complete 4D representation — geometry, appearance, motion — over the captured time window. The problem is fundamentally under-constrained at any one instant, and progress depends on how well the chosen scene representation and the supervision signals work together.
Yiqing Liang's PhD has driven this arc. Starting from semantic attention flow fields built atop a dynamic NeRF at ICCV 2023, the work moved to a forward-warping Gaussian deformation formulation (GauFRe, with Meta colleagues) for real-time rendering, then to a TMLR benchmark (MonoDyGauBench) that puts the recent flood of monocular dynamic Gaussian methods on a like-for-like footing. The latest paper (Zero-MSF, with NVIDIA) abandons per-scene optimization entirely and trains a feed-forward predictor for scene flow that generalizes zero-shot to in-the-wild video.
Abhishek Badki · Orazio Gallo · Leonidas J. Guibas · Adam Harley · Numair Khan · Eliot Laidlaw · Douglas Lanman · Yiqing Liang · Runfeng Li · Zhengqin Li · Alexander Meyerowitz · Thu Nguyen-Phuoc · Mikhail Okunev · Srinath Sridhar · Hang Su · Mikaela Angelina Uy · Lei Xiao
Physical modelling of active illumination from raw sensor measurements can improve scene estimation and avoid errors from derived depth.
Time-of-flight and structured-light cameras are typically used as depth sensors: their raw measurements are processed into a per-pixel depth map, and downstream reconstruction methods treat that depth as input. But depth processing often makes simplifying assumptions about the scene, which creates noise in low-reflectance regions, flying pixels with multi-path interference, and motion artifacts in fast-moving scenes because each depth estimate requires multiple illumination readings. Further, it is difficult to integrate these raw measurements with other sensor modalities, like colour cameras.
This line of work rethinks reconstruction for heterogeneous multi-shot imaging processes. Built upon a differentiable forward model of how the active illumination produces the raw sensor output for a given scene, these methods optimise a 4D volumetric scene representation (like NeRF or 3DGS) so that rendered measurements match what the sensor captured. This lets us integrate sensor measurements over spacetime in a principled way, including across modalities, to reduce noise, resolve ambiguities in multi-shot sensing, and improve robustness to multi-path interference. And because we model motion over time, we can estimate fast motion, like a swinging baseball bat, and resample it into slow motion.
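As a rough illustration of fitting a scene to raw measurements rather than to derived depth, here is a minimal single-pixel sketch. It assumes a toy four-phase continuous-wave time-of-flight model with known amplitude and no noise; the names `render_raw` and `fit_depth` are hypothetical, and this is not the lab's sensor model or code. The actual methods optimise a volumetric representation over many pixels, time steps, and modalities.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def render_raw(depth, amplitude, freq_hz, phase_offsets):
    """Toy forward model (an assumption): raw samples of one continuous-wave ToF pixel."""
    omega = 4.0 * np.pi * freq_hz / C  # phase change per metre of depth (round trip)
    return amplitude * np.cos(omega * depth - phase_offsets)

def fit_depth(measured, amplitude, freq_hz, phase_offsets, init_depth, lr=0.1, steps=200):
    """Gradient descent on the squared error between rendered and captured raw samples."""
    omega = 4.0 * np.pi * freq_hz / C
    depth = init_depth
    for _ in range(steps):
        residual = render_raw(depth, amplitude, freq_hz, phase_offsets) - measured
        # Derivative of the toy forward model with respect to depth, derived by hand.
        d_render = -amplitude * omega * np.sin(omega * depth - phase_offsets)
        depth -= lr * 2.0 * np.sum(residual * d_render)
    return depth

# Four phase-shifted readings of a point at 2 m, with 30 MHz modulation.
phase_offsets = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
true_depth = 2.0
measured = render_raw(true_depth, 1.0, 30e6, phase_offsets)

# The loss is periodic in depth, so the initial guess must be within half a
# phase cycle of the answer for this simple descent to find it.
estimated_depth = fit_depth(measured, 1.0, 30e6, phase_offsets, init_depth=1.0)
print(f"estimated depth: {estimated_depth:.4f} m")
```

Because the loss is defined on the raw readings, extra terms (more frequencies, more time steps, a colour camera) could be added to the same sum, which is the appeal of this formulation over fusing precomputed depth maps.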
Benjamin Attal · Anh Duong · Aaron Gokaslan · Zixuan Guo · Changil Kim · Hakyeong Kim · Min H. Kim · Eliot Laidlaw · Runfeng Li · Marc Mapeke · Andreas Meuleman · Matthew O'Toole · Mikhail Okunev · Christian Richardt · Aarrushi Shandilya
How do we efficiently control generative models to produce what we want — preserving identity, 3D structure, style — without sacrificing quality?
A generative model that can sample new content is impressive; one that produces exactly what a user has in mind is useful. Controlling generation requires aligning the model's latent structure with axes a person can articulate — identity, pose, style, lighting, geometry — without sacrificing the photorealism that brought the model to relevance in the first place. There is usually a quality-versus-control tradeoff to manage.
The thread runs from Youssef Mejjati's PhD work on unsupervised attention for image-to-image translation, through compositional controls (object stamps, GaussiGAN's 3D Gaussian primitives from silhouettes alone), into 3DMM-conditioned face generation where Yiwen Huang's PhD now sits. Two recent moves matter: TaxFreeGAN closes the FID gap to unconditional StyleGAN under 3DMM conditioning, and the disentangling-3D work shows that the noise in CLIP's embedding space — not the disentanglement strategy — is what kills quality. R3GAN sits alongside this arc as the architectural reset: a principled relativistic loss that lets the modern GAN drop its bag of tricks.
Akin Caliskan · Darren Cosker · Aaron Gokaslan · Yiwen Huang · Berkay Kicanaoglu · Hyeongwoo Kim · Kwang In Kim · Atsunobu Kotani · Volodymyr Kuleshov · Youssef A. Mejjati · Isa Milefchik · Christian Richardt · Zejiang Shen · Michael Snower · Stefanie Tellex · Vikas Thamizharasan · Oliver Wang · Yue Wang · Xinjie Yi · Zhiqiu Yu · Qian Zhang
The light field is a 4D record of a scene's rays — how do we present it to humans, interact with it, and process it computationally?
A light field captures the radiance along every ray through a scene. In free space, where radiance is constant along a ray, this is a 4D function that fully describes how light fills the scene. Captured light fields enable refocusing, depth recovery, and parallax view synthesis; displayed light fields offer glasses-free 3D. The challenge is data density: 4D content stresses capture devices, display hardware, and processing pipelines.
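To make the refocusing idea concrete, here is a minimal sketch of classic shift-and-add refocusing, assuming the light field is stored as a NumPy array of sub-aperture views with shape (U, V, H, W). The function name `refocus`, the `slope` parameter (disparity in pixels per view), and the synthetic test data are illustrative assumptions, not taken from any of the papers in this thread.

```python
import numpy as np

def refocus(light_field, slope):
    """Shift-and-add refocusing of a light field with shape (U, V, H, W).

    Each view is shifted in proportion to its offset from the central view,
    then all views are averaged. Points whose disparity matches `slope`
    align and appear sharp; other points are spread out.
    """
    num_u, num_v, height, width = light_field.shape
    centre_u, centre_v = (num_u - 1) / 2.0, (num_v - 1) / 2.0
    output = np.zeros((height, width), dtype=np.float64)
    for u in range(num_u):
        for v in range(num_v):
            dy = int(round(-slope * (u - centre_u)))
            dx = int(round(-slope * (v - centre_v)))
            # np.roll wraps around the image border, which is fine for this toy example.
            output += np.roll(light_field[u, v], shift=(dy, dx), axis=(0, 1))
    return output / (num_u * num_v)

# Synthetic data: 3x3 views of one bright point that moves 1 pixel per view.
light_field = np.zeros((3, 3, 9, 9))
for u in range(3):
    for v in range(3):
        light_field[u, v, 4 + (u - 1), 4 + (v - 1)] = 1.0

sharp = refocus(light_field, slope=1.0)    # focused on the point
blurred = refocus(light_field, slope=0.0)  # focused elsewhere
```

With the matching slope, all nine views place the point on the same pixel; with a slope of zero, its energy is spread over a 3x3 neighbourhood. Integer shifts keep the sketch short; sub-pixel interpolation would be needed for real data.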
Two sub-arcs sit in this thread. The first (2012–2015) targets light field displays — an Emerging Technologies demo of painting directly into a glasses-free 3D display, content-adaptive lenticular prints that reshape the lenslet array to the captured light field, and a UIST paper that turns that lenslet array into a joint display-and-pen-input surface. The second (2019–2021), led by Numair Khan with Min H. Kim at KAIST, develops dense algorithms over captured 4D content: view-consistent superpixels via epipolar-plane image segmentation, edge-aware bidirectional diffusion for depth, and a differentiable diffusion routine for sparse-to-dense depth from multi-view images.
Marc Alexa · Simon Heinzle · Stanislav Jakuschevskij · Lucas Kasser · Jan Kautz · Numair Khan · Min H. Kim · Wojciech Matusik · James McCann · Jim McCann · Samuel Muff · Hanspeter Pfister · Henry Stone · Qian Zhang
How do we let users edit captured video meaningfully — by first recovering the scene structure (moving objects, lighting vs. reflectance, cross-frame consistency) that makes plausible modifications possible?
Editing video is harder than editing a photograph: changes to one frame must propagate consistently to every other, and many edits (removing a person, separating lighting from material, stabilising flicker) require understanding the underlying scene rather than just manipulating pixels. The papers in this thread approach editing as inverse reconstruction: decompose video into scene structure first, then edit.
A postdoc-era thread spanning UCL, MPI-Inf, Harvard, and LIRIS-CNRS. The earliest piece (2011, UCL) is the cinemagraphs authoring tool — a moment image isolated from a stabilised clip. Miguel Granados led the video-inpainting work at MPI-Inf (2012) — removing dynamic objects from crowded scenes, and the harder case of background recovery under a free-moving camera. Nicolas Bonneel led the consistency and decomposition line (2014–2017) — interactive intrinsic decomposition, blind temporal consistency stabilising any per-frame filter, and the spatio-temporal extension to camera arrays. The 2016 multicut paper takes a different angle on the same theme: cut the video into the right regions before editing.
Bjoern Andres · Nicolas Bonneel · Miguel Granados · Oliver Grau · Jan Kautz · Kwang In Kim · Steffen Kirchhoff · Evgeny Levinkov · Sylvain Paris · Fabrizio Pece · Hanspeter Pfister · Kartic Subr · Kalyan Sunkavalli · Deqing Sun · Christian Theobalt · Oliver Wang
3D Vision, 2026

MDPI Remote Sensing, 2026

SIGGRAPH Asia, 2025

2025

Computer Vision and Pattern Recognition (CVPR), 2025

Computer Vision and Pattern Recognition (CVPR), 2025

SIGGRAPH Asia, 2024

Neural Information Processing Systems (NeurIPS), 2024

Transactions on Machine Learning Research, 2025

European Conference on Computer Vision (ECCV), 2024

Transactions on Visualization and Computer Graphics (IEEE Visualization short paper), 2024

Computer Vision and Pattern Recognition (CVPR), 2024

2024
Gives insight into why disentangling with CLIP is difficult: it's the prompt noise!

Winter Conference on Applications of Computer Vision (WACV) and AI for Content Creation (AI4CC) @ CVPR 2023, 2024

arXiv (Dec. 2023) + WACV, 2025

ACM Transactions on Graphics (SIGGRAPH Asia), 2023

International Conference on Computer Vision (ICCV), 2023

International Conference on Computer Vision (ICCV), 2023

International Journal of Computer Vision (IJCV), 2024

On Human-like Biases in CNNs for the Perception of Slant from Texture
ACM Transactions on Applied Perception, 2023

SIGCHI, 2023

Learning Vector Quantized Shape Codes for Amodal Blastomere Instance Segmentation
IEEE International Symposium on Biomedical Imaging (ISBI), 2023

ACM Transactions on Graphics (SIGGRAPH), 2022

Eurographics State of the Art Report + CVPR Tutorial + SIGGRAPH Course, 2022

European Conference on Computer Vision (ECCV), 2022

International Conference on Computational Photography (ICCP), 2022

SIGCHI, 2022

Computer Vision and Pattern Recognition (CVPR), 2022

Computers and Graphics, 2022
For recovering depth, this follows up Blind Video Spatio-Temporal Consistency and Blind Video Temporal Consistency.

Learning Physically-based Face Material and Lighting Decomposition
International Conference on Computational Visual Media, 2022
Also appeared at CVPR 2021 Workshop on AI for Content Creation

Transactions on Visualization and Computer Graphics, 2022
Hosted at the Open Science Foundation.

Advances in Neural Information Processing Systems (NeurIPS), 2021

International Conference on Computer Vision (ICCV), 2021

Computer Vision and Pattern Recognition (CVPR), 2021

BMVC 2021 and CVPR Workshop on AI for Content Creation, 2021

Human-Robot Interaction (Late Breaking Report), 2021

Transactions on Visualization and Computer Graphics (TVCG), 2021

European Conference on Computer Vision (ECCV), 2020

European Conference on Computer Vision (ECCV), 2020

BMVC, 2021
Fast 4D depth with accurate occlusion edges across two papers: Edge-aware Bi-directional Diffusion for Dense Depth Estimation from Light Fields and View-consistent 4D Light Field Depth Estimation

International Conference on 3D Vision (3DV), 2020

Real VR – Immersive Digital Reality, 2020
Chapter in the Real VR – Immersive Digital Reality Springer book; DOI.

CVPR Workshop on AI for Content Creation, 2020
Linked PDF is the full 8-page paper; the CVPRW version is 4 pages.

CVPR Workshop on Media Forensics, 2020

Transactions on Visualization and Computer Graphics (IEEE Visualization), 2020

MICCAI, 2020

International Conference on Computer Vision (ICCV), 2019
This work also produces an occlusion-aware piecewise planar scene reconstruction as a byproduct!

User Interface Software and Technology (UIST), 2019

VRCAI, 2019

SIGCHI, 2019

Transactions on Visualization and Computer Graphics (IEEE Visualization short paper), 2019
One-line SVG pan/zoom, plus a pan/zoom injecting bookmark for any SVG! The project page hosts docs, jsFiddle, and bl.ocks.org examples.

International Journal of Robotics Research, 2019

Neural Information Processing Systems (NeurIPS), 2018

European Conference on Computer Vision (ECCV), 2018

Transactions on Visualization and Computer Graphics (IEEE Visualization), 2018
Project page bundles the paper, code, and data.

Computer Vision and Pattern Recognition (CVPR), 2018

Computer Vision and Pattern Recognition (CVPR), 2018

ACM Symposium on Eye Tracking Research and Applications (ETRA), 2018
Project page bundles paper and code; dataset is hosted separately.

British Machine Vision Conference, 2017

Computer Graphics Forum (Eurographics), 2017
We could have called it Blind Video Spatio-Temporal Consistency as it follows up Blind Video Temporal Consistency.

International Conference on Computer Vision (ICCV), 2017

Conference on Human-Robot Interaction (HRI), 2017

International Symposium on Robotics Research, 2017

IEEE Visualization Workshop on Visual Analytics for Deep Learning, 2017

MDPI Informatics (Special Issue on Scalable Interactive Visualization), 2017

Transactions on Visualization and Computer Graphics (IEEE Visualization), 2016

Pacific Graphics 2016 (Short Paper), 2016

User Interface Software and Technology (UIST), 2015
Also at SIGGRAPH Emerging Technologies 2012: Interactive Light Field Painting

ACM Transactions on Graphics (SIGGRAPH Asia), 2015
Builds upon project: Direct Motion Mapping

ACM Transactions on Graphics (SIGGRAPH Asia), 2015
Related project: Blind Video Spatio-Temporal Consistency

ACM Transactions on Graphics (SIGGRAPH Asia), 2015

ACM Symposium on Computer Animation (SCA), 2015

Computer Vision and Pattern Recognition (CVPR), 2015

International Conference on Computer Vision (ICCV), 2015

Computer Vision and Pattern Recognition (CVPR), 2015

ACM Transactions on Graphics (SIGGRAPH Asia), 2014

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2014

European Conference on Visual Media Production (CVMP), 2014
Related project: Vidicontexts

Computer Graphics Forum (Eurographics), 2014
Related project: Generalized Wave Gestures

ACM Transactions on Graphics (SIGGRAPH Asia), 2013

International Conference on Computer Vision (ICCV), 2013

User Interface Software and Technology (UIST), 2013
Related study into display device effect: Device Effect on Panoramic Video+Context Tasks

ACM Transactions on Applied Perception (TAP), 2013

ACM Transactions on Graphics (SIGGRAPH), 2013
Printing light field displays with varying spatio-angular resolution.

EngD Thesis @ University College London, 2013
Also archived at UCL Discovery. Related projects: Videoscapes, Transition Analysis, Match Graph Construction.

European Conference on Visual Media Production (CVMP), 2012
Alt title: Light Field Video Textures

European Conference on Computer Vision (ECCV), 2012
Project page includes the dataset.

European Conference on Computer Vision (ECCV), 2012
Useful for building correspondence graphs for image matching, e.g., in search or large-scale reconstruction. Supplemental material.

ACM Transactions on Graphics (SIGGRAPH), 2012

SIGGRAPH Emerging Technologies, 2012
Early demo of our later UIST 2015 publication Joint 5D Pen Input for Light Field Displays. Demo project page also at MIT CDFG.

Computer Graphics Forum (Eurographics), 2012
Project page includes the dataset.

Computer Graphics Forum (Eurographics), 2012
Project page includes code and data.

ACM Transactions on Graphics (SIGGRAPH), 2011

European Conference on Visual Media Production (CVMP), 2011

International Brain-Computer Interface Conference (BCI), 2011

ACM Transactions on Computer-Human Interaction (SIGCHI), 2010

British HCI Group Annual Conference on People and Computers (BCS-HCI), 2009
Webpage contains many projects and events! Schematics and WebGL model viewer!

MSci Dissertation @ King's College, London, 2006

AI for Content Creation
CVPR 2019–2025 Workshop
Physics-inspired 3D Vision and Imaging
CVPR 2025 Workshop
Neural Fields Beyond Conventional Cameras
ECCV 2024 Workshop
Neural Fields in Visual Computing
CVPR 2022 Tutorial + SIGGRAPH 2023 Course
New England Computer Vision Symposium
Brown 2019
Video for Virtual Reality
SIGGRAPH 2017 Course
User-centric Computational Videography
SIGGRAPH 2015 Course
CSCI 1430—Introduction to Computer Vision
Brown University
2016–now.
CSCI 1290—Computational Photography
Brown University
2018–now.
CSCI 2951-I—Computer Vision for Graphics and Interaction
Brown University
2016–now.
CSCI 2000—Computer Science Research Methods or How to be a CS PhD Student
Brown University
2021 Fall.
CSCI 1950-N—2D Game Engines
Brown University
2017–now. Mentoring student-led course.
GISP 0002—NFTs, Blockchain, and Art, led by Ally Zhu and Nikolas Lazar
Brown University
2022 Spring.
CS171—Visualization
Harvard University
2016 Spring, 2015 Spring.
Computer Vision for Computer Graphics
Max-Planck-Institute for Informatics
2013 Summer.

2025–

Yiwen (Nick) Huang
2024–

2022–

2021–

2021–2025
Onto: Luma AI Research Scientist

2018–

2016–2021
Onto: Meta Reality Labs Research Scientist
Publications:

Anika Bahl
2023–2024
Onto:

Troy Conklin
2022–2024
Onto: General Dynamics

2021–2023
Onto: CMU Research Masters | UW PhD

2022–2023
Onto: Harvard Data Science Masters

2021–2023
Onto: Harvard Computational Science and Engineering Masters

2020–2022
Onto: Common Sense Machines

2017–2020
Onto: UC Berkeley PhD

2019–2021
Onto: Common Sense Machines

Henry Stone
2018–2020

Lucas Kasser
2018–2019

2017–2018
Onto: Allen Institute for AI Residency; UWashington PhD

2022–2025
Onto:

2021–2025
Onto: Alibaba

2018–2021
Onto: Synthesia

2014–2020
Onto: Google
Publications:

2014–2019
Onto: UMass Boston Faculty

2014–2017
Onto: RealityDefender

James Tompkin is an Associate Professor of Computer Science at Brown University. His research at the intersection of computer vision, computer graphics, and human-computer interaction helps develop new visual computing tools and experiences from cameras. To this end, his lab creates techniques for 3D scene reconstruction from multi-camera systems and for dynamic scenes. His doctoral work at University College London on large-scale video processing and exploration techniques led to creative exhibition work at the Museum of the Moving Image in New York City. Postdoctoral work at Max-Planck-Institute for Informatics and Harvard University helped create new methods to edit content within images and videos. Recent research has developed new techniques for low-level reconstruction of dynamic scenes, view synthesis for VR, and AI content editing and generation.
Please find my research summary video from 2015—our newer lab work is on the 'Research' tab.
I supported SIGGRAPH's 50th conference in 2023 as the chair of the Posters program, which was coincidentally running its 20th iteration too. Here's a meta-poster about the program's history and its outstanding contributors (low-res PNG).
I supported the Discover program and club at Brown/RISD to pair arts and science students and put on an exhibition (2017–2021). I have also tried to contribute myself.
Bad Art @ Brown, 2018
with Aaron Gokaslan and Vivek Ramanujan
Rear Window Augmented
with Jeff Desom
Museum of the Moving Image
New York City
7th Nov. 2015 to 10th April 2016
ISCP
New York City
7–9th November 2014
Festival Imaginales
Epinal, France
26–29th May 2014
Luxembourg Film Festival
28th February to 9th March 2014