James Tompkin

Associate Professor

Visual Computing

 BlueSky @brownvc.bsky.social
 Github @brownvc

Brown student researcher?
Group Onboarding Process

Contact


 BlueSky @jamestompkin.bsky.social
Google Scholar

Office hours: Weds 1300 EST
Book appointment

Brown folks: Save an email,
use GCal 'Find a Time'
and include an agenda. Instructions

Center for Information Technology
Room 547
115 Waterman Street
Providence, RI, 02912


Acknowledgements

My intrepid collaborators and co-authors.

Funding:

  • US NSF, DARPA, NASA
  • UK EPSRC, BBC
  • Industry Activision, Adobe, Amazon, Cognex, Google, Intel, Meta, Snap, AI Foundation

The open source Web community: HTML5 Boilerplate, Ryan Johnston, Joshua N. Hibbert, PracticalTypography.com, EB Garamond.

Hosted on GitHub Pages using Jekyll — basic theme by orderedlist.


Controllable Generative Models

How do we efficiently control generative models to produce what we want — preserving identity, 3D structure, style — without sacrificing quality?

A generative model that can sample new content is impressive; one that produces exactly what a user has in mind is useful. Controlling generation requires aligning the model's latent structure with axes a person can articulate (identity, pose, style, lighting, geometry) without sacrificing the photorealism that made the model worth using in the first place. Adding control usually costs some quality, and that tradeoff has to be managed.

The thread runs from Youssef Mejjati's PhD work on unsupervised attention for image-to-image translation, through compositional controls (object stamps, GaussiGAN's 3D Gaussian primitives from silhouettes alone), into 3DMM-conditioned face generation where Yiwen Huang's PhD now sits. Two recent moves matter: TaxFreeGAN closes the FID gap to unconditional StyleGAN under 3DMM conditioning, and the disentangling-3D work shows that the noise in CLIP's embedding space — not the disentanglement strategy — is what kills quality. R3GAN sits alongside this arc as the architectural reset: a principled relativistic loss that lets the modern GAN drop its bag of tricks.

Authors

Akin Caliskan · Darren Cosker · Aaron Gokaslan · Yiwen Huang · Berkay Kicanaoglu · Hyeongwoo Kim · Kwang In Kim · Atsunobu Kotani · Volodymyr Kuleshov · Youssef A. Mejjati · Isa Milefchik · Christian Richardt · Zejiang Shen · Michael Snower · Stefanie Tellex · Vikas Thamizharasan · Oliver Wang · Yue Wang · Xinjie Yi · Zhiqiu Yu · Qian Zhang

Papers in this thread

Unsupervised Attention-guided Image to Image Translation
Neural Information Processing Systems (NeurIPS), 2018
Jointly trains attention with generators and discriminators so unsupervised image-to-image translation can localize edits to objects without disturbing background or inter-object structure.
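The localization idea can be illustrated as a per-pixel blend: where a soft attention map is high, the output takes the generator's translation; where it is low, the input is copied through unchanged, so the background cannot drift. A minimal NumPy sketch of that blend, assuming the attention map has already been predicted and lies in [0, 1]; `attention_composite` is a hypothetical name, and the networks that produce the map and the translation are not shown.

```python
import numpy as np

def attention_composite(source, translated, attention):
    """Blend a translated image into its source using a soft attention map.

    source, translated: float arrays of shape (H, W, C).
    attention: float array of shape (H, W) in [0, 1]; 1 means "edit this pixel".
    """
    a = attention[..., None]  # broadcast the single-channel map over colour channels
    # Attended pixels come from the generator; the rest are copied from the input.
    return a * translated + (1.0 - a) * source

# Toy example: 4x4 RGB image where attention covers only the top-left 2x2 block.
src = np.zeros((4, 4, 3))
gen = np.ones((4, 4, 3))
att = np.zeros((4, 4))
att[:2, :2] = 1.0
out = attention_composite(src, gen, att)
print(out[..., 0])
```

Because the blend is differentiable in the attention map, it can in principle be trained jointly with the generator and discriminator, which is the setting the summary describes.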
European Conference on Computer Vision (ECCV), 2020
Factors handwriting style into separate character-level and writer-level descriptors, letting the model generate new characters in a held-out writer's hand from only a few samples.
CVPR Workshop on AI for Content Creation, 2020
Splits conditional object insertion into a mask generator (shape, given a class and bounding box) and a texture generator (appearance, conditioned on the background), so the inserted object is both diverse in shape and consistent with its surroundings.
BMVC 2021 and CVPR Workshop on AI for Content Creation, 2021
Learns a coarse 3D object representation as a set of self-supervised anisotropic 3D Gaussians from unposed 2D masks alone, then uses it to drive controllable mask and texture synthesis with interactive posing.
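To make "3D Gaussians driving a 2D mask" concrete, the sketch below builds anisotropic covariances as R S Sᵀ Rᵀ and splats their projections into a soft silhouette. It is a generic illustration under stated assumptions, not GaussiGAN's renderer: the orthographic camera looking down the z axis, the rotation about z only, and the soft-union rule for combining blobs are all choices made here for brevity, and the function names are hypothetical.

```python
import numpy as np

def covariance(scales, angle_z):
    """Anisotropic 3D covariance R S S^T R^T, with R a rotation about the z axis."""
    c, s = np.cos(angle_z), np.sin(angle_z)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def splat_silhouette(means, covs, size=64, extent=1.0):
    """Soft silhouette of 3D Gaussians under an assumed orthographic camera.

    Projecting onto the xy-plane keeps the first two mean coordinates and the
    top-left 2x2 covariance block (the marginal); depth (z) is discarded.
    """
    xs = np.linspace(-extent, extent, size)
    grid = np.stack(np.meshgrid(xs, xs, indexing="xy"), axis=-1)  # (size, size, 2)
    coverage = np.zeros((size, size))
    for mu, cov in zip(means, covs):
        d = grid - mu[:2]
        inv = np.linalg.inv(cov[:2, :2])
        mahal = np.einsum("...i,ij,...j->...", d, inv, d)  # squared Mahalanobis distance
        g = np.exp(-0.5 * mahal)  # unnormalised: peak value 1 at the projected centre
        coverage = 1.0 - (1.0 - coverage) * (1.0 - g)  # soft union of the blobs
    return coverage

means = [np.array([-0.4, 0.0, 0.0]), np.array([0.4, 0.0, 0.2])]
covs = [covariance([0.2, 0.05, 0.1], 0.0), covariance([0.05, 0.2, 0.1], 0.3)]
mask = splat_silhouette(means, covs)
print(mask.shape, mask.max())
```

Every step is differentiable in the means, scales, and angles, which is what would let a mask loss on 2D silhouettes push the 3D primitives into shape; posing then amounts to moving or rotating individual Gaussians.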
Learning Physically-based Face Material and Lighting Decomposition
International Conference on Computational Visual Media, 2022
Estimates per-portrait surface normals, albedo, roughness, and a high-frequency lighting map, and decomposes diffuse and specular reflectance — so a downstream editor can relight a face from a single photograph.
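Once normals and albedo are estimated, relighting means re-evaluating a reflectance model under new lighting. The sketch below shows the diffuse/specular split in its simplest form; it is deliberately simplified and is not the paper's model: a single directional light replaces the high-frequency lighting map, and a Blinn-Phong lobe with a fixed `shininess` stands in for a physically based, roughness-driven specular term. `shade`, `spec_albedo`, and `shininess` are illustrative names.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def shade(albedo, normals, light_dir, view_dir, spec_albedo=0.04, shininess=32.0):
    """Per-pixel diffuse + specular reflectance under one directional light.

    albedo: (H, W, 3) diffuse albedo; normals: (H, W, 3) unit surface normals.
    Diffuse is Lambertian; specular is a Blinn-Phong lobe (illustrative stand-in).
    """
    l = normalize(np.asarray(light_dir, dtype=float))
    v = normalize(np.asarray(view_dir, dtype=float))
    h = normalize(l + v)  # half vector between light and view directions
    n_dot_l = np.clip(normals @ l, 0.0, None)[..., None]
    n_dot_h = np.clip(normals @ h, 0.0, None)[..., None]
    diffuse = albedo * n_dot_l
    specular = spec_albedo * (n_dot_h ** shininess) * (n_dot_l > 0)  # no highlight on unlit pixels
    return diffuse, specular, diffuse + specular

# Flat, camera-facing patch lit head-on, then relit from the side.
albedo = np.full((2, 2, 3), 0.5)
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0
diffuse, specular, image = shade(albedo, normals, light_dir=[0, 0, 1], view_dir=[0, 0, 1])
relit = shade(albedo, normals, light_dir=[1, 0, 1], view_dir=[0, 0, 1])[2]
```

Keeping the diffuse and specular terms separate is what lets an editor change one (for example, dull the highlights) without repainting the skin colour.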
Winter Conference on Applications of Computer Vision (WACV), 2024; AI for Content Creation (AI4CC) @ CVPR, 2023
Formalizes 3DMM-conditioned face generation mathematically, then applies targeted fixes that close the FID gap to unconditional StyleGAN, so controllability no longer costs visible image quality.
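"Closing the FID gap" refers to the Fréchet Inception Distance, which fits a Gaussian to Inception-v3 features of real images and another to features of generated images, then measures the Fréchet distance between the two: ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal NumPy sketch of that distance, assuming the feature statistics are already computed; the random features below are placeholders for real Inception activations, and `frechet_distance` is an illustrative name.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Tr((sigma1 sigma2)^(1/2)) is computed as Tr((A sigma2 A)^(1/2)) with
    A = sigma1^(1/2); the two matrices share eigenvalues, but A sigma2 A is
    symmetric positive semi-definite, so a stable eigh-based square root works.
    """
    def psd_sqrt(m):
        w, v = np.linalg.eigh(m)
        return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

    a = psd_sqrt(sigma1)
    middle = psd_sqrt(a @ sigma2 @ a)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(middle))

# Placeholder "features": two sets of 500 samples in 8 dimensions.
rng = np.random.default_rng(0)
feats_real = rng.standard_normal((500, 8))
feats_fake = rng.standard_normal((500, 8)) + 0.5
fid = frechet_distance(feats_real.mean(0), np.cov(feats_real, rowvar=False),
                       feats_fake.mean(0), np.cov(feats_fake, rowvar=False))
```

Lower is better, so a controllable model "closes the gap" when its FID under conditioning approaches the FID of the unconditional baseline on the same data.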
Disentangling 3D from Large Vision-Language Models for Controlled Portrait Generation
2024
Disentangles 3D portrait generation from a frozen CLIP plus a FLAME morphable model, then identifies CLIP's noisy embedding directions as the residual source of entanglement and damps them with a stochastic Jacobian regularizer.
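The summary does not spell out the regularizer's exact form. One common way to penalize a Jacobian stochastically, shown here as an assumed stand-in, is a random-projection estimate of its squared Frobenius norm: for v drawn from N(0, I), E[‖J v‖²] = Tr(JᵀJ) = ‖J‖²_F, and J v can be approximated with a finite difference, so the full Jacobian is never formed. A minimal NumPy sketch; `jacobian_penalty`, `eps`, and `num_samples` are hypothetical names and defaults.

```python
import numpy as np

def jacobian_penalty(f, x, num_samples=1, eps=1e-3, rng=None):
    """Stochastic estimate of ||J_f(x)||_F^2 using random Gaussian directions.

    Each sample draws v ~ N(0, I) and approximates J v with a central finite
    difference; the mean of ||J v||^2 over samples is an estimate of ||J||_F^2.
    """
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_samples):
        v = rng.standard_normal(x.shape)
        jv = (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)
        total += float(np.sum(jv ** 2))
    return total / num_samples

# Example on a smooth nonlinear map; in training, f would instead be the
# (assumed) mapping from an embedding to generator output, differentiated
# with an autodiff framework rather than finite differences.
x0 = np.linspace(-1.0, 1.0, 5)
penalty = jacobian_penalty(np.tanh, x0, num_samples=64, rng=np.random.default_rng(0))
```

Penalizing sensitivity along random directions damps whichever input directions the output reacts to most, without needing to know in advance which embedding directions are noisy.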
The GAN is Dead; Long Live the GAN! A Modern GAN Baseline
Neural Information Processing Systems (NeurIPS), 2024
A regularized relativistic GAN loss with proven local convergence lets a minimalist StyleGAN2-derived architecture — stripped of the usual stabilization tricks — beat StyleGAN2 on FFHQ, ImageNet, CIFAR, and Stacked MNIST, and compete with diffusion models.
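In a relativistic pairing (RpGAN) loss, the discriminator scores a real and a fake sample as a pair and only their difference enters the loss; R1 and R2 penalties add (γ/2)·E‖∇D‖² on real and on fake samples respectively. A minimal NumPy sketch of those loss terms, assuming critic scores and input gradients are supplied from elsewhere (an autodiff framework in practice). The linear critic in the example is a hypothetical toy chosen because its input gradient is known in closed form, and `gamma` is left to the caller.

```python
import numpy as np

def softplus(t):
    return np.logaddexp(0.0, t)  # log(1 + e^t), numerically stable

def rpgan_d_loss(d_real, d_fake):
    """Discriminator loss: push each real score above its paired fake score."""
    return float(np.mean(softplus(d_fake - d_real)))

def rpgan_g_loss(d_real, d_fake):
    """Generator loss: the same pairing with the roles swapped."""
    return float(np.mean(softplus(d_real - d_fake)))

def gradient_penalty(grads, gamma):
    """R1 (grads at real samples) or R2 (grads at fake samples): gamma/2 * E||grad D||^2."""
    flat = grads.reshape(grads.shape[0], -1)
    return float(0.5 * gamma * np.mean(np.sum(flat ** 2, axis=1)))

# Toy linear critic D(x) = x . w, whose input gradient is w everywhere.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0])
real = rng.standard_normal((8, 2))
fake = rng.standard_normal((8, 2)) + 1.0
grads = np.broadcast_to(w, real.shape)
gamma = 1.0
d_total = (rpgan_d_loss(real @ w, fake @ w)
           + gradient_penalty(grads, gamma)    # R1 on real samples
           + gradient_penalty(grads, gamma))   # R2 on fake samples
g_total = rpgan_g_loss(real @ w, fake @ w)
```

Since softplus(t) − softplus(−t) = t, the generator and discriminator losses for a pair differ exactly by the score gap D(real) − D(fake), which is the sense in which the two players optimize the same relative quantity.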