FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models

FitDiff, a versatile multi-modal diffusion model, produces relightable facial avatars that seamlessly integrate into various commercial rendering platforms.
Given "in-the-wild" facial images, FitDiff reconstructs facial avatars consisting of facial shape and reflectance.


Abstract

In this work, we present FitDiff, a diffusion-based 3D facial avatar generative model. This model accurately generates relightable facial avatars, utilizing an identity embedding extracted from an "in-the-wild" 2D facial image.

Our multi-modal diffusion model concurrently outputs facial reflectance maps (albedo, specular, and normals) and shape, and shows strong generalization. It is trained solely on an annotated subset of a public facial dataset, paired with 3D reconstructions. We revisit the typical 3D facial fitting approach by guiding a reverse diffusion process using perceptual and face recognition losses.
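The exact form and weighting of these losses are not given here. The sketch below shows what such a combined fitting objective could look like. It assumes that the identity loss is one minus the cosine similarity between face recognition embeddings of the rendered avatar and of the input image, which is a common choice. It also uses a mean-squared distance over feature maps as a simple stand-in for a perceptual loss such as LPIPS. All function names and the weights `w_id` and `w_perc` are hypothetical.

```python
import numpy as np


def identity_loss(emb_render: np.ndarray, emb_target: np.ndarray) -> float:
    """Hypothetical identity loss: 1 - cosine similarity of two embeddings.

    Returns 0 for embeddings pointing the same way, 1 for orthogonal ones,
    and 2 for opposite ones.
    """
    a = emb_render / np.linalg.norm(emb_render)
    b = emb_target / np.linalg.norm(emb_target)
    return float(1.0 - np.dot(a, b))


def perceptual_loss(feats_render, feats_target) -> float:
    """Stand-in perceptual loss: per-layer mean squared feature distance, summed.

    `feats_*` are lists of feature maps, e.g. activations of a pretrained
    network on the rendered and target images (assumed, not specified).
    """
    return float(sum(np.mean((fr - ft) ** 2)
                     for fr, ft in zip(feats_render, feats_target)))


def fitting_loss(emb_render, emb_target, feats_render, feats_target,
                 w_id: float = 1.0, w_perc: float = 1.0) -> float:
    """Weighted sum of the identity and perceptual terms (weights hypothetical)."""
    return (w_id * identity_loss(emb_render, emb_target)
            + w_perc * perceptual_loss(feats_render, feats_target))
```

In a real fitting loop, both terms would be evaluated on a rendering of the generated avatar. Their gradients, obtained with automatic differentiation, would then steer the sampling process.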

As the first latent diffusion model (LDM) conditioned on face recognition embeddings, FitDiff reconstructs relightable human avatars that can be used as-is in typical rendering engines, starting from only a single unconstrained facial image, while achieving state-of-the-art performance.


Method

Starting from Gaussian noise, FitDiff concurrently generates facial shape and reflectance maps (diffuse albedo, specular albedo and normals), conditioned on an identity embedding vector. During sampling, a novel guidance algorithm is applied for further control over the resulting facial avatar. $Z_T$, $Z_k$ and $Z_{k-1}$ are visualized in image space for illustration purposes.
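The caption does not spell out FitDiff's guidance algorithm. The sketch below is only a rough illustration of one common pattern for loss-guided sampling and is not FitDiff's novel method. At each deterministic DDIM step (η = 0), the predicted clean latent is nudged along the negative gradient of a fitting loss and then re-noised to the previous timestep. The names `eps_model`, `grad_fn`, `abars` and `scale` are hypothetical. In the usage check, a toy quadratic loss with an analytic gradient stands in for the perceptual and identity losses, which in practice would need automatic differentiation.

```python
import numpy as np


def ddim_guided_step(z_k, eps, abar_k, abar_prev, grad_fn, scale):
    """One deterministic DDIM step (eta = 0) with guidance on the x0 estimate."""
    # Estimate the clean latent from the noisy latent Z_k and predicted noise.
    x0_hat = (z_k - np.sqrt(1.0 - abar_k) * eps) / np.sqrt(abar_k)
    # Nudge the estimate down the gradient of the (hypothetical) fitting loss.
    x0_hat = x0_hat - scale * grad_fn(x0_hat)
    # Re-noise to timestep k-1 along the same predicted noise direction.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps


def sample(eps_model, grad_fn, shape, abars, scale=0.1, seed=0):
    """Run guided sampling from Z_T down to an estimate of Z_0.

    `abars[k]` is the cumulative noise-schedule product at step k, so it is
    close to 1 for k = 0 and small for k = T - 1.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)  # Z_T: pure Gaussian noise
    for k in range(len(abars) - 1, -1, -1):
        eps = eps_model(z, k)  # the denoiser would also see the identity embedding
        abar_prev = abars[k - 1] if k > 0 else 1.0  # final step returns x0_hat
        z = ddim_guided_step(z, eps, abars[k], abar_prev, grad_fn, scale)
    return z
```

With the toy loss 0.5 * ||x - target||², the gradient is `x - target`. Setting `scale=1.0` then snaps every x0 estimate onto `target`, so the sampler returns `target`. This makes the effect of the guidance term easy to check in isolation.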

Unconditional Sampling


FitDiff can also generate diverse facial identities without any input image or identity embedding. These assets have potential uses in many applications, including augmenting and enriching existing datasets and creating entirely random identities for digital applications.

Stathis Galanakis
PhD Student at Imperial College London

My research interests include techniques for 3D face reconstruction from a monocular image.