We introduce FLAME [ ] to address a significant gap between two kinds of 3D face models: highly accurate, photo-realistic models of individuals, which are learned from scans or images of that person and typically require significant input from a 3D artist, and simple generic face models, which can be fit to images, video, or RGB-D data but lack realism. What is missing is a generic 3D face model that is compact, can be fit to data, captures realistic 3D face details, and enables animation. Our goal is to move the “low end” models towards the “high end” by learning a model of facial shape and expression from 4D scans.
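To make the statistical-model idea concrete, the following is a minimal sketch of a linear shape-and-expression face model of the kind FLAME builds on. The dimensions match FLAME's published configuration, but the random bases, the `linear_face` helper, and its parameter names are illustrative assumptions (FLAME additionally models pose with linear blend skinning, which is omitted here):

```python
import numpy as np

# Hypothetical linear face model: vertices = mean template plus linear
# offsets from identity (shape) and expression coefficients.
n_verts = 5023              # FLAME's mesh resolution
n_shape, n_expr = 300, 100  # number of shape / expression components

rng = np.random.default_rng(0)
template = rng.standard_normal((n_verts, 3))               # mean face
shape_basis = rng.standard_normal((n_shape, n_verts, 3))   # identity directions
expr_basis = rng.standard_normal((n_expr, n_verts, 3))     # expression directions

def linear_face(shape_coeffs, expr_coeffs):
    """Reconstruct mesh vertices as a linear combination of basis offsets."""
    return (template
            + np.tensordot(shape_coeffs, shape_basis, axes=1)
            + np.tensordot(expr_coeffs, expr_basis, axes=1))

# With all coefficients at zero, the model returns the mean template.
verts = linear_face(np.zeros(n_shape), np.zeros(n_expr))
```

Fitting such a model to data reduces to estimating the low-dimensional coefficient vectors, which is what makes it compact and easy to optimize.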
Since the FLAME model learns a latent representation of the face using linear subspaces (or higher-order tensor generalizations), it struggles to capture extreme deformations and non-linear expressions. To address this, we introduce a versatile model that learns a non-linear representation of the face using spectral convolutions on the mesh surface [ ]. Standard convolutional neural networks, despite their superior performance in the 2D image domain, are not directly defined on 3D meshes. We therefore introduce mesh sampling operations that enable a hierarchical mesh representation, allowing the model to capture non-linear variations in shape and expression at multiple scales. The resulting mesh convolution framework is generic.
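A common way to realize spectral convolutions on a mesh is a Chebyshev polynomial filter applied to the graph Laplacian of the mesh connectivity. The sketch below is a hedged NumPy illustration of that idea, not the full model: the function names are ours, the filter weights would normally be learned, and we use the common approximation of the largest Laplacian eigenvalue as 2 when rescaling:

```python
import numpy as np

def scaled_laplacian(adj):
    """Scaled graph Laplacian L~ = 2 L / lambda_max - I for Chebyshev
    filtering, assuming lambda_max ~= 2 (a common simplification)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(adj.shape[0]) - d_inv_sqrt @ adj @ d_inv_sqrt
    return lap - np.eye(adj.shape[0])

def cheb_conv(x, adj, weights):
    """Chebyshev spectral convolution: y = sum_k T_k(L~) x W_k,
    where T_k follows the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}.
    x: (n_vertices, in_features); weights: list of (in, out) matrices."""
    lap = scaled_laplacian(adj)
    t_prev, t_curr = x, lap @ x          # T_0(L~) x  and  T_1(L~) x
    out = t_prev @ weights[0]
    if len(weights) > 1:
        out += t_curr @ weights[1]
    for k in range(2, len(weights)):
        t_next = 2 * (lap @ t_curr) - t_prev
        out += t_next @ weights[k]
        t_prev, t_curr = t_curr, t_next
    return out
```

Because the filter is a degree-K polynomial of the Laplacian, each output vertex aggregates information only from its K-hop mesh neighborhood, which is what gives the operation the local, shareable character of an image convolution.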
With both linear and non-linear 3D face models in hand, we want to move on to more interesting problems: 3D face tracking from videos, behavioral and emotional analysis of faces together with speech, and 3D animation driven directly by video. To that end, we are working on generating 3D faces directly from face images. Although 2D face images are abundant in the real world, training data with 2D-to-3D correspondence is scarce. We therefore introduce an approach that leverages large amounts of 2D data to generate 3D meshes without any 3D supervision.
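One standard way to supervise 3D reconstruction with 2D data alone is a reprojection loss: project predicted 3D landmarks into the image and compare them with 2D landmark detections. The sketch below is an assumption about how such a loss could look, using a weak-perspective camera; the function names and parameters are hypothetical, not the paper's implementation:

```python
import numpy as np

def weak_perspective_project(points3d, scale, trans2d):
    """Weak-perspective projection (illustrative): drop the z coordinate,
    then scale and translate in the image plane."""
    return scale * points3d[:, :2] + trans2d

def landmark_reprojection_loss(points3d, landmarks2d, scale, trans2d):
    """Mean Euclidean distance between projected 3D landmarks and detected
    2D landmarks -- a loss computable from 2D data, with no 3D ground truth."""
    proj = weak_perspective_project(points3d, scale, trans2d)
    return float(np.mean(np.linalg.norm(proj - landmarks2d, axis=1)))
```

Minimizing such a loss over many images constrains the predicted 3D shape through its 2D projections, which is the general mechanism that makes training without 3D supervision possible.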