Bodies in computer vision have often been an afterthought. Human pose is typically represented by 10-12 body joints in 2D or 3D, an idea inspired by Johansson's moving-light displays, which showed that some human actions can be recognized from the motion of the body's major joints alone. But the joints do not capture everything. The skeletal structure of the body is another popular representation, but it is only approximate and is never actually observed in images.
In our work we have focused on 3D body shape, represented as a triangulated mesh. Shape tells us more about a person: their health, age, fitness, and clothing size. Shape also matters because the body surface mediates our physical interactions with the world: we cannot interpenetrate objects, and they cannot interpenetrate us.
It has taken a few years for the field to catch on to this idea but now our SMPL [ ] body model is widely used in research and industry. It is simple, efficient, posable, and compatible with most graphics packages. It is also differentiable and easy to integrate into optimization or deep learning methods.
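To make the structure of such a model concrete, here is a minimal NumPy sketch of an SMPL-style mesh: a template mesh is offset by linear shape blendshapes, then deformed by linear blend skinning with per-vertex joint weights. The dimensions and random data are illustrative only (the real SMPL has 6890 vertices, 23 body joints plus a root, and learned blendshapes); this is not the actual SMPL implementation, just the general recipe it follows.

```python
import numpy as np

# Toy dimensions for illustration (real SMPL: 6890 vertices, 24 joints).
V, J, S = 20, 4, 3  # vertices, joints, shape coefficients

rng = np.random.default_rng(0)
template = rng.normal(size=(V, 3))           # mean template mesh
shape_dirs = rng.normal(size=(V, 3, S))      # linear shape blendshapes
weights = rng.dirichlet(np.ones(J), size=V)  # skinning weights, rows sum to 1

def smpl_like(betas, joint_transforms):
    """SMPL-style mesh: add shape offsets, then linear blend skinning."""
    # Shape-corrected template: template + blendshapes * betas.
    shaped = template + shape_dirs @ betas                     # (V, 3)
    # Homogeneous coordinates for rigid skinning transforms.
    homo = np.concatenate([shaped, np.ones((V, 1))], axis=1)   # (V, 4)
    # Blend each joint's 4x4 transform by the per-vertex weights.
    per_vertex_T = np.einsum('vj,jab->vab', weights, joint_transforms)
    posed = np.einsum('vab,vb->va', per_vertex_T, homo)
    return posed[:, :3]

# Sanity check: zero shape and identity joint transforms reproduce the template.
betas = np.zeros(S)
identity_T = np.tile(np.eye(4), (J, 1, 1))
verts = smpl_like(betas, identity_T)
assert np.allclose(verts, template)
```

Because every step is a linear map or a composition of differentiable operations, gradients flow from mesh vertices back to the shape and pose parameters, which is what makes models of this form easy to plug into optimization and deep learning pipelines.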
While popular, SMPL has drawbacks. Pose deformations are non-local, the face does not move, the hands are rigid, and there is no clothing and no hair. We are addressing these issues in ongoing work (see the theme on Clothing [ ] and projects on Faces [ ] and Hands [ ]). Our recent work [ ] combines bodies, faces, and hands in a single model, called SMPL-X, that can be fit to data or animated. Like all our body models, we train it from scans of people to capture the realism and statistics of the population [ ]. Our latest work [ ] trains a deep learning model, called ExPose, that reconstructs expressive 3D humans (SMPL-X) from a single RGB image quickly and accurately.
Such models provide the foundation for our analysis of human movement, emotion, and behavior.