

2017


Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression blendshapes. The pose and expression dependent articulations are learned from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]



Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m² and BMI 18 to 42 kg/m². The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) to examine how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) to test whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

publisher DOI Project Page [BibTex]


Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video link (url) DOI Project Page [BibTex]



An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.
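For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of one cycle of a standard bootstrap particle filter on a scalar state. It is not the unified multirobot algorithm of the article; the function name, the additive motion model, the Gaussian measurement likelihood, and the noise parameters are all assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, control, measurement,
                         motion_std=0.1, meas_std=0.5, rng=random):
    """One predict/update/resample cycle of a bootstrap particle filter (1D state)."""
    # Predict: push every particle through an assumed additive motion model.
    predicted = [p + control + rng.gauss(0.0, motion_std) for p in particles]

    # Update: weight each particle by an assumed Gaussian measurement likelihood.
    weights = [math.exp(-0.5 * ((measurement - p) / meas_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Resample: draw a new, equally weighted particle set.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights
```

In this toy version each particle is a single number. In the setting the abstract describes, each particle would instead carry a full state hypothesis covering all robots and the tracked object.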

Published Version link (url) DOI [BibTex]


Parameterized Model of 2D Articulated Human Shape

Black, M. J., Freifeld, O., Weiss, A., Loper, M., Guan, P.

September 2017, U.S. Patent 9,761,060 (misc)

Abstract
Disclosed are computer-readable devices, systems and methods for generating a model of a clothed body. The method includes generating a model of an unclothed human body, the model capturing a shape or a pose of the unclothed human body, determining two-dimensional contours associated with the model, and computing deformations by aligning a contour of a clothed human body with a contour of the unclothed human body. Based on the two-dimensional contours and the deformations, the method includes generating a first two-dimensional model of the unclothed human body, the first two-dimensional model factoring the deformations of the unclothed human body into one or more of a shape variation component, a viewpoint change, and a pose variation and learning an eigen-clothing model using principal component analysis applied to the deformations, wherein the eigen-clothing model classifies different types of clothing, to yield a second two-dimensional model of a clothed human body.

Google Patents [BibTex]


Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Ramirez, M. Q., Black, M., Zuffi, S., O’Toole, A., Hill, M. Q., Hahn, C. A.

August 2017, Application PCT/EP2017/051954 (misc)

Abstract
A method for generating a body shape, comprising the steps: receiving one or more linguistic descriptors related to the body shape; retrieving an association between the one or more linguistic descriptors and a body shape; and generating the body shape, based on the association.

Google Patents [BibTex]



System and method for simulating realistic clothing

Black, M. J., Guan, P.

June 2017, U.S. Patent 9,679,409 B2 (misc)

Abstract
Systems, methods, and computer-readable storage media for simulating realistic clothing. The system generates a clothing deformation model for a clothing type, wherein the clothing deformation model factors a change of clothing shape due to rigid limb rotation, pose-independent body shape, and pose-dependent deformations. Next, the system generates a custom-shaped garment for a given body by mapping, via the clothing deformation model, body shape parameters to clothing shape parameters. The system then automatically dresses the given body with the custom-shaped garment.

Google Patents pdf [BibTex]


Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
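The abstract does not spell out the criterion, so the sketch below only illustrates the general idea of a stopping test built from local gradient statistics. The specific statistic, 1 - (B / D) * sum_k mean_k^2 / var_k for batch size B and parameter dimension D, is an assumed form and should be checked against the paper before being relied on. The function names are hypothetical.

```python
from statistics import mean, variance


def gradient_noise_statistic(per_example_grads):
    """Return 1 - (B / D) * sum_k mean_k**2 / var_k for one minibatch.

    per_example_grads: list of B gradient vectors, each of length D.
    A positive value means the minibatch mean gradient is small relative
    to its sampling noise, which this sketch treats as a stop signal.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    ratio_sum = 0.0
    for k in range(dim):
        column = [g[k] for g in per_example_grads]
        var_k = variance(column)  # unbiased sample variance over the batch
        if var_k == 0.0:
            return float("-inf")  # no noise estimate available: do not stop
        ratio_sum += mean(column) ** 2 / var_k
    return 1.0 - (batch_size / dim) * ratio_sum


def should_stop(per_example_grads):
    """Assumed decision rule: stop once the statistic becomes positive."""
    return gradient_noise_statistic(per_example_grads) > 0.0
```

Because the statistic uses only gradients already computed on the training minibatch, no held-out data is involved, which is the property the abstract emphasizes.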

link (url) Project Page [BibTex]


Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M. J., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):54:1-54:12, 2017 (article)

Abstract
Data-driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two-layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held-out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar to another when they share the same topology.

video paper link (url) Project Page [BibTex]



Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), pages: 349-360, 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker, Sparse Inertial Poser (SIP), enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.

video pdf Project Page [BibTex]



Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we describe a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better than, or comparably to, all previously published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]



ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):73:1-73:15, ACM, New York, NY, USA, 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) DOI Project Page [BibTex]


2016


Skinned multi-person linear model

Black, M.J., Loper, M., Mahmood, N., Pons-Moll, G., Romero, J.

December 2016, Application PCT/EP2016/064610 (misc)

Abstract
The invention comprises a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. The invention quantitatively evaluates variants of SMPL using linear or dual-quaternion blend skinning and shows that both are more accurate than a Blend SCAPE model trained on the same data. In a further embodiment, the invention realistically models dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.
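The statement that pose-dependent blend shapes are a linear function of the rotation-matrix elements can be illustrated with a small sketch. This is not the SMPL implementation: the axis-angle input, the subtraction of the identity (so that the rest pose yields zero offset), and all function and argument names are assumptions made for the example.

```python
import math


def rodrigues(axis_angle):
    """Convert an axis-angle 3-vector to a 3x3 rotation matrix (nested lists)."""
    x, y, z = axis_angle
    theta = math.sqrt(x * x + y * y + z * z)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = x / theta, y / theta, z / theta
    c, s = math.cos(theta), math.sin(theta)
    t = 1.0 - c
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def pose_feature(joint_rotations):
    """Stack the elements of (R - I) for every joint into one flat vector."""
    feature = []
    for axis_angle in joint_rotations:
        rot = rodrigues(axis_angle)
        for i in range(3):
            for j in range(3):
                feature.append(rot[i][j] - (1.0 if i == j else 0.0))
    return feature


def pose_blend_offsets(joint_rotations, pose_blendshapes):
    """Vertex offsets as a linear function of the rotation-matrix elements.

    pose_blendshapes[n] is the (hypothetical) offset vector paired with
    feature element n; there must be 9 of them per joint.
    """
    feature = pose_feature(joint_rotations)
    if len(feature) != len(pose_blendshapes):
        raise ValueError("need one blend shape per feature element")
    offsets = [0.0] * len(pose_blendshapes[0])
    for weight, shape in zip(feature, pose_blendshapes):
        for k, value in enumerate(shape):
            offsets[k] += weight * value
    return offsets
```

Because the offsets are linear in the feature vector, fitting the blend shapes to registered meshes reduces to a linear least-squares problem, which is consistent with the abstract's remark that the formulation makes training from many meshes practical.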

Google Patents [BibTex]



Creating body shapes from verbal descriptions by linking similarity spaces

Hill, M. Q., Streuber, S., Hahn, C. A., Black, M. J., O’Toole, A. J.

Psychological Science, 27(11):1486-1497, November 2016 (article)

Abstract
Brief verbal descriptions of bodies (e.g. curvy, long-legged) can elicit vivid mental images. The ease with which we create these mental images belies the complexity of three-dimensional body shapes. We explored the relationship between body shapes and body descriptions and show that a small number of words can be used to generate categorically accurate representations of three-dimensional bodies. The dimensions of body shape variation that emerged in a language-based similarity space were related to major dimensions of variation computed directly from three-dimensional laser scans of 2094 bodies. This allowed us to generate three-dimensional models of people in the shape space using only their coordinates on analogous dimensions in the language-based description space. Human descriptions of photographed bodies and their corresponding models matched closely. The natural mapping between the spaces illustrates the role of language as a concise code for body shape, capturing perceptually salient global and local body features.

pdf [BibTex]



Body Talk: Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Quiros-Ramirez, M. A., Hill, M. Q., Hahn, C. A., Zuffi, S., O’Toole, A., Black, M. J.

ACM Trans. Graph. (Proc. SIGGRAPH), 35(4):54:1-54:14, July 2016 (article)

Abstract
Realistic, metrically accurate, 3D human avatars are useful for games, shopping, virtual reality, and health applications. Such avatars are not in wide use because solutions for creating them from high-end scanners, low-cost range cameras, and tailoring measurements all have limitations. Here we propose a simple solution and show that it is surprisingly accurate. We use crowdsourcing to generate attribute ratings of 3D body shapes corresponding to standard linguistic descriptions of 3D shape. We then learn a linear function relating these ratings to 3D human shape parameters. Given an image of a new body, we again turn to the crowd for ratings of the body shape. The collection of linguistic ratings of a photograph provides remarkably strong constraints on the metric 3D shape. We call the process crowdshaping and show that our Body Talk system produces shapes that are perceptually indistinguishable from bodies created from high-resolution scans and that the metric accuracy is sufficient for many tasks. This makes body “scanning” practical without a scanner, opening up new applications including database search, visualization, and extracting avatars from books.
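The step of learning a linear function from ratings to shape parameters can be sketched as an ordinary least-squares fit. The code below fits one shape coefficient at a time from a small set of rating vectors; the function names, the added bias term, and the use of normal equations are assumptions for illustration, not the Body Talk implementation.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_map(ratings, targets):
    """Least-squares weights w such that target ~ w . [rating..., 1]."""
    rows = [list(r) + [1.0] for r in ratings]  # append a bias feature
    d = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(d)]
           for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, rating):
    """Apply the fitted linear function to a new rating vector."""
    return sum(w * x for w, x in zip(weights, list(rating) + [1.0]))
```

To cover a full shape vector, the same fit would be repeated once per shape coefficient, each time with the same ratings and a different target column.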

pdf web tool video talk (ppt) [BibTex]



Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.

International Journal of Computer Vision (IJCV), 118(2):172-193, June 2016 (article)

Abstract
Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even the most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in the case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

Website pdf link (url) DOI Project Page [BibTex]



Human Pose Estimation from Video and IMUs

Marcard, T. V., Pons-Moll, G., Rosenhahn, B.

Transactions on Pattern Analysis and Machine Intelligence (PAMI), 38(8):1533-1547, January 2016 (article)

data pdf dataset_documentation [BibTex]



Moving-horizon Nonlinear Least Squares-based Multirobot Cooperative Perception

Ahmad, A., Bülthoff, H.

Robotics and Autonomous Systems, 83, pages: 275-286, 2016 (article)

Abstract
In this article we present an online estimator for multirobot cooperative localization and target tracking based on nonlinear least squares minimization. Our method not only makes the rigorous optimization-based approach applicable online but also allows the estimator to be stable and convergent. We do so by employing a moving horizon technique to nonlinear least squares minimization and a novel design of the arrival cost function that ensures stability and convergence of the estimator. Through an extensive set of real robot experiments, we demonstrate the robustness of our method as well as the optimality of the arrival cost function. The experiments include comparisons of our method with i) an extended Kalman filter-based online-estimator and ii) an offline-estimator based on full-trajectory nonlinear least squares.

DOI Project Page [BibTex]



Perceiving Systems (2011-2015)
Scientific Advisory Board Report, 2016 (misc)

pdf [BibTex]



Shape estimation of subcutaneous adipose tissue using an articulated statistical shape model

Yeo, S. Y., Romero, J., Loper, M., Machann, J., Black, M.

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 0(0):1-8, 2016 (article)

publisher website preprint pdf link (url) DOI Project Page [BibTex]



The GRASP Taxonomy of Human Grasp Types

Feix, T., Romero, J., Schmiedmayer, H., Dollar, A., Kragic, D.

IEEE Transactions on Human-Machine Systems, 46(1):66-77, 2016 (article)

publisher website pdf DOI Project Page [BibTex]



Map-Based Probabilistic Visual Self-Localization

Brubaker, M. A., Geiger, A., Urtasun, R.

IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2016 (article)

Abstract
Accurate and efficient self-localization is a critical problem for autonomous systems. This paper describes an affordable solution to vehicle self-localization which uses odometry computed from two video cameras and road maps as the sole inputs. The core of the method is a probabilistic model for which an efficient approximate inference algorithm is derived. The inference algorithm is able to utilize distributed computation in order to meet the real-time requirements of autonomous systems in some instances. Because of the probabilistic nature of the model the method is capable of coping with various sources of uncertainty including noise in the visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community developed maps and visual odometry measurements, the proposed method is able to localize a vehicle to 4 m on average after 52 seconds of driving on maps which contain more than 2,150 km of drivable roads.

pdf Project Page [BibTex]


1999


Parameterized modeling and recognition of activities

Yacoob, Y., Black, M. J.

Computer Vision and Image Understanding, 73(2):232-247, 1999 (article)

Abstract
In this paper we consider a class of human activities—atomic activities—which can be represented as a set of measurements over a finite temporal window (e.g., the motion of human body parts during a walking cycle) and which has a relatively small space of variations in performance. A new approach for modeling and recognition of atomic activities that employs principal component analysis and analytical global transformations is proposed. The modeling of sets of exemplar instances of activities that are similar in duration and involve similar body part motions is achieved by parameterizing their representation using principal component analysis. The recognition of variants of modeled activities is achieved by searching the space of admissible parameterized transformations that these activities can undergo. This formulation iteratively refines the recognition of the class to which the observed activity belongs and the transformation parameters that relate it to the model in its class. We provide several experiments on recognition of articulated and deformable human motions from image motion parameters.

pdf pdf from publisher DOI [BibTex]


1997


Recognizing facial expressions in image sequences using local parameterized models of image motion

Black, M. J., Yacoob, Y.

Int. Journal of Computer Vision, 25(1):23-48, 1997 (article)

Abstract
This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.
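As an illustration of what a local parametric flow model looks like, the sketch below evaluates a six-parameter affine model and derives a few interpretable quantities from it. The parameter ordering (a0 to a5), the function names, and the choice of descriptors are assumptions for the example; the abstract itself only names affine models as one instance of such parametric models.

```python
def affine_flow(params, x, y):
    """Evaluate an affine flow model at image point (x, y).

    params = (a0, a1, a2, a3, a4, a5) with
        u = a0 + a1*x + a2*y   (horizontal velocity)
        v = a3 + a4*x + a5*y   (vertical velocity)
    """
    a0, a1, a2, a3, a4, a5 = params
    return a0 + a1 * x + a2 * y, a3 + a4 * x + a5 * y


def flow_descriptors(params):
    """Summarize an affine flow with a few interpretable quantities."""
    a0, a1, a2, a3, a4, a5 = params
    return {
        "translation": (a0, a3),
        "divergence": a1 + a5,   # du/dx + dv/dy: expansion or contraction
        "curl": a4 - a2,         # dv/dx - du/dy: in-plane rotation
        "deformation": a1 - a5,  # stretch along x versus along y
    }
```

A handful of numbers per region, such as these, is the kind of concise description the abstract refers to when it relates motion parameters to the movement of facial features.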

pdf pdf from publisher abstract video [BibTex]