Header logo is ps


2017


Thumb xl flamewebteaserwide
Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression from 4D face sequences in the D3DFACS dataset along with additional 4D sequences.We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33, 000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]

2017

data/model video code chumpy code tensorflow paper supplemental Project Page [BibTex]


Thumb xl molbert
Investigating Body Image Disturbance in Anorexia Nervosa Using Novel Biometric Figure Rating Scales: A Pilot Study

Mölbert, S. C., Thaler, A., Streuber, S., Black, M. J., Karnath, H., Zipfel, S., Mohler, B., Giel, K. E.

European Eating Disorders Review, 25(6):607-612, November 2017 (article)

Abstract
This study uses novel biometric figure rating scales (FRS) spanning body mass index (BMI) 13.8 to 32.2 kg/m2 and BMI 18 to 42 kg/m2. The aims of the study were (i) to compare FRS body weight dissatisfaction and perceptual distortion of women with anorexia nervosa (AN) to a community sample; (ii) how FRS parameters are associated with questionnaire body dissatisfaction, eating disorder symptoms and appearance comparison habits; and (iii) whether the weight spectrum of the FRS matters. Women with AN (n = 24) and a community sample of women (n = 104) selected their current and ideal body on the FRS and completed additional questionnaires. Women with AN accurately picked the body that aligned best with their actual weight in both FRS. Controls underestimated their BMI in the FRS 14–32 and were accurate in the FRS 18–42. In both FRS, women with AN desired a body close to their actual BMI and controls desired a thinner body. Our observations suggest that body image disturbance in AN is unlikely to be characterized by a visual perceptual disturbance, but rather by an idealization of underweight in conjunction with high body dissatisfaction. The weight spectrum of FRS can influence the accuracy of BMI estimation.

publisher DOI Project Page [BibTex]


Thumb xl manoteaser
Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6):245:1-245:17, 245:1–245:17, ACM, November 2017 (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low-resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural, activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video link (url) DOI Project Page [BibTex]

website youtube paper suppl video link (url) DOI Project Page [BibTex]


Thumb xl cover tro paper
An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), 33, pages: 1184 - 1199, October 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

Published Version link (url) DOI [BibTex]


Thumb xl image  1
Human Shape Estimation using Statistical Body Models

Loper, M. M.

University of Tübingen, May 2017 (thesis)

Abstract
Human body estimation methods transform real-world observations into predictions about human body state. These estimation methods benefit a variety of health, entertainment, clothing, and ergonomics applications. State may include pose, overall body shape, and appearance. Body state estimation is underconstrained by observations; ambiguity presents itself both in the form of missing data within observations, and also in the form of unknown correspondences between observations. We address this challenge with the use of a statistical body model: a data-driven virtual human. This helps resolve ambiguity in two ways. First, it fills in missing data, meaning that incomplete observations still result in complete shape estimates. Second, the model provides a statistically-motivated penalty for unlikely states, which enables more plausible body shape estimates. Body state inference requires more than a body model; we therefore build obser- vation models whose output is compared with real observations. In this thesis, body state is estimated from three types of observations: 3D motion capture markers, depth and color images, and high-resolution 3D scans. In each case, a forward process is proposed which simulates observations. By comparing observations to the results of the forward process, state can be adjusted to minimize the difference between simulated and observed data. We use gradient-based methods because they are critical to the precise estimation of state with a large number of parameters. The contributions of this work include three parts. First, we propose a method for the estimation of body shape, nonrigid deformation, and pose from 3D markers. Second, we present a concise approach to differentiating through the rendering process, with application to body shape estimation. And finally, we present a statistical body model trained from human body scans, with state-of-the-art fidelity, good runtime performance, and compatibility with existing animation packages.

Official Version [BibTex]


Thumb xl early stopping teaser
Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

link (url) Project Page Project Page [BibTex]


Thumb xl web image
Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M. J., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):54:1-54:12, 2017 (article)

Abstract
Data driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar when they share the same topology.

video paper link (url) Project Page [BibTex]

video paper link (url) Project Page [BibTex]


Thumb xl phd thesis teaser
Learning Inference Models for Computer Vision

Jampani, V.

MPI for Intelligent Systems and University of Tübingen, 2017 (phdthesis)

Abstract
Computer vision can be understood as the ability to perform 'inference' on image data. Breakthroughs in computer vision technology are often marked by advances in inference techniques, as even the model design is often dictated by the complexity of inference in them. This thesis proposes learning based inference schemes and demonstrates applications in computer vision. We propose techniques for inference in both generative and discriminative computer vision models. Despite their intuitive appeal, the use of generative models in vision is hampered by the difficulty of posterior inference, which is often too complex or too slow to be practical. We propose techniques for improving inference in two widely used techniques: Markov Chain Monte Carlo (MCMC) sampling and message-passing inference. Our inference strategy is to learn separate discriminative models that assist Bayesian inference in a generative model. Experiments on a range of generative vision models show that the proposed techniques accelerate the inference process and/or converge to better solutions. A main complication in the design of discriminative models is the inclusion of prior knowledge in a principled way. For better inference in discriminative models, we propose techniques that modify the original model itself, as inference is simple evaluation of the model. We concentrate on convolutional neural network (CNN) models and propose a generalization of standard spatial convolutions, which are the basic building blocks of CNN architectures, to bilateral convolutions. First, we generalize the existing use of bilateral filters and then propose new neural network architectures with learnable bilateral filters, which we call `Bilateral Neural Networks'. We show how the bilateral filtering modules can be used for modifying existing CNN architectures for better image segmentation and propose a neural network approach for temporal information propagation in videos. Experiments demonstrate the potential of the proposed bilateral networks on a wide range of vision tasks and datasets. In summary, we propose learning based techniques for better inference in several computer vision models ranging from inverse graphics to freely parameterized neural networks. In generative vision models, our inference techniques alleviate some of the crucial hurdles in Bayesian posterior inference, paving new ways for the use of model based machine learning in vision. In discriminative CNN models, the proposed filter generalizations aid in the design of new neural network architectures that can handle sparse high-dimensional data as well as provide a way for incorporating prior knowledge into CNNs.

pdf [BibTex]

pdf [BibTex]


Thumb xl web teaser eg
Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), pages: 349-360 , 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall

video pdf Project Page [BibTex]

video pdf Project Page [BibTex]


Thumb xl pami 2017 teaser
Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we are describing a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better, or comparable with all previous published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]

arXiv Project Page [BibTex]


Thumb xl web image
ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4):73:1-73:15, ACM, New York, NY, USA, 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) DOI Project Page Project Page [BibTex]

video project_page paper link (url) DOI Project Page Project Page [BibTex]


Thumb xl coverhand wilson
Capturing Hand-Object Interaction and Reconstruction of Manipulated Objects

Tzionas, D.

University of Bonn, 2017 (phdthesis)

Abstract
Hand motion capture with an RGB-D sensor gained recently a lot of research attention, however, even most recent approaches focus on the case of a single isolated hand. We focus instead on hands that interact with other hands or with a rigid or articulated object. Our framework successfully captures motion in such scenarios by combining a generative model with discriminatively trained salient points, collision detection and physics simulation to achieve a low tracking error with physically plausible poses. All components are unified in a single objective function that can be optimized with standard optimization techniques. We initially assume a-priori knowledge of the object's shape and skeleton. In case of unknown object shape there are existing 3d reconstruction methods that capitalize on distinctive geometric or texture features. These methods though fail for textureless and highly symmetric objects like household articles, mechanical parts or toys. We show that extracting 3d hand motion for in-hand scanning effectively facilitates the reconstruction of such objects and we fuse the rich additional information of hands into a 3d reconstruction pipeline. Finally, although shape reconstruction is enough for rigid objects, there is a lack of tools that build rigged models of articulated objects that deform realistically using RGB-D data. We propose a method that creates a fully rigged model consisting of a watertight mesh, embedded skeleton and skinning weights by employing a combination of deformable mesh tracking, motion segmentation based on spectral clustering and skeletonization based on mean curvature flow.

Thesis link (url) Project Page [BibTex]

2006


Thumb xl neuralcomp
Bayesian population decoding of motor cortical activity using a Kalman filter

Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P., Black, M. J.

Neural Computation, 18(1):80-118, 2006 (article)

Abstract
Effective neural motor prostheses require a method for decoding neural activity representing desired movement. In particular, the accurate reconstruction of a continuous motion signal is necessary for the control of devices such as computer cursors, robots, or a patient's own paralyzed limbs. For such applications, we developed a real-time system that uses Bayesian inference techniques to estimate hand motion from the firing rates of multiple neurons. In this study, we used recordings that were previously made in the arm area of primary motor cortex in awake behaving monkeys using a chronically implanted multielectrode microarray. Bayesian inference involves computing the posterior probability of the hand motion conditioned on a sequence of observed firing rates; this is formulated in terms of the product of a likelihood and a prior. The likelihood term models the probability of firing rates given a particular hand motion. We found that a linear gaussian model could be used to approximate this likelihood and could be readily learned from a small amount of training data. The prior term defines a probabilistic model of hand kinematics and was also taken to be a linear gaussian model. Decoding was performed using a Kalman filter, which gives an efficient recursive method for Bayesian inference when the likelihood and prior are linear and gaussian. In off-line experiments, the Kalman filter reconstructions of hand trajectory were more accurate than previously reported results. The resulting decoding algorithm provides a principled probabilistic model of motor-cortical coding, decodes hand motion in real time, provides an estimate of uncertainty, and is straightforward to implement. Additionally the formulation unifies and extends previous models of neural coding while providing insights into the motor-cortical code.

pdf preprint pdf from publisher abstract [BibTex]

2006

pdf preprint pdf from publisher abstract [BibTex]

1992


Thumb xl thesis
Robust Incremental Optical Flow

Black, M. J.

Yale University, Department of Computer Science, New Haven, CT, 1992, Research Report YALEU-DCS-RR-923 (phdthesis)

pdf code [BibTex]

1992

pdf code [BibTex]