111 results (BibTeX)

2017


Embodied Hands: Modeling and Capturing Hands and Bodies Together

Romero, J., Tzionas, D., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 36(6), 245:1–245:17, November 2017, (*) Two first authors contributed equally (article)

Abstract
Humans move their hands and bodies together to communicate and solve tasks. Capturing and replicating such coordinated activity is critical for virtual characters that behave realistically. Surprisingly, most methods treat the 3D modeling and tracking of bodies and hands separately. Here we formulate a model of hands and bodies interacting together and fit it to full-body 4D sequences. When scanning or capturing the full body in 3D, hands are small and often partially occluded, making their shape and pose hard to recover. To cope with low resolution, occlusion, and noise, we develop a new model called MANO (hand Model with Articulated and Non-rigid defOrmations). MANO is learned from around 1000 high-resolution 3D scans of hands of 31 subjects in a wide variety of hand poses. The model is realistic, low-dimensional, captures non-rigid shape changes with pose, is compatible with standard graphics packages, and can fit any human hand. MANO provides a compact mapping from hand poses to pose blend shape corrections and a linear manifold of pose synergies. We attach MANO to a standard parameterized 3D body shape model (SMPL), resulting in a fully articulated body and hand model (SMPL+H). We illustrate SMPL+H by fitting complex, natural activities of subjects captured with a 4D scanner. The fitting is fully automatic and results in full body models that move naturally with detailed hand motions and a realism not seen before in full body performance capture. The models and data are freely available for research purposes at http://mano.is.tue.mpg.de.

website youtube paper suppl video DOI Project Page [BibTex]

Learning a model of facial shape and expression from 4D scans

Li, T., Bolkart, T., Black, M. J., Li, H., Romero, J.

ACM Transactions on Graphics, 36(6):194:1-194:17, November 2017, Two first authors contributed equally (article)

Abstract
The field of 3D face modeling has a large gap between high-end and low-end methods. At the high end, the best facial animation is indistinguishable from real humans, but this comes at the cost of extensive manual labor. At the low end, face capture from consumer depth sensors relies on 3D face models that are not expressive enough to capture the variability in natural facial shape and expression. We seek a middle ground by learning a facial model from thousands of accurately aligned 3D scans. Our FLAME model (Faces Learned with an Articulated Model and Expressions) is designed to work with existing graphics software and be easy to fit to data. FLAME uses a linear shape space trained from 3800 scans of human heads. FLAME combines this linear shape space with an articulated jaw, neck, and eyeballs, pose-dependent corrective blendshapes, and additional global expression from 4D face sequences in the D3DFACS dataset along with additional 4D sequences. We accurately register a template mesh to the scan sequences and make the D3DFACS registrations available for research purposes. In total the model is trained from over 33,000 scans. FLAME is low-dimensional but more expressive than the FaceWarehouse model and the Basel Face Model. We compare FLAME to these models by fitting them to static 3D scans and 4D sequences using the same optimization method. FLAME is significantly more accurate and is available for research purposes (http://flame.is.tue.mpg.de).

data/model video paper supplemental [BibTex]

Assessing body image in anorexia nervosa using biometric self-avatars in virtual reality: Attitudinal components rather than visual body size estimation are distorted

Mölbert, S. C., Thaler, A., Mohler, B. J., Streuber, S., Romero, J., Black, M. J., Zipfel, S., Karnath, H., Giel, K. E.

Psychological Medicine, 2017 (article)

Abstract
Background: Body image disturbance (BID) is a core symptom of anorexia nervosa (AN), but as yet distinctive features of BID are unknown. The present study aimed at disentangling perceptual and attitudinal components of BID in AN. Methods: We investigated n=24 women with AN and n=24 controls. Based on a 3D body scan, we created realistic virtual 3D bodies (avatars) for each participant that were varied through a range of ±20% of the participants' weights. Avatars were presented in a virtual reality mirror scenario. Using different psychophysical tasks, participants identified and adjusted their actual and their desired body weight. To test for general perceptual biases in estimating body weight, a second experiment investigated perception of weight and shape matched avatars with another identity. Results: Women with AN and controls underestimated their weight, with a trend that women with AN underestimated more. The average desired body of controls had normal weight while the average desired weight of women with AN corresponded to extreme AN (DSM-5). Correlation analyses revealed that desired body weight, but not accuracy of weight estimation, was associated with eating disorder symptoms. In the second experiment, both groups estimated accurately while the most attractive body was similar to Experiment 1. Conclusions: Our results contradict the widespread assumption that patients with AN overestimate their body weight due to visual distortions. Rather, they illustrate that BID might be driven by distorted attitudes with regard to the desired body. Clinical interventions should aim at helping patients with AN to change their desired weight.

doi pdf [BibTex]


An Online Scalable Approach to Unified Multirobot Cooperative Localization and Object Tracking

Ahmad, A., Lawless, G., Lima, P.

IEEE Transactions on Robotics (T-RO), accepted May 2017 (article)

Abstract
In this article we present a unified approach for multi-robot cooperative simultaneous localization and object tracking based on particle filters. Our approach is scalable with respect to the number of robots in the team. We introduce a method that reduces, from an exponential to a linear growth, the space and computation time requirements with respect to the number of robots in order to maintain a given level of accuracy in the full state estimation. Our method requires no increase in the number of particles with respect to the number of robots. However, in our method each particle represents a full state hypothesis, leading to the linear dependency on the number of robots of both space and time complexity. The derivation of the algorithm implementing our approach from a standard particle filter algorithm and its complexity analysis are presented. Through an extensive set of simulation experiments on a large number of randomized datasets, we demonstrate the correctness and efficacy of our approach. Through real robot experiments on a standardized open dataset of a team of four soccer playing robots tracking a ball, we evaluate our method's estimation accuracy with respect to the ground truth values. Through comparisons with other methods based on i) nonlinear least squares minimization and ii) joint extended Kalman filter, we further highlight our method's advantages. Finally, we also present a robustness test for our approach by evaluating it under scenarios of communication and vision failure in teammate robots.

accepted pre-print version [BibTex]


Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

link (url) Project Page [BibTex]


ClothCap: Seamless 4D Clothing Capture and Retargeting

Pons-Moll, G., Pujades, S., Hu, S., Black, M.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4), 2017, Two first authors contributed equally (article)

Abstract
Designing and simulating realistic clothing is challenging and, while several methods have addressed the capture of clothing from 3D scans, previous methods have been limited to single garments and simple motions, lack detail, or require specialized texture patterns. Here we address the problem of capturing regular clothing on fully dressed people in motion. People typically wear multiple pieces of clothing at a time. To estimate the shape of such clothing, track it over time, and render it believably, each garment must be segmented from the others and the body. Our ClothCap approach uses a new multi-part 3D model of clothed bodies, automatically segments each piece of clothing, estimates the naked body shape and pose under the clothing, and tracks the 3D deformations of the clothing over time. We estimate the garments and their motion from 4D scans; that is, high-resolution 3D scans of the subject in motion at 60 fps. The model allows us to capture a clothed person in motion, extract their clothing, and retarget the clothing to new body shapes. ClothCap provides a step towards virtual try-on with a technology for capturing, modeling, and analyzing clothing in motion.

video project_page paper link (url) Project Page [BibTex]

Data-Driven Physics for Human Soft Tissue Animation

Kim, M., Pons-Moll, G., Pujades, S., Bang, S., Kim, J., Black, M., Lee, S.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 36(4), 2017 (article)

Abstract
Data-driven models of human poses and soft-tissue deformations can produce very realistic results, but they only model the visible surface of the human body and cannot create skin deformation due to interactions with the environment. Physical simulations can generalize to external forces, but their parameters are difficult to control. In this paper, we present a layered volumetric human body model learned from data. Our model is composed of a data-driven inner layer and a physics-based external layer. The inner layer is driven with a volumetric statistical body model (VSMPL). The soft tissue layer consists of a tetrahedral mesh that is driven using the finite element method (FEM). Model parameters, namely the segmentation of the body into layers and the soft tissue elasticity, are learned directly from 4D registrations of humans exhibiting soft tissue deformations. The learned two-layer model is a realistic full-body avatar that generalizes to novel motions and external forces. Experiments show that the resulting avatars produce realistic results on held-out sequences and react to external forces. Moreover, the model supports the retargeting of physical properties from one avatar to another when they share the same topology.

video paper link (url) [BibTex]

Efficient 2D and 3D Facade Segmentation using Auto-Context

Gadde, R., Jampani, V., Marlet, R., Gehler, P.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017 (article)

Abstract
This paper introduces a fast and efficient segmentation technique for 2D images and 3D point clouds of building facades. Facades of buildings are highly structured and consequently most methods that have been proposed for this problem aim to make use of this strong prior information. Contrary to most prior work, we describe a system that is almost domain independent and consists of standard segmentation methods. We train a sequence of boosted decision trees using auto-context features. This is learned using stacked generalization. We find that this technique performs better than, or comparably to, all previously published methods and present empirical results on all available 2D and 3D facade benchmark datasets. The proposed method is simple to implement, easy to extend, and very efficient at test-time inference.

arXiv Project Page [BibTex]

Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

(Best Paper, Eurographics 2017)

Marcard, T. V., Rosenhahn, B., Black, M., Pons-Moll, G.

Computer Graphics Forum 36(2), Proceedings of the 38th Annual Conference of the European Association for Computer Graphics (Eurographics), 2017 (article)

Abstract
We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker, Sparse Inertial Poser (SIP), enables motion capture using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.

video pdf [BibTex]


2016


Moving-horizon Nonlinear Least Squares-based Multirobot Cooperative Perception

Ahmad, A., Bülthoff, H.

Robotics and Autonomous Systems, 83, pages: 275-286, 2016 (article)

Abstract
In this article we present an online estimator for multirobot cooperative localization and target tracking based on nonlinear least squares minimization. Our method not only makes the rigorous optimization-based approach applicable online but also allows the estimator to be stable and convergent. We do so by employing a moving horizon technique to nonlinear least squares minimization and a novel design of the arrival cost function that ensures stability and convergence of the estimator. Through an extensive set of real robot experiments, we demonstrate the robustness of our method as well as the optimality of the arrival cost function. The experiments include comparisons of our method with i) an extended Kalman filter-based online-estimator and ii) an offline-estimator based on full-trajectory nonlinear least squares.

DOI [BibTex]

Creating body shapes from verbal descriptions by linking similarity spaces

Hill, M. Q., Streuber, S., Hahn, C. A., Black, M. J., O’Toole, A. J.

Psychological Science, 27(11):1486-1497, November 2016 (article)

Abstract
Brief verbal descriptions of bodies (e.g. curvy, long-legged) can elicit vivid mental images. The ease with which we create these mental images belies the complexity of three-dimensional body shapes. We explored the relationship between body shapes and body descriptions and show that a small number of words can be used to generate categorically accurate representations of three-dimensional bodies. The dimensions of body shape variation that emerged in a language-based similarity space were related to major dimensions of variation computed directly from three-dimensional laser scans of 2094 bodies. This allowed us to generate three-dimensional models of people in the shape space using only their coordinates on analogous dimensions in the language-based description space. Human descriptions of photographed bodies and their corresponding models matched closely. The natural mapping between the spaces illustrates the role of language as a concise code for body shape, capturing perceptually salient global and local body features.

pdf Project Page [BibTex]

Shape estimation of subcutaneous adipose tissue using an articulated statistical shape model

Yeo, S. Y., Romero, J., Loper, M., Machann, J., Black, M.

Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 0(0):1-8, 2016 (article)

publisher website preprint pdf link (url) DOI [BibTex]

Body Talk: Crowdshaping Realistic 3D Avatars with Words

Streuber, S., Quiros-Ramirez, M. A., Hill, M. Q., Hahn, C. A., Zuffi, S., O’Toole, A., Black, M. J.

ACM Trans. Graph. (Proc. SIGGRAPH), 35(4):54:1-54:14, July 2016 (article)

Abstract
Realistic, metrically accurate, 3D human avatars are useful for games, shopping, virtual reality, and health applications. Such avatars are not in wide use because solutions for creating them from high-end scanners, low-cost range cameras, and tailoring measurements all have limitations. Here we propose a simple solution and show that it is surprisingly accurate. We use crowdsourcing to generate attribute ratings of 3D body shapes corresponding to standard linguistic descriptions of 3D shape. We then learn a linear function relating these ratings to 3D human shape parameters. Given an image of a new body, we again turn to the crowd for ratings of the body shape. The collection of linguistic ratings of a photograph provides remarkably strong constraints on the metric 3D shape. We call the process crowdshaping and show that our Body Talk system produces shapes that are perceptually indistinguishable from bodies created from high-resolution scans and that the metric accuracy is sufficient for many tasks. This makes body “scanning” practical without a scanner, opening up new applications including database search, visualization, and extracting avatars from books.

pdf web tool video talk (ppt) Project Page [BibTex]

The GRASP Taxonomy of Human Grasp Types

Feix, T., Romero, J., Schmiedmayer, H., Dollar, A., Kragic, D.

IEEE Transactions on Human-Machine Systems, 46(1):66-77, 2016 (article)

publisher website pdf DOI [BibTex]

Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

Tzionas, D., Ballan, L., Srikantha, A., Aponte, P., Pollefeys, M., Gall, J.

International Journal of Computer Vision (IJCV), 2016 (article)

Abstract
Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even the most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error, and with collision detection and physics simulation to achieve physically plausible estimates even in the case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

Website pdf DOI Project Page [BibTex]

Human Pose Estimation from Video and IMUs

Marcard, T. V., Pons-Moll, G., Rosenhahn, B.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), January 2016 (article)

data pdf dataset_documentation [BibTex]

Map-Based Probabilistic Visual Self-Localization

Brubaker, M. A., Geiger, A., Urtasun, R.

IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2016 (article)

Abstract
Accurate and efficient self-localization is a critical problem for autonomous systems. This paper describes an affordable solution to vehicle self-localization which uses odometry computed from two video cameras and road maps as the sole inputs. The core of the method is a probabilistic model for which an efficient approximate inference algorithm is derived. The inference algorithm is able to utilize distributed computation in order to meet the real-time requirements of autonomous systems in some instances. Because of the probabilistic nature of the model, the method is capable of coping with various sources of uncertainty including noise in the visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community-developed maps and visual odometry measurements, the proposed method is able to localize a vehicle to 4 m on average after 52 seconds of driving on maps which contain more than 2,150 km of drivable roads.

pdf [BibTex]


2015


Formation control driven by cooperative object tracking

Lima, P., Ahmad, A., Dias, A., Conceição, A., Moreira, A., Silva, E., Almeida, L., Oliveira, L., Nascimento, T.

Robotics and Autonomous Systems, 63(1):68-79, 2015 (article)

Abstract
In this paper we introduce a formation control loop that maximizes the performance of the cooperative perception of a tracked target by a team of mobile robots, while maintaining the team in formation, with a dynamically adjustable geometry which is a function of the quality of the target perception by the team. In the formation control loop, the controller module is a distributed non-linear model predictive controller and the estimator module fuses local estimates of the target state, obtained by a particle filter at each robot. The two modules and their integration are described in detail, including a real-time database associated to a wireless communication protocol that facilitates the exchange of state data while reducing collisions among team members. Simulation and real robot results for indoor and outdoor teams of different robots are presented. The results highlight how our method successfully enables a team of homogeneous robots to minimize the total uncertainty of the tracked target cooperative estimate while complying with performance criteria such as keeping a pre-set distance between the teammates and the target, avoiding collisions with teammates and/or surrounding obstacles.

DOI [BibTex]

Scalable Robust Principal Component Analysis using Grassmann Averages

Hauberg, S., Feragen, A., Enficiaud, R., Black, M.

IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), December 2015 (article)

Abstract
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.

preprint pdf from publisher supplemental Project Page [BibTex]


Multi-view and 3D Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

Pattern Analysis and Machine Intelligence, 37(11):14, IEEE, March 2015 (article)

Abstract
As objects are inherently 3-dimensional, they were modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding bounding box detection performance, they come with limited expressiveness, as they are clearly limited in their capability of reasoning about 3D shape or viewpoints. In this work, we bring the worlds of 3D and 2D object representations closer, by building an object detector which leverages the expressive power of 3D object representations while at the same time can be robustly matched to image evidence. To that end, we gradually extend the successful deformable part model [1] to include viewpoint information and part-level 3D geometry information, resulting in several different models with different levels of expressiveness. We end up with a 3D object model, consisting of multiple object parts represented in 3D and a continuous appearance model. We experimentally verify that our models, while providing richer object hypotheses than the 2D object models, provide consistently better joint object localization and viewpoint estimation than the state-of-the-art multi-view and 3D object detectors on various benchmarks (KITTI [2], 3D object classes [3], Pascal3D+ [4], Pascal VOC 2007 [5], EPFL multi-view cars [6]).

DOI Project Page [BibTex]

SMPL: A Skinned Multi-Person Linear Model

Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M. J.

ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, ACM, New York, NY, October 2015 (article)

Abstract
We present a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. We quantitatively evaluate variants of SMPL using linear or dual-quaternion blend skinning and show that both are more accurate than a Blend-SCAPE model trained on the same data. We also extend SMPL to realistically model dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.

pdf video code/model errata DOI Project Page [BibTex]

Linking Objects to Actions: Encoding of Target Object and Grasping Strategy in Primate Ventral Premotor Cortex

Vargas-Irwin, C. E., Franquemont, L., Black, M. J., Donoghue, J. P.

Journal of Neuroscience, 35(30):10888-10897, July 2015 (article)

Abstract
Neural activity in ventral premotor cortex (PMv) has been associated with the process of matching perceived objects with the motor commands needed to grasp them. It remains unclear how PMv networks can flexibly link percepts of objects affording multiple grasp options into a final desired hand action. Here, we use a relational encoding approach to track the functional state of PMv neuronal ensembles in macaque monkeys through the process of passive viewing, grip planning, and grasping movement execution. We used objects affording multiple possible grip strategies. The task included separate instructed delay periods for object presentation and grip instruction. This approach allowed us to distinguish responses elicited by the visual presentation of the objects from those associated with selecting a given motor plan for grasping. We show that PMv continuously incorporates information related to object shape and grip strategy as it becomes available, revealing a transition from a set of ensemble states initially most closely related to objects, to a new set of ensemble patterns reflecting unique object-grip combinations. These results suggest that PMv dynamically combines percepts, gradually navigating toward activity patterns associated with specific volitional actions, rather than directly mapping perceptual object properties onto categorical grip representations. Our results support the idea that PMv is part of a network that dynamically computes motor plans from perceptual information. Significance Statement: The present work demonstrates that the activity of groups of neurons in primate ventral premotor cortex reflects information related to visually presented objects, as well as the motor strategy used to grasp them, linking individual objects to multiple possible grips. PMv could provide useful control signals for neuroprosthetic assistive devices designed to interact with objects in a flexible way.

publisher link DOI Project Page [BibTex]

Dyna: A Model of Dynamic Human Shape in Motion

Pons-Moll, G., Romero, J., Mahmood, N., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 34(4):120:1-120:14, ACM, August 2015 (article)

Abstract
To look human, digital full-body avatars need to have soft tissue deformations like those of real people. We learn a model of soft-tissue deformations from examples using a high-resolution 4D capture system and a method that accurately registers a template mesh to sequences of 3D scans. Using over 40,000 scans of ten subjects, we learn how soft tissue motion causes mesh triangles to deform relative to a base 3D body model. Our Dyna model uses a low-dimensional linear subspace to approximate soft-tissue deformation and relates the subspace coefficients to the changing pose of the body. Dyna uses a second-order auto-regressive model that predicts soft-tissue deformations based on previous deformations, the velocity and acceleration of the body, and the angular velocities and accelerations of the limbs. Dyna also models how deformations vary with a person’s body mass index (BMI), producing different deformations for people with different shapes. Dyna realistically represents the dynamics of soft tissue for previously unseen subjects and motions. We provide tools for animators to modify the deformations and apply them to new stylized characters.

pdf preprint video data DOI Project Page [BibTex]


Metric Regression Forests for Correspondence Estimation

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

International Journal of Computer Vision, pages: 1-13, 2015 (article)

springer PDF Project Page [BibTex]


Spike train SIMilarity Space (SSIMS): A framework for single neuron and ensemble data analysis

Vargas-Irwin, C. E., Brandman, D. M., Zimmermann, J. B., Donoghue, J. P., Black, M. J.

Neural Computation, 27(1):1-31, MIT Press, January 2015 (article)

Abstract
We present a method to evaluate the relative similarity of neural spiking patterns by combining spike train distance metrics with dimensionality reduction. Spike train distance metrics provide an estimate of similarity between activity patterns at multiple temporal resolutions. Vectors of pair-wise distances are used to represent the intrinsic relationships between multiple activity patterns at the level of single units or neuronal ensembles. Dimensionality reduction is then used to project the data into concise representations suitable for clustering analysis as well as exploratory visualization. Algorithm performance and robustness are evaluated using multielectrode ensemble activity data recorded in behaving primates. We demonstrate how Spike train SIMilarity Space (SSIMS) analysis captures the relationship between goal directions for an 8-directional reaching task and successfully segregates grasp types in a 3D grasping task in the absence of kinematic information. The algorithm enables exploration of virtually any type of neural spiking (time series) data, providing similarity-based clustering of neural activity states with minimal assumptions about potential information encoding models.
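The first stage described above, turning spike trains into vectors of pair-wise distances, can be sketched as follows. The abstract does not name a specific metric; this sketch assumes the Victor-Purpura edit distance, a commonly used spike train metric whose cost parameter `q` sets the temporal resolution. The function names are illustrative only.

```python
def victor_purpura(a, b, q):
    """Victor-Purpura edit distance between two sorted lists of spike times.

    Inserting or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    """
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = float(i)  # delete every spike of a
    for j in range(1, m + 1):
        cost[0][j] = float(j)  # insert every spike of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j] + 1.0,                                  # delete
                cost[i][j - 1] + 1.0,                                  # insert
                cost[i - 1][j - 1] + q * abs(a[i - 1] - b[j - 1]),     # shift
            )
    return cost[n][m]


def distance_vectors(trains, q):
    """Represent each spike train by its distances to all trains in the set."""
    return [[victor_purpura(s, t, q) for t in trains] for s in trains]
```

Each row of `distance_vectors` would then be passed to a dimensionality reduction method of choice to obtain the low-dimensional representation used for clustering and visualization.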

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]

2014


Detection and Tracking of Occluded People

Tang, S., Andriluka, M., Schiele, B.

IJCV, 2014 (article)

PDF [BibTex]


3D to 2D bijection for spherical objects under equidistant fisheye projection

Ahmad, A., Xavier, J., Santos-Victor, J., Lima, P.

Computer Vision and Image Understanding, 125, pages: 172-183, August 2014 (article)

Abstract
The core problem addressed in this article is the 3D position detection of a spherical object of known-radius in a single image frame, obtained by a dioptric vision system consisting of only one fisheye lens camera that follows equidistant projection model. The central contribution is a bijection principle between a known-radius spherical object’s 3D world position and its 2D projected image curve, that we prove, thus establishing that for every possible 3D world position of the spherical object, there exists a unique curve on the image plane if the object is projected through a fisheye lens that follows equidistant projection model. Additionally, we present a setup for the experimental verification of the principle’s correctness. In previously published works we have applied this principle to detect and subsequently track a known-radius spherical object.
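For context, the equidistant projection model maps a ray at angle θ from the optical axis to an image radius r = f·θ. The sketch below projects a single 3D point and back-projects a pixel to a viewing ray; it does not compute the projected curve of a sphere, which is the subject of the article. The function names and the intrinsics `f`, `cx`, `cy` are illustrative assumptions.

```python
import math


def project_equidistant(point, f, cx, cy):
    """Project a 3D point in the camera frame with the model r = f * theta."""
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical axis
    phi = math.atan2(y, x)                   # azimuth around the axis
    r = f * theta
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))


def back_project(pixel, f, cx, cy):
    """Recover the unit viewing ray for a pixel under the same model."""
    du, dv = pixel[0] - cx, pixel[1] - cy
    theta = math.hypot(du, dv) / f
    phi = math.atan2(dv, du)
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )
```

A single pixel only determines a ray, not a depth, which is why a known radius and the full projected curve are needed to recover a 3D position.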

DOI [BibTex]


Segmentation of Biomedical Images Using Active Contour Model with Robust Image Feature and Shape Prior

S. Y. Yeo, X. Xie, I. Sazonov, P. Nithiarasu

International Journal for Numerical Methods in Biomedical Engineering, 30(2):232-248, 2014 (article)

Abstract
In this article, a new level set model is proposed for the segmentation of biomedical images. The image energy of the proposed model is derived from a robust image gradient feature which gives the active contour a global representation of the geometric configuration, making it more robust in dealing with image noise, weak edges, and initial configurations. Statistical shape information is incorporated using nonparametric shape density distribution, which allows the shape model to handle relatively large shape variations. The segmentation of various shapes from both synthetic and real images depicts the robustness and efficiency of the proposed method.

[BibTex]


Automatic 4D Reconstruction of Patient-Specific Cardiac Mesh with 1-to-1 Vertex Correspondence from Segmented Contours Lines

C. W. Lim, Y. Su, S. Y. Yeo, G. M. Ng, V. T. Nguyen, L. Zhong, R. S. Tan, K. K. Poh, P. Chai

PLOS ONE, 9(4), 2014 (article)

Abstract
We propose an automatic algorithm for the reconstruction of patient-specific cardiac mesh models with 1-to-1 vertex correspondence. In this framework, a series of 3D meshes depicting the endocardial surface of the heart at each time step is constructed, based on a set of border delineated magnetic resonance imaging (MRI) data of the whole cardiac cycle. The key contribution in this work involves a novel reconstruction technique to generate a 4D (i.e., spatial–temporal) model of the heart with 1-to-1 vertex mapping throughout the time frames. The reconstructed 3D model from the first time step is used as a base template model and then deformed to fit the segmented contours from the subsequent time steps. A method to determine a tree-based connectivity relationship is proposed to ensure robust mapping during mesh deformation. The novel feature is the ability to handle intra- and inter-frame 2D topology changes of the contours, which manifests as a series of merging and splitting of contours when the images are viewed either in a spatial or temporal sequence. Our algorithm has been tested on five acquisitions of cardiac MRI and can successfully reconstruct the full 4D heart model in around 30 minutes per subject. The generated 4D heart model conforms very well with the input segmented contours and the mesh element shape is of reasonably good quality. The work is important in the support of downstream computational simulation activities.

[BibTex]


MoSh: Motion and Shape Capture from Sparse Markers

Loper, M. M., Mahmood, N., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH Asia), 33(6):220:1-220:13, ACM, New York, NY, USA, November 2014 (article)

Abstract
Marker-based motion capture (mocap) is widely criticized as producing lifeless animations. We argue that important information about body surface motion is present in standard marker sets but is lost in extracting a skeleton. We demonstrate a new approach called MoSh (Motion and Shape capture), that automatically extracts this detail from mocap data. MoSh estimates body shape and pose together using sparse marker data by exploiting a parametric model of the human body. In contrast to previous work, MoSh solves for the marker locations relative to the body and estimates accurate body shape directly from the markers without the use of 3D scans; this effectively turns a mocap system into an approximate body scanner. MoSh is able to capture soft tissue motions directly from markers by allowing body shape to vary over time. We evaluate the effect of different marker sets on pose and shape accuracy and propose a new sparse marker set for capturing soft-tissue motion. We illustrate MoSh by recovering body shape, pose, and soft-tissue motion from archival mocap data and using this to produce animations with subtlety and realism. We also show soft-tissue motion retargeting to new characters and show how to magnify the 3D deformations of soft tissue to create animations with appealing exaggerations.

pdf video data pdf from publisher link (url) DOI Project Page [BibTex]


A freely-moving monkey treadmill model

Foster, J., Nuyujukian, P., Freifeld, O., Gao, H., Walker, R., Ryu, S., Meng, T., Murmann, B., Black, M., Shenoy, K.

J. of Neural Engineering, 11(4):046020, 2014 (article)

Abstract
Objective: Motor neuroscience and brain-machine interface (BMI) design is based on examining how the brain controls voluntary movement, typically by recording neural activity and behavior from animal models. Recording technologies used with these animal models have traditionally limited the range of behaviors that can be studied, and thus the generality of science and engineering research. We aim to design a freely-moving animal model using neural and behavioral recording technologies that do not constrain movement. Approach: We have established a freely-moving rhesus monkey model employing technology that transmits neural activity from an intracortical array using a head-mounted device and records behavior through computer vision using markerless motion capture. We demonstrate the excitability and utility of this new monkey model, including the first recordings from motor cortex while rhesus monkeys walk quadrupedally on a treadmill. Main results: Using this monkey model, we show that multi-unit threshold-crossing neural activity encodes the phase of walking and that the average firing rate of the threshold crossings covaries with the speed of individual steps. On a population level, we find that neural state-space trajectories of walking at different speeds have similar rotational dynamics in some dimensions that evolve at the step rate of walking, yet robustly separate by speed in other state-space dimensions. Significance: Freely-moving animal models may allow neuroscientists to examine a wider range of behaviors and can provide a flexible experimental paradigm for examining the neural mechanisms that underlie movement generation across behaviors and environments. For BMIs, freely-moving animal models have the potential to aid prosthetic design by examining how neural encoding changes with posture, environment, and other real-world context changes. Understanding this new realm of behavior in more naturalistic settings is essential for overall progress of basic motor neuroscience and for the successful translation of BMIs to people with paralysis.

pdf Supplementary DOI Project Page [BibTex]


Can I recognize my body’s weight? The influence of shape and texture on the perception of self

Piryankova, I., Stefanucci, J., Romero, J., de la Rosa, S., Black, M., Mohler, B.

ACM Transactions on Applied Perception for the Symposium on Applied Perception, 11(3):13:1-13:18, September 2014 (article)

Abstract
The goal of this research was to investigate women’s sensitivity to changes in their perceived weight by altering the body mass index (BMI) of the participants’ personalized avatars displayed on a large-screen immersive display. We created the personalized avatars with a full-body 3D scanner that records both the participants’ body geometry and texture. We altered the weight of the personalized avatars to produce changes in BMI while keeping height, arm length and inseam fixed and exploited the correlation between body geometry and anthropometric measurements encapsulated in a statistical body shape model created from thousands of body scans. In a 2x2 psychophysical experiment, we investigated the relative importance of visual cues, namely shape (own shape vs. an average female body shape with equivalent height and BMI to the participant) and texture (own photo-realistic texture or checkerboard pattern texture) on the ability to accurately perceive own current body weight (by asking them ‘Is the avatar the same weight as you?’). Our results indicate that shape (where height and BMI are fixed) had little effect on the perception of body weight. Interestingly, the participants perceived their body weight veridically when they saw their own photo-realistic texture and significantly underestimated their body weight when the avatar had a checkerboard patterned texture. The range that the participants accepted as their own current weight was approximately a 0.83 to −6.05 BMI% change tolerance range around their perceived weight. Both the shape and the texture had an effect on the reported similarity of the body parts and the whole avatar to the participant’s body. This work has implications for new measures for patients with body image disorders, as well as researchers interested in creating personalized avatars for games, training applications or virtual reality.

pdf DOI Project Page [BibTex]


Breathing Life into Shape: Capturing, Modeling and Animating 3D Human Breathing

Tsoli, A., Mahmood, N., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 33(4):52:1-52:11, ACM, New York, NY, July 2014 (article)

Abstract
Modeling how the human body deforms during breathing is important for the realistic animation of lifelike 3D avatars. We learn a model of body shape deformations due to breathing for different breathing types and provide simple animation controls to render lifelike breathing regardless of body shape. We capture and align high-resolution 3D scans of 58 human subjects. We compute deviations from each subject’s mean shape during breathing, and study the statistics of such shape changes for different genders, body shapes, and breathing types. We use the volume of the registered scans as a proxy for lung volume and learn a novel non-linear model relating volume and breathing type to 3D shape deformations and pose changes. We then augment a SCAPE body model so that body shape is determined by identity, pose, and the parameters of the breathing model. These parameters provide an intuitive interface with which animators can synthesize 3D human avatars with realistic breathing motions. We also develop a novel interface for animating breathing using a spirometer, which measures the changes in breathing volume of a “breath actor.”

pdf video link (url) DOI Project Page [BibTex]


Simpler, faster, more accurate melanocytic lesion segmentation through MEDS

Peruch, F., Bogo, F., Bonazza, M., Cappelleri, V., Peserico, E.

IEEE Transactions on Biomedical Engineering, 61(2):557-565, February 2014 (article)

DOI [BibTex]


A physically-based approach to reflection separation: from physical modeling to constrained optimization

Kong, N., Tai, Y., Shin, J. S.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(2):209-221, IEEE Computer Society, February 2014 (article)

Abstract
We propose a physically-based approach to separate reflection using multiple polarized images with a background scene captured behind glass. The input consists of three polarized images, each captured from the same view point but with a different polarizer angle separated by 45 degrees. The output is the high-quality separation of the reflection and background layers from each of the input images. A main technical challenge for this problem is that the mixing coefficient for the reflection and background layers depends on the angle of incidence and the orientation of the plane of incidence, which are spatially varying over the pixels of an image. Exploiting physical properties of polarization for a double-surfaced glass medium, we propose a multiscale scheme which automatically finds the optimal separation of the reflection and background layers. Through experiments, we demonstrate that our approach can generate superior results to those of previous methods.

Publisher site [BibTex]


Adaptive Offset Correction for Intracortical Brain Computer Interfaces

Homer, M. L., Perge, J. A., Black, M. J., Harrison, M. T., Cash, S. S., Hochberg, L. R.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 22(2):239-248, March 2014 (article)

Abstract
Intracortical brain computer interfaces (iBCIs) decode intended movement from neural activity for the control of external devices such as a robotic arm. Standard approaches include a calibration phase to estimate decoding parameters. During iBCI operation, the statistical properties of the neural activity can depart from those observed during calibration, sometimes hindering a user’s ability to control the iBCI. To address this problem, we adaptively correct the offset terms within a Kalman filter decoder via penalized maximum likelihood estimation. The approach can handle rapid shifts in neural signal behavior (on the order of seconds) and requires no knowledge of the intended movement. The algorithm, called MOCA, was tested using simulated neural activity and evaluated retrospectively using data collected from two people with tetraplegia operating an iBCI. In 19 clinical research test cases, where a nonadaptive Kalman filter yielded relatively high decoding errors, MOCA significantly reduced these errors (10.6 ± 10.1%; p < 0.05, pairwise t-test). MOCA did not significantly change the error in the remaining 23 cases where a nonadaptive Kalman filter already performed well. These results suggest that MOCA provides more robust decoding than the standard Kalman filter for iBCIs.

pdf DOI Project Page [BibTex]


3D Traffic Scene Understanding from Movable Platforms

Geiger, A., Lauer, M., Wojek, C., Stiller, C., Urtasun, R.

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(5):1012-1025, IEEE, Los Alamitos, CA, May 2014 (article)

Abstract
In this paper, we present a novel probabilistic generative model for multi-object traffic scene understanding from movable platforms which reasons jointly about the 3D scene layout as well as the location and orientation of objects in the scene. In particular, the scene topology, geometry and traffic activities are inferred from short video sequences. Inspired by the impressive driving capabilities of humans, our model does not rely on GPS, lidar or map knowledge. Instead, it takes advantage of a diverse set of visual cues in the form of vehicle tracklets, vanishing points, semantic scene labels, scene flow and occupancy grids. For each of these cues we propose likelihood functions that are integrated into a probabilistic generative model. We learn all model parameters from training data using contrastive divergence. Experiments conducted on videos of 113 representative intersections show that our approach successfully infers the correct layout in a variety of very challenging scenarios. To evaluate the importance of each feature cue, experiments using different feature combinations are conducted. Furthermore, we show how by employing context derived from the proposed method we are able to improve over the state-of-the-art in terms of object detection and object orientation estimation in challenging and cluttered urban environments.

pdf link (url) Project Page [BibTex]


A Quantitative Analysis of Current Practices in Optical Flow Estimation and the Principles behind Them

Sun, D., Roth, S., Black, M. J.

International Journal of Computer Vision (IJCV), 106(2):115-137, 2014 (article)

Abstract
The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that "classical" flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. One key implementation detail is the median filtering of intermediate flow fields during optimization. While this improves the robustness of classical methods it actually leads to higher energy solutions, meaning that these methods are not optimizing the original objective function. To understand the principles behind this phenomenon, we derive a new objective function that formalizes the median filtering heuristic. This objective function includes a non-local smoothness term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that can better preserve motion details. To take advantage of the trend towards video in wide-screen format, we further introduce an asymmetric pyramid downsampling scheme that enables the estimation of longer range horizontal motions. The methods are evaluated on Middlebury, MPI Sintel, and KITTI datasets using the same parameter settings.
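The median filtering heuristic mentioned in the abstract can be illustrated on one component of a flow field. This is a minimal sketch of a plain windowed median with borders handled by clipping the window; the function name and the `radius` parameter are illustrative, and the weighted non-local term derived in the paper is not reproduced here.

```python
from statistics import median


def median_filter(field, radius=1):
    """Median-filter one flow component stored as a list of rows.

    Each output value is the median of the (2 * radius + 1)^2 window around
    it; windows are clipped at the borders of the field.
    """
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                field[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = median(window)
    return out
```

Applied to both the horizontal and vertical components after each update of the flow estimate, a filter of this kind suppresses isolated outliers while keeping sharp motion boundaries better than a mean filter would.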

pdf full text code Project Page [BibTex]

2013


Multi-robot cooperative spherical-object tracking in 3D space based on particle filters

Ahmad, A., Lima, P.

Robotics and Autonomous Systems, 61(10):1084-1093, October 2013 (article)

Abstract
This article presents a cooperative approach for tracking a moving spherical object in 3D space by a team of mobile robots equipped with sensors, in a highly dynamic environment. The tracker’s core is a particle filter, modified to handle, within a single unified framework, the problem of complete or partial occlusion for some of the involved mobile sensors, as well as inconsistent estimates in the global frame among sensors, due to observation errors and/or self-localization uncertainty. We present results supporting our approach by applying it to a team of real soccer robots tracking a soccer ball, including comparison with ground truth.

DOI [BibTex]


Quasi-Newton Methods: A New Direction

Hennig, P., Kiefel, M.

Journal of Machine Learning Research, 14(1):843-865, March 2013 (article)

Abstract
Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors.
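The idea of fitting a local quadratic approximation from gradient observations is easiest to see in one dimension, where the quasi-Newton secant condition reduces to the classical secant method applied to the gradient. The sketch below shows only that classical special case, not the nonparametric method proposed in the paper; the function name and parameters are illustrative.

```python
def secant_minimize(grad, x0, x1, tol=1e-10, max_iter=50):
    """Find a stationary point of a 1D function from its gradient.

    The curvature of the local quadratic model is estimated from the two
    most recent gradient evaluations (the secant condition).
    """
    g0, g1 = grad(x0), grad(x1)
    for _ in range(max_iter):
        if abs(g1) < tol or g1 == g0:
            break  # converged, or no curvature information available
        curvature = (g1 - g0) / (x1 - x0)
        x0, g0 = x1, g1
        x1 = x1 - g1 / curvature  # step to the stationary point of the model
        g1 = grad(x1)
    return x1
```

For a quadratic objective the curvature estimate is exact, so a single step reaches the minimum; for other objectives the estimate is refined as new gradients are observed.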

website+code pdf link (url) Project Page [BibTex]


Extracting Postural Synergies for Robotic Grasping

Romero, J., Feix, T., Ek, C., Kjellstrom, H., Kragic, D.

IEEE Transactions on Robotics, 29(6):1342-1352, December 2013 (article)

[BibTex]
