Header logo is ps


2020


AirCapRL: Autonomous Aerial Human Motion Capture Using Deep Reinforcement Learning
AirCapRL: Autonomous Aerial Human Motion Capture Using Deep Reinforcement Learning

Tallamraju, R., Saini, N., Bonetto, E., Pabst, M., Liu, Y. T., Black, M., Ahmad, A.

IEEE Robotics and Automation Letters, IEEE Robotics and Automation Letters, 5(4):6678 - 6685, IEEE, October 2020, Also accepted and presented in the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (article)

Abstract
In this letter, we introduce a deep reinforcement learning (DRL) based multi-robot formation controller for the task of autonomous aerial human motion capture (MoCap). We focus on vision-based MoCap, where the objective is to estimate the trajectory of body pose, and shape of a single moving person using multiple micro aerial vehicles. State-of-the-art solutions to this problem are based on classical control methods, which depend on hand-crafted system, and observation models. Such models are difficult to derive, and generalize across different systems. Moreover, the non-linearities, and non-convexities of these models lead to sub-optimal controls. In our work, we formulate this problem as a sequential decision making task to achieve the vision-based motion capture objectives, and solve it using a deep neural network-based RL method. We leverage proximal policy optimization (PPO) to train a stochastic decentralized control policy for formation control. The neural network is trained in a parallelized setup in synthetic environments. We performed extensive simulation experiments to validate our approach. Finally, real-robot experiments demonstrate that our policies generalize to real world conditions.

link (url) DOI [BibTex]

2020

link (url) DOI [BibTex]


STAR: Sparse Trained Articulated Human Body Regressor
STAR: Sparse Trained Articulated Human Body Regressor

Osman, A. A. A., Bolkart, T., Black, M. J.

In European Conference on Computer Vision (ECCV) , August 2020 (inproceedings)

Abstract
The SMPL body model is widely used for the estimation, synthesis, and analysis of 3D human pose and shape. While popular, we show that SMPL has several limitations and introduce STAR, which is quantitatively and qualitatively superior to SMPL. First, SMPL has a huge number of parameters resulting from its use of global blend shapes. These dense pose-corrective offsets relate every vertex on the mesh to all the joints in the kinematic tree, capturing spurious long-range correlations. To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters to 20% of SMPL. When trained on the same data as SMPL, STAR generalizes better despite having many fewer parameters. Second, SMPL factors pose-dependent deformations from body shape while, in reality, people with different shapes deform differently. Consequently, we learn shape-dependent pose-corrective blend shapes that depend on both body pose and BMI. Third, we show that the shape space of SMPL is not rich enough to capture the variation in the human population. We address this by training STAR with an additional 10,000 scans of male and female subjects, and show that this results in better model generalization. STAR is compact, generalizes better to new bodies and is a drop-in replacement for SMPL. STAR is publicly available for research purposes at http://star.is.tue.mpg.de.

Project Page Code Video paper supplemental [BibTex]


3D Morphable Face Models - Past, Present and Future
3D Morphable Face Models - Past, Present and Future

Egger, B., Smith, W. A. P., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., Bernard, F., Bolkart, T., Kortylewski, A., Romdhani, S., Theobalt, C., Blanz, V., Vetter, T.

ACM Transactions on Graphics, 39(5), August 2020 (article)

Abstract
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation, and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.

project page pdf preprint DOI [BibTex]

project page pdf preprint DOI [BibTex]


Monocular Expressive Body Regression through Body-Driven Attention
Monocular Expressive Body Regression through Body-Driven Attention

Choutas, V., Pavlakos, G., Bolkart, T., Tzionas, D., Black, M. J.

In Computer Vision – ECCV 2020, Springer International Publishing, Cham, August 2020 (inproceedings)

Abstract
To understand how people look, interact, or perform tasks,we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone to local optima, and require 2D keypoints as input. We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. This is a hard problem due to the high dimensionality of the body and the lack of expressive training data. Additionally, hands and faces are much smaller than the body, occupying very few image pixels. This makes hand and face estimation hard when body images are downscaled for neural networks. We make three main contributions. First, we account for the lack of training data by curating a dataset of SMPL-X fits on in-the-wild images. Second, we observe that body estimation localizes the face and hands reasonably well. We introduce body-driven attention for face and hand regions in the original image to extract higher-resolution crops that are fed to dedicated refinement modules. Third, these modules exploit part-specific knowledge from existing face and hand-only datasets. ExPose estimates expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost. Our data, model and code are available for research at https://expose.is.tue.mpg.de.

code Short video Long video arxiv pdf suppl link (url) Project Page [BibTex]


GRAB: A Dataset of Whole-Body Human Grasping of Objects
GRAB: A Dataset of Whole-Body Human Grasping of Objects

Taheri, O., Ghorbani, N., Black, M. J., Tzionas, D.

In Computer Vision – ECCV 2020, Springer International Publishing, Cham, August 2020 (inproceedings)

Abstract
Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time. While "grasping" is commonly thought of as a single hand stably lifting an object, we capture the motion of the entire body and adopt the generalized notion of "whole-body grasps". Thus, we collect a new dataset, called GRAB (GRasping Actions with Bodies), of whole-body grasps, containing full 3D shape and pose sequences of 10 subjects interacting with 51 everyday objects of varying shape and size. Given MoCap markers, we fit the full 3D body shape and pose, including the articulated face and hands, as well as the 3D object pose. This gives detailed 3D meshes over time, from which we compute contact between the body and object. This is a unique dataset, that goes well beyond existing ones for modeling and understanding how humans grasp and manipulate objects, how their full body is involved, and how interaction varies with the task. We illustrate the practical value of GRAB with an example application; we train GrabNet, a conditional generative network, to predict 3D hand grasps for unseen 3D object shapes. The dataset and code are available for research purposes at https://grab.is.tue.mpg.de.

pdf suppl video (long) video (short) link (url) DOI [BibTex]

pdf suppl video (long) video (short) link (url) DOI [BibTex]


Learning to Dress 3D People in Generative Clothing
Learning to Dress 3D People in Generative Clothing

Ma, Q., Yang, J., Ranjan, A., Pujades, S., Pons-Moll, G., Tang, S., Black, M. J.

In Computer Vision and Pattern Recognition (CVPR), pages: 6468-6477, IEEE, June 2020 (inproceedings)

Abstract
Three-dimensional human body models are widely used in the analysis of human pose and motion. Existing models, however, are learned from minimally-clothed 3D scans and thus do not generalize to the complexity of dressed people in common images and videos. Additionally, current models lack the expressive power needed to represent the complex non-linear geometry of pose-dependent clothing shape. To address this, we learn a generative 3D mesh model of clothed people from 3D scans with varying pose and clothing. Specifically, we train a conditional Mesh-VAE-GAN to learn the clothing deformation from the SMPL body model, making clothing an additional term on SMPL. Our model is conditioned on both pose and clothing type, giving the ability to draw samples of clothing to dress different body shapes in a variety of styles and poses. To preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to 3D meshes. Our model, named CAPE, represents global shape and fine local structure, effectively extending the SMPL body model to clothing. To our knowledge, this is the first generative model that directly dresses 3D human body meshes and generalizes to different poses.

Project page Code Short video Long video arXiv DOI [BibTex]

Project page Code Short video Long video arXiv DOI [BibTex]


{GENTEL : GENerating Training data Efficiently for Learning to segment medical images}
GENTEL : GENerating Training data Efficiently for Learning to segment medical images

Thakur, R. P., Rocamora, S. P., Goel, L., Pohmann, R., Machann, J., Black, M. J.

Congrès Reconnaissance des Formes, Image, Apprentissage et Perception (RFAIP), June 2020 (conference)

Abstract
Accurately segmenting MRI images is crucial for many clinical applications. However, manually segmenting images with accurate pixel precision is a tedious and time consuming task. In this paper we present a simple, yet effective method to improve the efficiency of the image segmentation process. We propose to transform the image annotation task into a binary choice task. We start by using classical image processing algorithms with different parameter values to generate multiple, different segmentation masks for each input MRI image. Then, instead of segmenting the pixels of the images, the user only needs to decide whether a segmentation is acceptable or not. This method allows us to efficiently obtain high quality segmentations with minor human intervention. With the selected segmentations, we train a state-of-the-art neural network model. For the evaluation, we use a second MRI dataset (1.5T Dataset), acquired with a different protocol and containing annotations. We show that the trained network i) is able to automatically segment cases where none of the classical methods obtain a high quality result ; ii) generalizes to the second MRI dataset, which was acquired with a different protocol and was never seen at training time ; and iii) enables detection of miss-annotations in this second dataset. Quantitatively, the trained network obtains very good results: DICE score - mean 0.98, median 0.99- and Hausdorff distance (in pixels) - mean 4.7, median 2.0-.

Project Page PDF [BibTex]

Project Page PDF [BibTex]


Generating 3D People in Scenes without People
Generating 3D People in Scenes without People

Zhang, Y., Hassan, M., Neumann, H., Black, M. J., Tang, S.

In Computer Vision and Pattern Recognition (CVPR), pages: 6194-6204, June 2020 (inproceedings)

Abstract
We present a fully automatic system that takes a 3D scene and generates plausible 3D human bodies that are posed naturally in that 3D scene. Given a 3D scene without people, humans can easily imagine how people could interact with the scene and the objects in it. However, this is a challenging task for a computer as solving it requires that (1) the generated human bodies to be semantically plausible within the 3D environment (e.g. people sitting on the sofa or cooking near the stove), and (2) the generated human-scene interaction to be physically feasible such that the human body and scene do not interpenetrate while, at the same time, body-scene contact supports physical interactions. To that end, we make use of the surface-based 3D human model SMPL-X. We first train a conditional variational autoencoder to predict semantically plausible 3D human poses conditioned on latent scene representations, then we further refine the generated 3D bodies using scene constraints to enforce feasible physical interaction. We show that our approach is able to synthesize realistic and expressive 3D human bodies that naturally interact with 3D environment. We perform extensive experiments demonstrating that our generative framework compares favorably with existing methods, both qualitatively and quantitatively. We believe that our scene-conditioned 3D human generation pipeline will be useful for numerous applications; e.g. to generate training data for human pose estimation, in video games and in VR/AR. Our project page for data and code can be seen at: \url{https://vlg.inf.ethz.ch/projects/PSI/}.

Code PDF DOI [BibTex]

Code PDF DOI [BibTex]


Learning Physics-guided Face Relighting under Directional Light
Learning Physics-guided Face Relighting under Directional Light

Nestmeyer, T., Lalonde, J., Matthews, I., Lehrmann, A. M.

In Conference on Computer Vision and Pattern Recognition, pages: 5123-5132, IEEE/CVF, June 2020 (inproceedings) Accepted

Abstract
Relighting is an essential step in realistically transferring objects from a captured image into another environment. For example, authentic telepresence in Augmented Reality requires faces to be displayed and relit consistent with the observer's scene lighting. We investigate end-to-end deep learning architectures that both de-light and relight an image of a human face. Our model decomposes the input image into intrinsic components according to a diffuse physics-based image formation model. We enable non-diffuse effects including cast shadows and specular highlights by predicting a residual correction to the diffuse render. To train and evaluate our model, we collected a portrait database of 21 subjects with various expressions and poses. Each sample is captured in a controlled light stage setup with 32 individual light sources. Our method creates precise and believable relighting results and generalizes to complex illumination conditions and challenging poses, including when the subject is not looking straight at the camera.

Paper [BibTex]

Paper [BibTex]


Learning and Tracking the {3D} Body Shape of Freely Moving Infants from {RGB-D} sequences
Learning and Tracking the 3D Body Shape of Freely Moving Infants from RGB-D sequences

Hesse, N., Pujades, S., Black, M., Arens, M., Hofmann, U., Schroeder, S.

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42(10):2540-2551, 2020 (article)

Abstract
Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.

pdf Journal DOI [BibTex]

pdf Journal DOI [BibTex]


{VIBE}: Video Inference for Human Body Pose and Shape Estimation
VIBE: Video Inference for Human Body Pose and Shape Estimation

Kocabas, M., Athanasiou, N., Black, M. J.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pages: 5252-5262, IEEE, June 2020 (inproceedings)

Abstract
Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methodsfail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose “Video Inference for Body Pose and Shape Estimation” (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a temporal network architecture and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE

arXiv code video supplemental video DOI Project Page [BibTex]

arXiv code video supplemental video DOI Project Page [BibTex]


General Movement Assessment from videos of computed {3D} infant body models is equally effective compared to conventional {RGB} Video rating
General Movement Assessment from videos of computed 3D infant body models is equally effective compared to conventional RGB Video rating

Schroeder, S., Hesse, N., Weinberger, R., Tacke, U., Gerstl, L., Hilgendorff, A., Heinen, F., Arens, M., Bodensteiner, C., Dijkstra, L. J., Pujades, S., Black, M., Hadders-Algra, M.

Early Human Development, 144, May 2020 (article)

Abstract
Background: General Movement Assessment (GMA) is a powerful tool to predict Cerebral Palsy (CP). Yet, GMA requires substantial training hampering its implementation in clinical routine. This inspired a world-wide quest for automated GMA. Aim: To test whether a low-cost, marker-less system for three-dimensional motion capture from RGB depth sequences using a whole body infant model may serve as the basis for automated GMA. Study design: Clinical case study at an academic neurodevelopmental outpatient clinic. Subjects: Twenty-nine high-risk infants were recruited and assessed at their clinical follow-up at 2-4 month corrected age (CA). Their neurodevelopmental outcome was assessed regularly up to 12-31 months CA. Outcome measures: GMA according to Hadders-Algra by a masked GMA-expert of conventional and computed 3D body model (“SMIL motion”) videos of the same GMs. Agreement between both GMAs was assessed, and sensitivity and specificity of both methods to predict CP at ≥12 months CA. Results: The agreement of the two GMA ratings was substantial, with κ=0.66 for the classification of definitely abnormal (DA) GMs and an ICC of 0.887 (95% CI 0.762;0.947) for a more detailed GM-scoring. Five children were diagnosed with CP (four bilateral, one unilateral CP). The GMs of the child with unilateral CP were twice rated as mildly abnormal. DA-ratings of both videos predicted bilateral CP well: sensitivity 75% and 100%, specificity 88% and 92% for conventional and SMIL motion videos, respectively. Conclusions: Our computed infant 3D full body model is an attractive starting point for automated GMA in infants at risk of CP.

DOI [BibTex]

DOI [BibTex]


Learning Multi-Human Optical Flow
Learning Multi-Human Optical Flow

Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J.

International Journal of Computer Vision (IJCV), (128):873-890, April 2020 (article)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single-and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.

pdf DOI poster link (url) DOI [BibTex]

pdf DOI poster link (url) DOI [BibTex]


From Variational to Deterministic Autoencoders
From Variational to Deterministic Autoencoders

Ghosh*, P., Sajjadi*, M. S. M., Vergari, A., Black, M. J., Schölkopf, B.

8th International Conference on Learning Representations (ICLR) , April 2020, *equal contribution (conference)

Abstract
Variational Autoencoders (VAEs) provide a theoretically-backed framework for deep generative models. However, they often produce “blurry” images, which is linked to their training objective. Sampling in the most popular implementation, the Gaussian VAE, can be interpreted as simply injecting noise to the input of a deterministic decoder. In practice, this simply enforces a smooth latent space structure. We challenge the adoption of the full VAE framework on this specific point in favor of a simpler, deterministic one. Specifically, we investigate how substituting stochasticity with other explicit and implicit regularization schemes can lead to a meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism for sampling new data points, we propose to employ an efficient ex-post density estimation step that can be readily adopted both for the proposed deterministic autoencoders as well as to improve sample quality of existing VAEs. We show in a rigorous empirical study that regularized deterministic autoencoding achieves state-of-the-art sample quality on the common MNIST, CIFAR-10 and CelebA datasets.

arXiv link (url) [BibTex]

arXiv link (url) [BibTex]


Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations

Rueegg, N., Lassner, C., Black, M. J., Schindler, K.

In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), pages: 5561-5569, Febuary 2020 (inproceedings)

Abstract
The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.

pdf [BibTex]

pdf [BibTex]


no image
Real Time Trajectory Prediction Using Deep Conditional Generative Models

Gomez-Gonzalez, S., Prokudin, S., Schölkopf, B., Peters, J.

IEEE Robotics and Automation Letters, 5(2):970-976, IEEE, January 2020 (article)

arXiv DOI [BibTex]

2006


no image
Finding directional movement representations in motor cortical neural populations using nonlinear manifold learning

WorKim, S., Simeral, J., Jenkins, O., Donoghue, J., Black, M.

World Congress on Medical Physics and Biomedical Engineering 2006, Seoul, Korea, August 2006 (conference)

[BibTex]

2006

[BibTex]


A non-parametric {Bayesian} approach to spike sorting
A non-parametric Bayesian approach to spike sorting

Wood, F., Goldwater, S., Black, M. J.

In International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pages: 1165-1169, New York, NY, August 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Predicting {3D} people from {2D} pictures
Predicting 3D people from 2D pictures

(Best Paper)

Sigal, L., Black, M. J.

In Proc. IV Conf. on Articulated Motion and DeformableObjects (AMDO), LNCS 4069, pages: 185-195, July 2006 (inproceedings)

Abstract
We propose a hierarchical process for inferring the 3D pose of a person from monocular images. First we infer a learned view-based 2D body model from a single image using non-parametric belief propagation. This approach integrates information from bottom-up body-part proposal processes and deals with self-occlusion to compute distributions over limb poses. Then, we exploit a learned Mixture of Experts model to infer a distribution of 3D poses conditioned on 2D poses. This approach is more general than recent work on inferring 3D pose directly from silhouettes since the 2D body model provides a richer representation that includes the 2D joint angles and the poses of limbs that may be unobserved in the silhouette. We demonstrate the method in a laboratory setting where we evaluate the accuracy of the 3D poses against ground truth data. We also estimate 3D body pose in a monocular image sequence. The resulting 3D estimates are sufficiently accurate to serve as proposals for the Bayesian inference of 3D human motion over time

pdf pdf from publisher Video [BibTex]

pdf pdf from publisher Video [BibTex]


Specular flow and the recovery of surface structure
Specular flow and the recovery of surface structure

Roth, S., Black, M.

In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, 2, pages: 1869-1876, New York, NY, June 2006 (inproceedings)

Abstract
In scenes containing specular objects, the image motion observed by a moving camera may be an intermixed combination of optical flow resulting from diffuse reflectance (diffuse flow) and specular reflection (specular flow). Here, with few assumptions, we formalize the notion of specular flow, show how it relates to the 3D structure of the world, and develop an algorithm for estimating scene structure from 2D image motion. Unlike previous work on isolated specular highlights we use two image frames and estimate the semi-dense flow arising from the specular reflections of textured scenes. We parametrically model the image motion of a quadratic surface patch viewed from a moving camera. The flow is modeled as a probabilistic mixture of diffuse and specular components and the 3D shape is recovered using an Expectation-Maximization algorithm. Rather than treating specular reflections as noise to be removed or ignored, we show that the specular flow provides additional constraints on scene geometry that improve estimation of 3D structure when compared with reconstruction from diffuse flow alone. We demonstrate this for a set of synthetic and real sequences of mixed specular-diffuse objects.

pdf [BibTex]

pdf [BibTex]


An adaptive appearance model approach for model-based articulated object tracking
An adaptive appearance model approach for model-based articulated object tracking

Balan, A., Black, M. J.

In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, 1, pages: 758-765, New York, NY, June 2006 (inproceedings)

Abstract
The detection and tracking of three-dimensional human body models has progressed rapidly but successful approaches typically rely on accurate foreground silhouettes obtained using background segmentation. There are many practical applications where such information is imprecise. Here we develop a new image likelihood function based on the visual appearance of the subject being tracked. We propose a robust, adaptive, appearance model based on the Wandering-Stable-Lost framework extended to the case of articulated body parts. The method models appearance using a mixture model that includes an adaptive template, frame-to-frame matching and an outlier process. We employ an annealed particle filtering algorithm for inference and take advantage of the 3D body model to predict self occlusion and improve pose estimation accuracy. Quantitative tracking results are presented for a walking sequence with a 180 degree turn, captured with four synchronized and calibrated cameras and containing significant appearance changes and self-occlusion in each view.

pdf [BibTex]

pdf [BibTex]


Measure locally, reason globally: Occlusion-sensitive articulated pose estimation
Measure locally, reason globally: Occlusion-sensitive articulated pose estimation

Sigal, L., Black, M. J.

In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, CVPR, 2, pages: 2041-2048, New York, NY, June 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Statistical analysis of the non-stationarity of neural population codes
Statistical analysis of the non-stationarity of neural population codes

Kim, S., Wood, F., Fellows, M., Donoghue, J. P., Black, M. J.

In BioRob 2006, The first IEEE / RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, pages: 295-299, Pisa, Italy, Febuary 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


no image
How to choose the covariance for Gaussian process regression independently of the basis

Franz, M., Gehler, P.

In Proceedings of the Workshop Gaussian Processes in Practice, 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


The rate adapting poisson model for information retrieval and object recognition
The rate adapting poisson model for information retrieval and object recognition

Gehler, P. V., Holub, A. D., Welling, M.

In Proceedings of the 23rd international conference on Machine learning, pages: 337-344, ICML ’06, ACM, New York, NY, USA, 2006 (inproceedings)

project page pdf DOI [BibTex]

project page pdf DOI [BibTex]


Implicit Wiener Series, Part II: Regularised estimation
Implicit Wiener Series, Part II: Regularised estimation

Gehler, P., Franz, M.

(148), Max Planck Institute, 2006 (techreport)

pdf [BibTex]


Tracking complex objects using graphical object models
Tracking complex objects using graphical object models

Sigal, L., Zhu, Y., Comaniciu, D., Black, M. J.

In International Workshop on Complex Motion, LNCS 3417, pages: 223-234, Springer-Verlag, 2006 (inproceedings)

pdf pdf from publisher [BibTex]

pdf pdf from publisher [BibTex]


{HumanEva}: Synchronized video and motion capture dataset for evaluation of articulated human motion
HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion

Sigal, L., Black, M. J.

(CS-06-08), Brown University, Department of Computer Science, 2006 (techreport)

pdf abstract [BibTex]

pdf abstract [BibTex]


Bayesian population decoding of motor cortical activity using a {Kalman} filter
Bayesian population decoding of motor cortical activity using a Kalman filter

Wu, W., Gao, Y., Bienenstock, E., Donoghue, J. P., Black, M. J.

Neural Computation, 18(1):80-118, 2006 (article)

Abstract
Effective neural motor prostheses require a method for decoding neural activity representing desired movement. In particular, the accurate reconstruction of a continuous motion signal is necessary for the control of devices such as computer cursors, robots, or a patient's own paralyzed limbs. For such applications, we developed a real-time system that uses Bayesian inference techniques to estimate hand motion from the firing rates of multiple neurons. In this study, we used recordings that were previously made in the arm area of primary motor cortex in awake behaving monkeys using a chronically implanted multielectrode microarray. Bayesian inference involves computing the posterior probability of the hand motion conditioned on a sequence of observed firing rates; this is formulated in terms of the product of a likelihood and a prior. The likelihood term models the probability of firing rates given a particular hand motion. We found that a linear gaussian model could be used to approximate this likelihood and could be readily learned from a small amount of training data. The prior term defines a probabilistic model of hand kinematics and was also taken to be a linear gaussian model. Decoding was performed using a Kalman filter, which gives an efficient recursive method for Bayesian inference when the likelihood and prior are linear and gaussian. In off-line experiments, the Kalman filter reconstructions of hand trajectory were more accurate than previously reported results. The resulting decoding algorithm provides a principled probabilistic model of motor-cortical coding, decodes hand motion in real time, provides an estimate of uncertainty, and is straightforward to implement. Additionally the formulation unifies and extends previous models of neural coding while providing insights into the motor-cortical code.

pdf preprint pdf from publisher abstract [BibTex]

pdf preprint pdf from publisher abstract [BibTex]


Hierarchical Approach for Articulated {3D} Pose-Estimation and Tracking (extended abstract)
Hierarchical Approach for Articulated 3D Pose-Estimation and Tracking (extended abstract)

Sigal, L., Black, M. J.

In Learning, Representation and Context for Human Sensing in Video Workshop (in conjunction with CVPR), 2006 (inproceedings)

pdf poster [BibTex]

pdf poster [BibTex]


Nonlinear physically-based models for decoding motor-cortical population activity
Nonlinear physically-based models for decoding motor-cortical population activity

Shakhnarovich, G., Kim, S., Black, M. J.

In Advances in Neural Information Processing Systems 19, NIPS-2006, pages: 1257-1264, MIT Press, 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


no image
A comparison of decoding models for imagined motion from human motor cortex

Kim, S., Simeral, J., Donoghue, J. P., Hocherberg, L. R., Friehs, G., Mukand, J. A., Chen, D., Black, M. J.

Program No. 256.11. 2006 Abstract Viewer and Itinerary Planner, Society for Neuroscience, Atlanta, GA, 2006, Online (conference)

[BibTex]

[BibTex]


Denoising archival films using a learned {Bayesian} model
Denoising archival films using a learned Bayesian model

Moldovan, T. M., Roth, S., Black, M. J.

In Int. Conf. on Image Processing, ICIP, pages: 2641-2644, Atlanta, 2006 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Efficient belief propagation with learned higher-order {Markov} random fields
Efficient belief propagation with learned higher-order Markov random fields

Lan, X., Roth, S., Huttenlocher, D., Black, M. J.

In European Conference on Computer Vision, ECCV, II, pages: 269-282, Graz, Austria, 2006 (inproceedings)

pdf pdf from publisher [BibTex]

pdf pdf from publisher [BibTex]


Products of ``Edge-perts''
Products of “Edge-perts”

Gehler, P., Welling, M.

In Advances in Neural Information Processing Systems 18, pages: 419-426, (Editors: Weiss, Y. and Sch"olkopf, B. and Platt, J.), MIT Press, Cambridge, MA, 2006 (incollection)

pdf [BibTex]

pdf [BibTex]


no image
Modeling neural control of physically realistic movement

Shaknarovich, G., Kim, S., Donoghue, J. P., Hocherberg, L. R., Friehs, G., Mukand, J. A., Chen, D., Black, M. J.

Program No. 256.12. 2006 Abstract Viewer and Itinerary Planner, Society for Neuroscience, Atlanta, GA, 2006, Online (conference)

[BibTex]

[BibTex]

2001


Dynamic coupled component analysis
Dynamic coupled component analysis

De la Torre, F., Black, M. J.

In IEEE Proc. Computer Vision and Pattern Recognition, CVPR’01, 2, pages: 643-650, IEEE, Kauai, Hawaii, December 2001 (inproceedings)

pdf [BibTex]

2001

pdf [BibTex]


Robust principal component analysis for computer vision
Robust principal component analysis for computer vision

De la Torre, F., Black, M. J.

In Int. Conf. on Computer Vision, ICCV-2001, II, pages: 362-369, Vancouver, BC, USA, 2001 (inproceedings)

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Learning image statistics for {Bayesian} tracking
Learning image statistics for Bayesian tracking

Sidenbladh, H., Black, M. J.

In Int. Conf. on Computer Vision, ICCV-2001, II, pages: 709-716, Vancouver, BC, USA, 2001 (inproceedings)

pdf [BibTex]

pdf [BibTex]


no image
Encoding/decoding of arm kinematics from simultaneously recorded MI neurons

Gao, Y., Bienenstock, E., Black, M., Shoham, S., Serruya, M., Donoghue, J.

Society for Neuroscience Abst. Vol. 27, Program No. 572.14, 2001 (conference)

abstract [BibTex]

abstract [BibTex]


Learning and tracking cyclic human motion
Learning and tracking cyclic human motion

Ormoneit, D., Sidenbladh, H., Black, M. J., Hastie, T.

In Advances in Neural Information Processing Systems 13, NIPS, pages: 894-900, (Editors: Leen, Todd K. and Dietterich, Thomas G. and Tresp, Volker), The MIT Press, 2001 (inproceedings)

pdf [BibTex]

pdf [BibTex]

1999


Edges as outliers: Anisotropic smoothing using local image statistics
Edges as outliers: Anisotropic smoothing using local image statistics

Black, M. J., Sapiro, G.

In Scale-Space Theories in Computer Vision, Second Int. Conf., Scale-Space ’99, pages: 259-270, LNCS 1682, Springer, Corfu, Greece, September 1999 (inproceedings)

Abstract
Edges are viewed as statistical outliers with respect to local image gradient magnitudes. Within local image regions we compute a robust statistical measure of the gradient variation and use this in an anisotropic diffusion framework to determine a spatially varying "edge-stopping" parameter σ. We show how to determine this parameter for two edge-stopping functions described in the literature (Perona-Malik and the Tukey biweight). Smoothing of the image is related the local texture and in regions of low texture, small gradient values may be treated as edges whereas in regions of high texture, large gradient magnitudes are necessary before an edge is preserved. Intuitively these results have similarities with human perceptual phenomena such as masking and "popout". Results are shown on a variety of standard images.

pdf [BibTex]

1999

pdf [BibTex]


Probabilistic detection and tracking of motion discontinuities
Probabilistic detection and tracking of motion discontinuities

(Marr Prize, Honorable Mention)

Black, M. J., Fleet, D. J.

In Int. Conf. on Computer Vision, ICCV-99, pages: 551-558, ICCV, Corfu, Greece, September 1999 (inproceedings)

pdf [BibTex]

pdf [BibTex]


Artscience Sciencart
Artscience Sciencart

Black, M. J., Levy, D., PamelaZ,

In Art and Innovation: The Xerox PARC Artist-in-Residence Program, pages: 244-300, (Editors: Harris, C.), MIT-Press, 1999 (incollection)

Abstract
One of the effects of the PARC Artist In Residence (PAIR) program has been to expose the strong connections between scientists and artists. Both do what they do because they need to do it. They are often called upon to justify their work in order to be allowed to continue to do it. They need to justify it to funders, to sponsoring institutions, corporations, the government, the public. They publish papers, teach workshops, and write grants touting the educational or health benefits of what they do. All of these things are to some extent valid, but the fact of the matter is: artists and scientists do their work because they are driven to do it. They need to explore and create.

This chapter attempts to give a flavor of one multi-way "PAIRing" between performance artist PamelaZ and two PARC researchers, Michael Black and David Levy. The three of us paired up because we found each other interesting. We chose each other. While most artists in the program are paired with a single researcher Pamela jokingly calls herself a bigamist for choosing two PAIR "husbands" with different backgrounds and interests.

There are no "rules" to the PAIR program; no one told us what to do with our time. Despite this we all had a sense that we needed to produce something tangible during Pamela's year-long residency. In fact, Pamela kept extending her residency because she did not feel as though we had actually made anything concrete. The interesting thing was that all along we were having great conversations, some of which Pamela recorded. What we did not see at the time was that it was these conversations between artists and scientists that are at the heart of the PAIR program and that these conversations were changing the way we thought about our own work and the relationships between science and art.

To give these conversations their due, and to allow the reader into our PAIR interactions, we include two of our many conversations in this chapter.

[BibTex]

[BibTex]


Parameterized modeling and recognition of activities
Parameterized modeling and recognition of activities

Yacoob, Y., Black, M. J.

Computer Vision and Image Understanding, 73(2):232-247, 1999 (article)

Abstract
In this paper we consider a class of human activities—atomic activities—which can be represented as a set of measurements over a finite temporal window (e.g., the motion of human body parts during a walking cycle) and which has a relatively small space of variations in performance. A new approach for modeling and recognition of atomic activities that employs principal component analysis and analytical global transformations is proposed. The modeling of sets of exemplar instances of activities that are similar in duration and involve similar body part motions is achieved by parameterizing their representation using principal component analysis. The recognition of variants of modeled activities is achieved by searching the space of admissible parameterized transformations that these activities can undergo. This formulation iteratively refines the recognition of the class to which the observed activity belongs and the transformation parameters that relate it to the model in its class. We provide several experiments on recognition of articulated and deformable human motions from image motion parameters.

pdf pdf from publisher DOI [BibTex]

pdf pdf from publisher DOI [BibTex]


Explaining optical flow events with parameterized spatio-temporal models
Explaining optical flow events with parameterized spatio-temporal models

Black, M. J.

In IEEE Proc. Computer Vision and Pattern Recognition, CVPR’99, pages: 326-332, IEEE, Fort Collins, CO, 1999 (inproceedings)

pdf video [BibTex]

pdf video [BibTex]

1998


Summarization of video-taped presentations: Automatic analysis of motion and gesture
Summarization of video-taped presentations: Automatic analysis of motion and gesture

Ju, S. X., Black, M. J., Minneman, S., Kimber, D.

IEEE Trans. on Circuits and Systems for Video Technology, 8(5):686-696, September 1998 (article)

Abstract
This paper presents an automatic system for analyzing and annotating video sequences of technical talks. Our method uses a robust motion estimation technique to detect key frames and segment the video sequence into subsequences containing a single overhead slide. The subsequences are stabilized to remove motion that occurs when the speaker adjusts their slides. Any changes remaining between frames in the stabilized sequences may be due to speaker gestures such as pointing or writing, and we use active contours to automatically track these potential gestures. Given the constrained domain, we define a simple set of actions that can be recognized based on the active contour shape and motion. The recognized actions provide an annotation of the sequence that can be used to access a condensed version of the talk from a Web page.

pdf pdf from publisher DOI [BibTex]

1998

pdf pdf from publisher DOI [BibTex]


Robust anisotropic diffusion
Robust anisotropic diffusion

Black, M. J., Sapiro, G., Marimont, D., Heeger, D.

IEEE Transactions on Image Processing, 7(3):421-432, March 1998 (article)

Abstract
Relations between anisotropic diffusion and robust statistics are described in this paper. Specifically, we show that anisotropic diffusion can be seen as a robust estimation procedure that estimates a piecewise smooth image from a noisy input image. The edge-stopping; function in the anisotropic diffusion equation is closely related to the error norm and influence function in the robust estimation framework. This connection leads to a new edge-stopping; function based on Tukey's biweight robust estimator that preserves sharper boundaries than previous formulations and improves the automatic stopping of the diffusion. The robust statistical interpretation also provides a means for detecting the boundaries (edges) between the piecewise smooth regions in an image that has been smoothed with anisotropic diffusion. Additionally, we derive a relationship between anisotropic diffusion and regularization with line processes. Adding constraints on the spatial organization of the line processes allows us to develop new anisotropic diffusion equations that result in a qualitative improvement in the continuity of edges

pdf pdf from publisher [BibTex]

pdf pdf from publisher [BibTex]


The Digital Office: Overview
The Digital Office: Overview

Black, M., Berard, F., Jepson, A., Newman, W., Saund, E., Socher, G., Taylor, M.

In AAAI Spring Symposium on Intelligent Environments, pages: 1-6, Stanford, March 1998 (inproceedings)

pdf [BibTex]

pdf [BibTex]


A framework for modeling appearance change in image sequences
A framework for modeling appearance change in image sequences

Black, M. J., Fleet, D. J., Yacoob, Y.

In Sixth International Conf. on Computer Vision, ICCV’98, pages: 660-667, Mumbai, India, January 1998 (inproceedings)

Abstract
Image "appearance" may change over time due to a variety of causes such as 1) object or camera motion; 2) generic photometric events including variations in illumination (e.g. shadows) and specular reflections; and 3) "iconic changes" which are specific to the objects being viewed and include complex occlusion events and changes in the material properties of the objects. We propose a general framework for representing and recovering these "appearance changes" in an image sequence as a "mixture" of different causes. The approach generalizes previous work on optical flow to provide a richer description of image events and more reliable estimates of image motion.

pdf video [BibTex]

pdf video [BibTex]