Header logo is ps


2019


Thumb xl multihumanoflow thumb
Learning Multi-Human Optical Flow

Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J.

International Journal of Computer Vision (IJCV), December 2019 (article)

Abstract
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by them does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single-and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they can generalize well to real image sequences. The code, trained models and the dataset are available for research.

Paper poster link (url) DOI [BibTex]

2019


Thumb xl celia
Decoding subcategories of human bodies from both body- and face-responsive cortical regions

Foster, C., Zhao, M., Romero, J., Black, M. J., Mohler, B. J., Bartels, A., Bülthoff, I.

NeuroImage, 202(15):116085, November 2019 (article)

Abstract
Our visual system can easily categorize objects (e.g. faces vs. bodies) and further differentiate them into subcategories (e.g. male vs. female). This ability is particularly important for objects of social significance, such as human faces and bodies. While many studies have demonstrated category selectivity to faces and bodies in the brain, how subcategories of faces and bodies are represented remains unclear. Here, we investigated how the brain encodes two prominent subcategories shared by both faces and bodies, sex and weight, and whether neural responses to these subcategories rely on low-level visual, high-level visual or semantic similarity. We recorded brain activity with fMRI while participants viewed faces and bodies that varied in sex, weight, and image size. The results showed that the sex of bodies can be decoded from both body- and face-responsive brain areas, with the former exhibiting more consistent size-invariant decoding than the latter. Body weight could also be decoded in face-responsive areas and in distributed body-responsive areas, and this decoding was also invariant to image size. The weight of faces could be decoded from the fusiform body area (FBA), and weight could be decoded across face and body stimuli in the extrastriate body area (EBA) and a distributed body-responsive area. The sex of well-controlled faces (e.g. excluding hairstyles) could not be decoded from face- or body-responsive regions. These results demonstrate that both face- and body-responsive brain regions encode information that can distinguish the sex and weight of bodies. Moreover, the neural patterns corresponding to sex and weight were invariant to image size and could sometimes generalize across face and body stimuli, suggesting that such subcategorical information is encoded with a high-level visual or semantic code.

paper pdf DOI [BibTex]

paper pdf DOI [BibTex]


Thumb xl autonomous mocap cover image new
Active Perception based Formation Control for Multiple Aerial Vehicles

Tallamraju, R., Price, E., Ludwig, R., Karlapalem, K., Bülthoff, H. H., Black, M. J., Ahmad, A.

IEEE Robotics and Automation Letters, Robotics and Automation Letters, 4(4):4491-4498, IEEE, October 2019 (article)

Abstract
We present a novel robotic front-end for autonomous aerial motion-capture (mocap) in outdoor environments. In previous work, we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple micro-aerial vehicles (MAVs). However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article, we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking into a quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive this decoupling using Gaussian observation model assumptions within the CDT algorithm. We preserve convexity in optimization by embedding all the non-convex constraints, including those for dynamic obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented.

pdf DOI Project Page [BibTex]

pdf DOI Project Page [BibTex]


Thumb xl 3dmm
3D Morphable Face Models - Past, Present and Future

Egger, B., Smith, W. A. P., Tewari, A., Wuhrer, S., Zollhoefer, M., Beeler, T., Bernard, F., Bolkart, T., Kortylewski, A., Romdhani, S., Theobalt, C., Blanz, V., Vetter, T.

arxiv preprint arXiv:1909.01815, September 2019 (article)

Abstract
In this paper, we provide a detailed survey of 3D Morphable Face Models over the 20 years since they were first proposed. The challenges in building and applying these models, namely capture, modeling, image formation,and image analysis, are still active research topics, and we review the state-of-the-art in each of these areas. We also look ahead, identifying unsolved challenges, proposing directions for future research and highlighting the broad range of current and future applications.

paper project page [BibTex]

paper project page [BibTex]


Thumb xl hessepami
Learning and Tracking the 3D Body Shape of Freely Moving Infants from RGB-D sequences

Hesse, N., Pujades, S., Black, M., Arens, M., Hofmann, U., Schroeder, S.

Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019 (article)

Abstract
Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.

pdf Journal DOI [BibTex]

pdf Journal DOI [BibTex]


Thumb xl kenny
Perceptual Effects of Inconsistency in Human Animations

Kenny, S., Mahmood, N., Honda, C., Black, M. J., Troje, N. F.

ACM Trans. Appl. Percept., 16(1):2:1-2:18, Febuary 2019 (article)

Abstract
The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person’s movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. From these data, we estimated both the kinematics of the actions as well as the performer’s individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. Using these stimuli we conducted three experiments in an immersive virtual reality environment. First, a group of participants detected which of two stimuli was inconsistent. Performance was very low, and results were only marginally significant. Next, a second group of participants rated perceived attractiveness, eeriness, and humanness of consistent and inconsistent stimuli, but these judgements of animation characteristics were not affected by consistency of the stimuli. Finally, a third group of participants rated properties of the objects rather than of the performers. Here, we found strong influences of shape-motion inconsistency on perceived weight and thrown distance of objects. This suggests that the visual system relies on its knowledge of shape and motion and that these components are assimilated into an altered perception of the action outcome. We propose that the visual system attempts to resist inconsistent interpretations of human animations. Actions involving object manipulations present an opportunity for the visual system to reinterpret the introduced inconsistencies as a change in the dynamics of an object rather than as an unexpected combination of body shape and body motion.

publisher pdf DOI [BibTex]

publisher pdf DOI [BibTex]


Thumb xl virtualcaliper
The Virtual Caliper: Rapid Creation of Metrically Accurate Avatars from 3D Measurements

Pujades, S., Mohler, B., Thaler, A., Tesch, J., Mahmood, N., Hesse, N., Bülthoff, H. H., Black, M. J.

IEEE Transactions on Visualization and Computer Graphics, 25, pages: 1887,1897, IEEE, 2019 (article)

Abstract
Creating metrically accurate avatars is important for many applications such as virtual clothing try-on, ergonomics, medicine, immersive social media, telepresence, and gaming. Creating avatars that precisely represent a particular individual is challenging however, due to the need for expensive 3D scanners, privacy issues with photographs or videos, and difficulty in making accurate tailoring measurements. We overcome these challenges by creating “The Virtual Caliper”, which uses VR game controllers to make simple measurements. First, we establish what body measurements users can reliably make on their own body. We find several distance measurements to be good candidates and then verify that these are linearly related to 3D body shape as represented by the SMPL body model. The Virtual Caliper enables novice users to accurately measure themselves and create an avatar with their own body shape. We evaluate the metric accuracy relative to ground truth 3D body scan data, compare the method quantitatively to other avatar creation tools, and perform extensive perceptual studies. We also provide a software application to the community that enables novices to rapidly create avatars in fewer than five minutes. Not only is our approach more rapid than existing methods, it exports a metrically accurate 3D avatar model that is rigged and skinned.

Project Page IEEE Open Access IEEE Open Access PDF DOI [BibTex]

Project Page IEEE Open Access IEEE Open Access PDF DOI [BibTex]

2015


Thumb xl grassmanteaser
Scalable Robust Principal Component Analysis using Grassmann Averages

Hauberg, S., Feragen, A., Enficiaud, R., Black, M.

IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), December 2015 (article)

Abstract
In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well-known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie; a task beyond any current method. Source code is available online.

preprint pdf from publisher supplemental Project Page [BibTex]

2015


Thumb xl splitbodieswebteaser2
SMPL: A Skinned Multi-Person Linear Model

Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M. J.

ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, ACM, New York, NY, October 2015 (article)

Abstract
We present a learned model of human body shape and pose-dependent shape variation that is more accurate than previous models and is compatible with existing graphics pipelines. Our Skinned Multi-Person Linear model (SMPL) is a skinned vertex-based model that accurately represents a wide variety of body shapes in natural human poses. The parameters of the model are learned from data including the rest pose template, blend weights, pose-dependent blend shapes, identity-dependent blend shapes, and a regressor from vertices to joint locations. Unlike previous models, the pose-dependent blend shapes are a linear function of the elements of the pose rotation matrices. This simple formulation enables training the entire model from a relatively large number of aligned 3D meshes of different people in different poses. We quantitatively evaluate variants of SMPL using linear or dual-quaternion blend skinning and show that both are more accurate than a Blend-SCAPE model trained on the same data. We also extend SMPL to realistically model dynamic soft-tissue deformations. Because it is based on blend skinning, SMPL is compatible with existing rendering engines and we make it available for research purposes.

pdf video code/model errata DOI Project Page Project Page [BibTex]

pdf video code/model errata DOI Project Page Project Page [BibTex]


Thumb xl dynateaser
Dyna: A Model of Dynamic Human Shape in Motion

Pons-Moll, G., Romero, J., Mahmood, N., Black, M. J.

ACM Transactions on Graphics, (Proc. SIGGRAPH), 34(4):120:1-120:14, ACM, August 2015 (article)

Abstract
To look human, digital full-body avatars need to have soft tissue deformations like those of real people. We learn a model of soft-tissue deformations from examples using a high-resolution 4D capture system and a method that accurately registers a template mesh to sequences of 3D scans. Using over 40,000 scans of ten subjects, we learn how soft tissue motion causes mesh triangles to deform relative to a base 3D body model. Our Dyna model uses a low-dimensional linear subspace to approximate soft-tissue deformation and relates the subspace coefficients to the changing pose of the body. Dyna uses a second-order auto-regressive model that predicts soft-tissue deformations based on previous deformations, the velocity and acceleration of the body, and the angular velocities and accelerations of the limbs. Dyna also models how deformations vary with a person’s body mass index (BMI), producing different deformations for people with different shapes. Dyna realistically represents the dynamics of soft tissue for previously unseen subjects and motions. We provide tools for animators to modify the deformations and apply them to new stylized characters.

pdf preprint video data DOI Project Page Project Page [BibTex]

pdf preprint video data DOI Project Page Project Page [BibTex]


Thumb xl objs2acts
Linking Objects to Actions: Encoding of Target Object and Grasping Strategy in Primate Ventral Premotor Cortex

Vargas-Irwin, C. E., Franquemont, L., Black, M. J., Donoghue, J. P.

Journal of Neuroscience, 35(30):10888-10897, July 2015 (article)

Abstract
Neural activity in ventral premotor cortex (PMv) has been associated with the process of matching perceived objects with the motor commands needed to grasp them. It remains unclear how PMv networks can flexibly link percepts of objects affording multiple grasp options into a final desired hand action. Here, we use a relational encoding approach to track the functional state of PMv neuronal ensembles in macaque monkeys through the process of passive viewing, grip planning, and grasping movement execution. We used objects affording multiple possible grip strategies. The task included separate instructed delay periods for object presentation and grip instruction. This approach allowed us to distinguish responses elicited by the visual presentation of the objects from those associated with selecting a given motor plan for grasping. We show that PMv continuously incorporates information related to object shape and grip strategy as it becomes available, revealing a transition from a set of ensemble states initially most closely related to objects, to a new set of ensemble patterns reflecting unique object-grip combinations. These results suggest that PMv dynamically combines percepts, gradually navigating toward activity patterns associated with specific volitional actions, rather than directly mapping perceptual object properties onto categorical grip representations. Our results support the idea that PMv is part of a network that dynamically computes motor plans from perceptual information. Significance Statement: The present work demonstrates that the activity of groups of neurons in primate ventral premotor cortex reflects information related to visually presented objects, as well as the motor strategy used to grasp them, linking individual objects to multiple possible grips. PMv could provide useful control signals for neuroprosthetic assistive devices designed to interact with objects in a flexible way.

publisher link DOI Project Page [BibTex]

publisher link DOI Project Page [BibTex]


Thumb xl silvia phd
Shape Models of the Human Body for Distributed Inference

Zuffi, S.

Brown University, May 2015 (phdthesis)

Abstract
In this thesis we address the problem of building shape models of the human body, in 2D and 3D, which are realistic and efficient to use. We focus our efforts on the human body, which is highly articulated and has interesting shape variations, but the approaches we present here can be applied to generic deformable and articulated objects. To address efficiency, we constrain our models to be part-based and have a tree-structured representation with pairwise relationships between connected parts. This allows the application of methods for distributed inference based on message passing. To address realism, we exploit recent advances in computer graphics that represent the human body with statistical shape models learned from 3D scans. We introduce two articulated body models, a 2D model, named Deformable Structures (DS), which is a contour-based model parameterized for 2D pose and projected shape, and a 3D model, named Stitchable Puppet (SP), which is a mesh-based model parameterized for 3D pose, pose-dependent deformations and intrinsic body shape. We have successfully applied the models to interesting and challenging problems in computer vision and computer graphics, namely pose estimation from static images, pose estimation from video sequences, pose and shape estimation from 3D scan data. This advances the state of the art in human pose and shape estimation and suggests that carefully de ned realistic models can be important for computer vision. More work at the intersection of vision and graphics is thus encouraged.

PDF [BibTex]


Thumb xl screen shot 2015 10 14 at 08.57.57
Multi-view and 3D Deformable Part Models

Pepik, B., Stark, M., Gehler, P., Schiele, B.

Pattern Analysis and Machine Intelligence, 37(11):14, IEEE, March 2015 (article)

Abstract
As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding bounding box detection performance, they come with limited expressiveness, as they are clearly limited in their capability of reasoning about 3D shape or viewpoints. In this work, we bring the worlds of 3D and 2D object representations closer, by building an object detector which leverages the expressive power of 3D object representations while at the same time can be robustly matched to image evidence. To that end, we gradually extend the successful deformable part model [1] to include viewpoint information and part-level 3D geometry information, resulting in several different models with different level of expressiveness. We end up with a 3D object model, consisting of multiple object parts represented in 3D and a continuous appearance model. We experimentally verify that our models, while providing richer object hypotheses than the 2D object models, provide consistently better joint object localization and viewpoint estimation than the state-of-the-art multi-view and 3D object detectors on various benchmarks (KITTI [2], 3D object classes [3], Pascal3D+ [4], Pascal VOC 2007 [5], EPFL multi-view cars [6]).

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Thumb xl th teaser
From Scans to Models: Registration of 3D Human Shapes Exploiting Texture Information

Bogo, F.

University of Padova, March 2015 (phdthesis)

Abstract
New scanning technologies are increasing the importance of 3D mesh data, and of algorithms that can reliably register meshes obtained from multiple scans. Surface registration is important e.g. for building full 3D models from partial scans, identifying and tracking objects in a 3D scene, creating statistical shape models. Human body registration is particularly important for many applications, ranging from biomedicine and robotics to the production of movies and video games; but obtaining accurate and reliable registrations is challenging, given the articulated, non-rigidly deformable structure of the human body. In this thesis, we tackle the problem of 3D human body registration. We start by analyzing the current state of the art, and find that: a) most registration techniques rely only on geometric information, which is ambiguous on flat surface areas; b) there is a lack of adequate datasets and benchmarks in the field. We address both issues. Our contribution is threefold. First, we present a model-based registration technique for human meshes that combines geometry and surface texture information to provide highly accurate mesh-to-mesh correspondences. Our approach estimates scene lighting and surface albedo, and uses the albedo to construct a high-resolution textured 3D body model that is brought into registration with multi-camera image data using a robust matching term. Second, by leveraging our technique, we present FAUST (Fine Alignment Using Scan Texture), a novel dataset collecting 300 high-resolution scans of 10 people in a wide range of poses. FAUST is the first dataset providing both real scans and automatically computed, reliable "ground-truth" correspondences between them. Third, we explore possible uses of our approach in dermatology. By combining our registration technique with a melanocytic lesion segmentation algorithm, we propose a system that automatically detects new or evolving lesions over almost the entire body surface, thus helping dermatologists identify potential melanomas. We conclude this thesis investigating the benefits of using texture information to establish frame-to-frame correspondences in dynamic monocular sequences captured with consumer depth cameras. We outline a novel approach to reconstruct realistic body shape and appearance models from dynamic human performances, and show preliminary results on challenging sequences captured with a Kinect.

[BibTex]


Thumb xl thesis teaser
Long Range Motion Estimation and Applications

Sevilla-Lara, L.

Long Range Motion Estimation and Applications, University of Massachusetts Amherst, University of Massachusetts Amherst, Febuary 2015 (phdthesis)

Abstract
Finding correspondences between images underlies many computer vision problems, such as optical flow, tracking, stereovision and alignment. Finding these correspondences involves formulating a matching function and optimizing it. This optimization process is often gradient descent, which avoids exhaustive search, but relies on the assumption of being in the basin of attraction of the right local minimum. This is often the case when the displacement is small, and current methods obtain very accurate results for small motions. However, when the motion is large and the matching function is bumpy this assumption is less likely to be true. One traditional way of avoiding this abruptness is to smooth the matching function spatially by blurring the images. As the displacement becomes larger, the amount of blur required to smooth the matching function becomes also larger. This averaging of pixels leads to a loss of detail in the image. Therefore, there is a trade-off between the size of the objects that can be tracked and the displacement that can be captured. In this thesis we address the basic problem of increasing the size of the basin of attraction in a matching function. We use an image descriptor called distribution fields (DFs). By blurring the images in DF space instead of in pixel space, we in- crease the size of the basin attraction with respect to traditional methods. We show competitive results using DFs both in object tracking and optical flow. Finally we demonstrate an application of capturing large motions for temporal video stitching.

[BibTex]

[BibTex]


Thumb xl ssimssmall
Spike train SIMilarity Space (SSIMS): A framework for single neuron and ensemble data analysis

Vargas-Irwin, C. E., Brandman, D. M., Zimmermann, J. B., Donoghue, J. P., Black, M. J.

Neural Computation, 27(1):1-31, MIT Press, January 2015 (article)

Abstract
We present a method to evaluate the relative similarity of neural spiking patterns by combining spike train distance metrics with dimensionality reduction. Spike train distance metrics provide an estimate of similarity between activity patterns at multiple temporal resolutions. Vectors of pair-wise distances are used to represent the intrinsic relationships between multiple activity patterns at the level of single units or neuronal ensembles. Dimensionality reduction is then used to project the data into concise representations suitable for clustering analysis as well as exploratory visualization. Algorithm performance and robustness are evaluated using multielectrode ensemble activity data recorded in behaving primates. We demonstrate how Spike train SIMilarity Space (SSIMS) analysis captures the relationship between goal directions for an 8-directional reaching task and successfully segregates grasp types in a 3D grasping task in the absence of kinematic information. The algorithm enables exploration of virtually any type of neural spiking (time series) data, providing similarity-based clustering of neural activity states with minimal assumptions about potential information encoding models.

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]

pdf: publisher site pdf: author's proof DOI Project Page [BibTex]


Thumb xl thumb teaser mrg
Metric Regression Forests for Correspondence Estimation

Pons-Moll, G., Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.

International Journal of Computer Vision, pages: 1-13, 2015 (article)

springer PDF Project Page [BibTex]

springer PDF Project Page [BibTex]


Thumb xl fotorobos
Formation control driven by cooperative object tracking

Lima, P., Ahmad, A., Dias, A., Conceição, A., Moreira, A., Silva, E., Almeida, L., Oliveira, L., Nascimento, T.

Robotics and Autonomous Systems, 63(1):68-79, 2015 (article)

Abstract
In this paper we introduce a formation control loop that maximizes the performance of the cooperative perception of a tracked target by a team of mobile robots, while maintaining the team in formation, with a dynamically adjustable geometry which is a function of the quality of the target perception by the team. In the formation control loop, the controller module is a distributed non-linear model predictive controller and the estimator module fuses local estimates of the target state, obtained by a particle filter at each robot. The two modules and their integration are described in detail, including a real-time database associated to a wireless communication protocol that facilitates the exchange of state data while reducing collisions among team members. Simulation and real robot results for indoor and outdoor teams of different robots are presented. The results highlight how our method successfully enables a team of homogeneous robots to minimize the total uncertainty of the tracked target cooperative estimate while complying with performance criteria such as keeping a pre-set distance between the teammates and the target, avoiding collisions with teammates and/or surrounding obstacles.

DOI [BibTex]

DOI [BibTex]

2011


Thumb xl trimproc small
High-quality reflection separation using polarized images

Kong, N., Tai, Y., Shin, S. Y.

IEEE Transactions on Image Processing, 20(12):3393-3405, IEEE Signal Processing Society, December 2011 (article)

Abstract
In this paper, we deal with a problem of separating the effect of reflection from images captured behind glass. The input consists of multiple polarized images captured from the same view point but with different polarizer angles. The output is the high quality separation of the reflection layer and the background layer from the images. We formulate this problem as a constrained optimization problem and propose a framework that allows us to fully exploit the mutually exclusive image information in our input data. We test our approach on various images and demonstrate that our approach can generate good reflection separation results.

Publisher site [BibTex]

2011

Publisher site [BibTex]


no image
A human inspired gaze estimation system

Wulff, J., Sinha, P.

Journal of Vision, 11(11):507-507, ARVO, September 2011 (article)

Abstract
Estimating another person's gaze is a crucial skill in human social interactions. The social component is most apparent in dyadic gaze situations, in which the looker seems to look into the eyes of the observer, thereby signaling interest or a turn to speak. In a triadic situation, on the other hand, the looker's gaze is averted from the observer and directed towards another, specific target. This is mostly interpreted as a cue for joint attention, creating awareness of a predator or another point of interest. In keeping with the task's social significance, humans are very proficient at gaze estimation. Our accuracy ranges from less than one degree for dyadic settings to approximately 2.5 degrees for triadic ones. Our goal in this work is to draw inspiration from human gaze estimation mechanisms in order to create an artificial system that can approach the former's accuracy levels. Since human performance is severely impaired by both image-based degradations (Ando, 2004) and a change of facial configurations (Jenkins & Langton, 2003), the underlying principles are believed to be based both on simple image cues such as contrast/brightness distribution and on more complex geometric processing to reconstruct the actual shape of the head. By incorporating both kinds of cues in our system's design, we are able to surpass the accuracy of existing eye-tracking systems, which rely exclusively on either image-based or geometry-based cues (Yamazoe et al., 2008). A side-benefit of this combined approach is that it allows for gaze estimation despite moderate view-point changes. This is important for settings where subjects, say young children or certain kinds of patients, might not be fully cooperative to allow a careful calibration. Our model and implementation of gaze estimation opens up new experimental questions about human mechanisms while also providing a useful tool for general calibration-free, non-intrusive remote eye-tracking.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


no image
Detecting synchrony in degraded audio-visual streams

Dhandhania, K., Wulff, J., Sinha, P.

Journal of Vision, 11(11):800-800, ARVO, September 2011 (article)

Abstract
Even 8–10 week old infants, when presented with two dynamic faces and a speech stream, look significantly longer at the ‘correct’ talking person (Patterson & Werker, 2003). This is true even though their reduced visual acuity prevents them from utilizing high spatial frequencies. Computational analyses in the field of audio/video synchrony and automatic speaker detection (e.g. Hershey & Movellan, 2000), in contrast, usually depend on high-resolution images. Therefore, the correlation mechanisms found in these computational studies are not directly applicable to the processes through which we learn to integrate the modalities of speech and vision. In this work, we investigated the correlation between speech signals and degraded video signals. We found a high correlation persisting even with high image degradation, resembling the low visual acuity of young infants. Additionally (in a fashion similar to Graf et al., 2002) we explored which parts of the face correlate with the audio in the degraded video sequences. Perfect synchrony and small offsets in the audio were used while finding the correlation, thereby detecting visual events preceding and following audio events. In order to achieve a sufficiently high temporal resolution, high-speed video sequences (500 frames per second) of talking people were used. This is a temporal resolution unachieved in previous studies and has allowed us to capture very subtle and short visual events. We believe that the results of this study might be interesting not only to vision researchers, but, by revealing subtle effects on a very fine timescale, also to people working in computer graphics and the generation and animation of artificial faces.

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl trajectory pami
Trajectory Space: A Dual Representation for Nonrigid Structure from Motion

Akhter, I., Sheikh, Y., Khan, S., Kanade, T.

Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(7):1442-1456, IEEE, July 2011 (article)

Abstract
Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes. These basis are object dependent and therefore have to be estimated anew for each video sequence. In contrast, we propose a dual approach to describe the evolving 3D structure in trajectory space by a linear combination of basis trajectories. We describe the dual relationship between the two approaches, showing that they both have equal power for representing 3D structure. We further show that the temporal smoothness in 3D trajectories alone can be used for recovering nonrigid structure from a moving camera. The principal advantage of expressing deforming 3D structure in trajectory space is that we can define an object independent basis. This results in a significant reduction in unknowns, and corresponding stability in estimation. We propose the use of the Discrete Cosine Transform (DCT) as the object independent basis and empirically demonstrate that it approaches Principal Component Analysis (PCA) for natural motions. We report the performance of the proposed method, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piecewise rigid motion, partially nonrigid motion (such as a facial expressions), and highly nonrigid motion (such as a person walking or dancing).

pdf project page [BibTex]

pdf project page [BibTex]


Thumb xl sigalijcv11
Loose-limbed People: Estimating 3D Human Pose and Motion Using Non-parametric Belief Propagation

Sigal, L., Isard, M., Haussecker, H., Black, M. J.

International Journal of Computer Vision, 98(1):15-48, Springer Netherlands, May 2011 (article)

Abstract
We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected body-parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pair-wise statistical distributions, that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from "bottom-up" visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present quantitative evaluation using the HumanEva dataset.

pdf publisher's site link (url) Project Page Project Page [BibTex]

pdf publisher's site link (url) Project Page Project Page [BibTex]


Thumb xl pointclickimagewide
Point-and-Click Cursor Control With an Intracortical Neural Interface System by Humans With Tetraplegia

Kim, S., Simeral, J. D., Hochberg, L. R., Donoghue, J. P., Friehs, G. M., Black, M. J.

IEEE Transactions on Neural Systems and Rehabilitation Engineering, 19(2):193-203, April 2011 (article)

Abstract
We present a point-and-click intracortical neural interface system (NIS) that enables humans with tetraplegia to volitionally move a 2D computer cursor in any desired direction on a computer screen, hold it still and click on the area of interest. This direct brain-computer interface extracts both discrete (click) and continuous (cursor velocity) signals from a single small population of neurons in human motor cortex. A key component of this system is a multi-state probabilistic decoding algorithm that simultaneously decodes neural spiking activity and outputs either a click signal or the velocity of the cursor. The algorithm combines a linear classifier, which determines whether the user is intending to click or move the cursor, with a Kalman filter that translates the neural population activity into cursor velocity. We present a paradigm for training the multi-state decoding algorithm using neural activity observed during imagined actions. Two human participants with tetraplegia (paralysis of the four limbs) performed a closed-loop radial target acquisition task using the point-and-click NIS over multiple sessions. We quantified point-and-click performance using various human-computer interaction measurements for pointing devices. We found that participants were able to control the cursor motion accurately and click on specified targets with a small error rate (< 3% in one participant). This study suggests that signals from a small ensemble of motor cortical neurons (~40) can be used for natural point-and-click 2D cursor control of a personal computer.

pdf publishers's site pub med link (url) Project Page [BibTex]

pdf publishers's site pub med link (url) Project Page [BibTex]


Thumb xl middleburyimagesmall
A Database and Evaluation Methodology for Optical Flow

Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., Szeliski, R.

International Journal of Computer Vision, 92(1):1-31, March 2011 (article)

Abstract
The quantitative evaluation of optical flow algorithms by Barron et al. (1994) led to significant advances in performance. The challenges for optical flow algorithms today go beyond the datasets and evaluation methods proposed in that paper. Instead, they center on problems associated with complex natural scenes, including nonrigid motion, real sensor noise, and motion discontinuities. We propose a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. To that end, we contribute four types of data to test different aspects of optical flow algorithms: (1) sequences with nonrigid motion where the ground-truth flow is determined by tracking hidden fluorescent texture, (2) realistic synthetic sequences, (3) high frame-rate video used to study interpolation error, and (4) modified stereo sequences of static scenes. In addition to the average angular error used by Barron et al., we compute the absolute flow endpoint error, measures for frame interpolation error, improved statistics, and results at motion discontinuities and in textureless regions. In October 2007, we published the performance of several well-known methods on a preliminary version of our data to establish the current state of the art. We also made the data freely available on the web at http://vision.middlebury.edu/flow/ . Subsequently a number of researchers have uploaded their results to our website and published papers using the data. A significant improvement in performance has already been achieved. In this paper we analyze the results obtained to date and draw a large number of conclusions from them.

pdf pdf from publisher Middlebury Flow Evaluation Website [BibTex]

pdf pdf from publisher Middlebury Flow Evaluation Website [BibTex]


Thumb xl 1000dayimagesmall
Neural control of cursor trajectory and click by a human with tetraplegia 1000 days after implant of an intracortical microelectrode array

(J. Neural Engineering Highlights of 2011 Collection. JNE top 10 cited papers of 2010-2011.)

Simeral, J. D., Kim, S., Black, M. J., Donoghue, J. P., Hochberg, L. R.

J. of Neural Engineering, 8(2):025027, 2011 (article)

Abstract
The ongoing pilot clinical trial of the BrainGate neural interface system aims in part to assess the feasibility of using neural activity obtained from a small-scale, chronically implanted, intracortical microelectrode array to provide control signals for a neural prosthesis system. Critical questions include how long implanted microelectrodes will record useful neural signals, how reliably those signals can be acquired and decoded, and how effectively they can be used to control various assistive technologies such as computers and robotic assistive devices, or to enable functional electrical stimulation of paralyzed muscles. Here we examined these questions by assessing neural cursor control and BrainGate system characteristics on five consecutive days 1000 days after implant of a 4 × 4 mm array of 100 microelectrodes in the motor cortex of a human with longstanding tetraplegia subsequent to a brainstem stroke. On each of five prospectively-selected days we performed time-amplitude sorting of neuronal spiking activity, trained a population-based Kalman velocity decoding filter combined with a linear discriminant click state classifier, and then assessed closed-loop point-and-click cursor control. The participant performed both an eight-target center-out task and a random target Fitts metric task which was adapted from a human-computer interaction ISO standard used to quantify performance of computer input devices. The neural interface system was further characterized by daily measurement of electrode impedances, unit waveforms and local field potentials. Across the five days, spiking signals were obtained from 41 of 96 electrodes and were successfully decoded to provide neural cursor point-and-click control with a mean task performance of 91.3% ± 0.1% (mean ± s.d.) correct target acquisition. Results across five consecutive days demonstrate that a neural interface system based on an intracortical microelectrode array can provide repeatable, accurate point-and-click control of a computer interface to an individual with tetraplegia 1000 days after implantation of this sensor.

pdf pdf from publisher link (url) Project Page [BibTex]


Thumb xl ijnmbe1
Modelling pipeline for subject-specific arterial blood flow—A review

Igor Sazonov, Si Yong Yeo, Rhodri Bevan, Xianghua Xie, Raoul van Loon, Perumal Nithiarasu

International Journal for Numerical Methods in Biomedical Engineering, 27(12):1868–1910, 2011 (article)

Abstract
In this paper, a robust and semi-automatic modelling pipeline for blood flow through subject-specific arterial geometries is presented. The framework developed consists of image segmentation, domain discretization (meshing) and fluid dynamics. All the three subtopics of the pipeline are explained using an example of flow through a severely stenosed human carotid artery. In the Introduction, the state-of-the-art of both image segmentation and meshing is presented in some detail, and wherever possible the advantages and disadvantages of the existing methods are analysed. Followed by this, the deformable model used for image segmentation is presented. This model is based upon a geometrical potential force (GPF), which is a function of the image. Both the GPF calculation and level set determination are explained. Following the image segmentation method, a semi-automatic meshing method used in the present study is explained in full detail. All the relevant techniques required to generate a valid domain discretization are presented. These techniques include generating a valid surface mesh, skeletonization, mesh cropping, boundary layer mesh construction and various mesh cosmetic methods that are essential for generating a high-quality domain discretization. After presenting the mesh generation procedure, how to generate flow boundary conditions for both the inlets and outlets of a geometry is explained in detail. This is followed by a brief note on the flow solver, before studying the blood flow through the carotid artery with a severe stenosis.

[BibTex]

[BibTex]


Thumb xl tnip1
Geometrically Induced Force Interaction for Three-Dimensional Deformable Models

Si Yong Yeo, Xianghua Xie, Igor Sazonov, Perumal Nithiarasu

IEEE Transactions on Image Processing, 20(5):1373 - 1387, 2011 (article)

Abstract
In this paper, we propose a novel 3-D deformable model that is based upon a geometrically induced external force field which can be conveniently generalized to arbitrary dimensions. This external force field is based upon hypothesized interactions between the relative geometries of the deformable model and the object boundary characterized by image gradient. The evolution of the deformable model is solved using the level set method so that topological changes are handled automatically. The relative geometrical configurations between the deformable model and the object boundaries contribute to a dynamic vector force field that changes accordingly as the deformable model evolves. The geometrically induced dynamic interaction force has been shown to greatly improve the deformable model performance in acquiring complex geometries and highly concave boundaries, and it gives the deformable model a high invariancy in initialization configurations. The voxel interactions across the whole image domain provide a global view of the object boundary representation, giving the external force a long attraction range. The bidirectionality of the external force field allows the new deformable model to deal with arbitrary cross-boundary initializations, and facilitates the handling of weak edges and broken boundaries. In addition, we show that by enhancing the geometrical interaction field with a nonlocal edge-preserving algorithm, the new deformable model can effectively overcome image noise. We provide a comparative study on the segmentation of various geometries with different topologies from both synthetic and real images, and show that the proposed method achieves significant improvements against existing image gradient techniques.

[BibTex]

[BibTex]


Thumb xl thesis
Spatial Models of Human Motion

Soren Hauberg

University of Copenhagen, 2011 (phdthesis)

PDF [BibTex]

PDF [BibTex]


Thumb xl ijnmbe1
Computational flow studies in a subject-specific human upper airway using a one-equation turbulence model. Influence of the nasal cavity

Prihambodo Saksono, Perumal Nithiarasu, Igor Sazonov, Si Yong Yeo

International Journal for Numerical Methods in Biomedical Engineering, 87(1-5):96–114, 2011 (article)

Abstract
This paper focuses on the impact of including nasal cavity on airflow through a human upper respiratory tract. A computational study is carried out on a realistic geometry, reconstructed from CT scans of a subject. The geometry includes nasal cavity, pharynx, larynx, trachea and two generations of airway bifurcations below trachea. The unstructured mesh generation procedure is discussed in some length due to the complex nature of the nasal cavity structure and poor scan resolution normally available from hospitals. The fluid dynamic studies have been carried out on the geometry with and without the inclusion of the nasal cavity. The characteristic-based split scheme along with the one-equation Spalart–Allmaras turbulence model is used in its explicit form to obtain flow solutions at steady state. Results reveal that the exclusion of nasal cavity significantly influences the resulting solution. In particular, the location of recirculating flow in the trachea is dramatically different when the truncated geometry is used. In addition, we also address the differences in the solution due to imposed, equally distributed and proportionally distributed flow rates at inlets (both nares). The results show that the differences in flow pattern between the two inlet conditions are not confined to the nasal cavity and nasopharyngeal region, but they propagate down to the trachea.

[BibTex]

[BibTex]


Thumb xl ijcv2012
Predicting Articulated Human Motion from Spatial Processes

Soren Hauberg, Kim S. Pedersen

International Journal of Computer Vision, 94, pages: 317-334, Springer Netherlands, 2011 (article)

Publishers site Code Paper site PDF [BibTex]

Publishers site Code Paper site PDF [BibTex]

1997


Thumb xl yasersmile
Recognizing facial expressions in image sequences using local parameterized models of image motion

Black, M. J., Yacoob, Y.

Int. Journal of Computer Vision, 25(1):23-48, 1997 (article)

Abstract
This paper explores the use of local parametrized models of image motion for recovering and recognizing the non-rigid and articulated motion of human faces. Parametric flow models (for example affine) are popular for estimating motion in rigid scenes. We observe that within local regions in space and time, such models not only accurately model non-rigid facial motions but also provide a concise description of the motion in terms of a small number of parameters. These parameters are intuitively related to the motion of facial features during facial expressions and we show how expressions such as anger, happiness, surprise, fear, disgust, and sadness can be recognized from the local parametric motions in the presence of significant head motion. The motion tracking and expression recognition approach performed with high accuracy in extensive laboratory experiments involving 40 subjects as well as in television and movie sequences.

pdf pdf from publisher abstract video [BibTex]