

2019


Towards Geometric Understanding of Motion

Ranjan, A.

University of Tübingen, December 2019 (phdthesis)

Abstract

The motion of the world is inherently dependent on the spatial structure of the world and its geometry. Therefore, classical optical flow methods try to model this geometry to solve for the motion. However, recent deep learning methods take a completely different approach. They try to predict optical flow by learning from labelled data. Although deep networks have shown state-of-the-art performance on classification problems in computer vision, they have not been as effective in solving optical flow. The key reason is that deep learning methods do not explicitly model the structure of the world in a neural network, and instead expect the network to learn about the structure from data. We hypothesize that it is difficult for a network to learn about motion without any constraint on the structure of the world. Therefore, we explore several approaches to explicitly model the geometry of the world and its spatial structure in deep neural networks.

The spatial structure in images can be captured by representing it at multiple scales. To represent multiple scales of images in deep neural nets, we introduce a Spatial Pyramid Network (SPyNet). Such a network can leverage global information for estimating large motions and local information for estimating small motions. We show that SPyNet significantly improves over previous optical flow networks while also being the smallest and fastest neural network for motion estimation. SPyNet achieves a 97% reduction in model parameters over previous methods and is more accurate.
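To illustrate the multi-scale idea in the paragraph above, here is a minimal sketch of an image pyramid in plain Python. It is not the SPyNet architecture. The function names (`downsample`, `build_pyramid`, `displacement_per_level`) and the 2x2 average-pooling scheme are assumptions chosen for illustration. The sketch shows only why a coarse level helps with large motions: each halving of resolution also halves the apparent displacement.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, levels):
    """Return [coarsest, ..., finest]; the finest level is the input image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


def displacement_per_level(full_res_displacement, levels):
    """Apparent size (in pixels) of a full-resolution motion at each level,
    ordered from coarsest to finest."""
    return [full_res_displacement / 2 ** (levels - 1 - k) for k in range(levels)]


# Hypothetical 8x8 test image with a simple intensity ramp.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
pyramid = build_pyramid(image, levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
shifts = displacement_per_level(8, levels=4)
```

In a coarse-to-fine scheme, a motion estimate would be computed at the coarsest level first, then upsampled and refined at each finer level, so that every level only has to account for a small residual motion.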

The spatial structure of the world extends to people and their motion. Humans have a very well-defined structure, and this information is useful in estimating optical flow for humans. To leverage this information, we create a synthetic dataset for human optical flow using a statistical human body model and motion capture sequences. We use this dataset to train deep networks and see significant improvement in the ability of the networks to estimate human optical flow.

The structure and geometry of the world affect the motion. Therefore, learning about the structure of the scene together with the motion can benefit both problems. To facilitate this, we introduce Competitive Collaboration, where several neural networks are constrained by geometry and can jointly learn about structure and motion in the scene without any labels. To this end, we show that jointly learning single view depth prediction, camera motion, optical flow and motion segmentation using Competitive Collaboration achieves state-of-the-art results among unsupervised approaches.

Our findings provide support for our hypothesis that explicit constraints on structure and geometry of the world lead to better methods for motion estimation.

PhD Thesis [BibTex]



AirCap – Aerial Outdoor Motion Capture

Ahmad, A., Price, E., Tallamraju, R., Saini, N., Lawless, G., Ludwig, R., Martinovic, I., Bülthoff, H. H., Black, M. J.

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), Workshop on Aerial Swarms, November 2019 (misc)

Abstract
This paper presents an overview of the Grassroots project Aerial Outdoor Motion Capture (AirCap) running at the Max Planck Institute for Intelligent Systems. AirCap's goal is to achieve markerless, unconstrained, human motion capture (mocap) in unknown and unstructured outdoor environments. To that end, we have developed an autonomous flying motion capture system using a team of micro aerial vehicles (MAVs) with only on-board, monocular RGB cameras. We have conducted several real robot experiments involving up to 3 aerial vehicles autonomously tracking and following a person in several challenging scenarios using our approach of active cooperative perception developed in AirCap. Using the images captured by these robots during the experiments, we have demonstrated a successful offline body pose and shape estimation with sufficiently high accuracy. Overall, we have demonstrated the first fully autonomous flying motion capture system involving multiple robots for outdoor scenarios.

Talk slides Project Page Project Page [BibTex]



Method for providing a three dimensional body model

Loper, M., Mahmood, N., Black, M.

September 2019, U.S. Patent 10,417,818 (misc)

Abstract
A method for providing a three-dimensional body model which may be applied for an animation, based on a moving body, wherein the method comprises providing a parametric three-dimensional body model, which allows shape and pose variations; applying a standard set of body markers; optimizing the set of body markers by generating an additional set of body markers and applying the same for providing 3D coordinate marker signals for capturing shape and pose of the body and dynamics of soft tissue; and automatically providing an animation by processing the 3D coordinate marker signals in order to provide a personalized three-dimensional body model, based on estimated shape and an estimated pose of the body by means of predicted marker locations.

MoSh Project pdf [BibTex]


Perceiving Systems (2016-2018)
Scientific Advisory Board Report, 2019 (misc)

pdf [BibTex]


2010


ImageFlow: Streaming Image Search

Jampani, V., Ramos, G., Drucker, S.

MSR-TR-2010-148, Microsoft Research, Redmond, 2010 (techreport)

Abstract
Traditional grid and list representations of image search results are the dominant interaction paradigms that users face on a daily basis, yet it is unclear that such paradigms are well-suited for experiences where the user's task is to browse images for leisure, to discover new information or to seek particular images to represent ideas. We introduce ImageFlow, a novel image search user interface that explores a different alternative to the traditional presentation of image search results. ImageFlow presents image results on a canvas where we map semantic features (e.g., relevance, related queries) to the canvas' spatial dimensions (e.g., x, y, z) in a way that allows for several levels of engagement – from passively viewing a stream of images, to seamlessly navigating through the semantic space and actively collecting images for sharing and reuse. We have implemented our system as a fully functioning prototype, and we report on promising, preliminary usage results.

url pdf link (url) [BibTex]


2009


ISocRob-MSL 2009 Team Description Paper for Middle Sized League

Lima, P., Santos, J., Estilita, J., Barbosa, M., Ahmad, A., Carreira, J.

13th Annual RoboCup International Symposium 2009, July 2009 (techreport)

Abstract
This paper describes the status of the ISocRob MSL robotic soccer team as required by the RoboCup 2009 qualification procedures. Since its previous participation in RoboCup, the ISocRob team has carried out significant developments in various topics, the most relevant of which are presented here. These include self-localization, 3D object tracking and cooperative object localization, motion control and relational behaviors. A brief description of the hardware of the ISocRob robots and of the software architecture adopted by the team is also included.

[BibTex]



An introduction to Kernel Learning Algorithms

Gehler, P., Schölkopf, B.

In Kernel Methods for Remote Sensing Data Analysis, pages: 25-48, 2, (Editors: Gustavo Camps-Valls and Lorenzo Bruzzone), Wiley, New York, NY, USA, 2009 (inbook)

Abstract
Kernel learning algorithms are currently becoming a standard tool in the area of machine learning and pattern recognition. In this chapter we review the fundamental theory of kernel learning. As the basic building block we introduce the kernel function, which provides an elegant and general way to compare possibly very complex objects. We then review the concept of a reproducing kernel Hilbert space and state the representer theorem. Finally we give an overview of the most prominent algorithms, which are support vector classification and regression, Gaussian processes and kernel principal component analysis. With multiple kernel learning and structured output prediction we also introduce some more recent advancements in the field.
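As a small illustration of the kernel function and the representer theorem mentioned above, the following sketch uses a Gaussian (RBF) kernel to compare vectors and evaluates a function as a kernel expansion over training points, f(x) = sum_i alpha_i k(x_i, x). The helper names, the choice of the RBF kernel, and the coefficients `alphas` are illustrative assumptions and are not taken from the chapter. In practice the coefficients would come from a learning algorithm such as a support vector machine.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, kernel):
    """Matrix of all pairwise kernel values K[i][j] = k(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


def kernel_expansion(x, points, alphas, kernel):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x), the form that the
    representer theorem guarantees for the minimizer."""
    return sum(a * kernel(p, x) for a, p in zip(alphas, points))


# Hypothetical training points and coefficients, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
alphas = [0.5, -1.0, 2.0]
K = gram_matrix(points, rbf_kernel)
value_at_first_point = kernel_expansion(points[0], points, alphas, rbf_kernel)
```

The Gram matrix is symmetric with ones on the diagonal for this kernel, because k(x, x) = exp(0) = 1.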

link (url) DOI [BibTex]



Visual Object Discovery

Sinha, P., Balas, B., Ostrovsky, Y., Wulff, J.

In Object Categorization: Computer and Human Vision Perspectives, pages: 301-323, (Editors: S. J. Dickinson, A. Leonardis, B. Schiele, M.J. Tarr), Cambridge University Press, 2009 (inbook)

link (url) [BibTex]



Automatic recognition of rodent behavior: A tool for systematic phenotypic analysis

Serre, T.*, Jhuang, H.*, Garrote, E., Poggio, T., Steele, A.

CBCL paper #283/MIT-CSAIL-TR #2009-052, MIT, 2009 (techreport)

pdf [BibTex]


2002


Bayesian Inference of Visual Motion Boundaries

Fleet, D. J., Black, M. J., Nestares, O.

In Exploring Artificial Intelligence in the New Millennium, pages: 139-174, (Editors: Lakemeyer, G. and Nebel, B.), Morgan Kaufmann Pub., July 2002 (incollection)

Abstract
This chapter addresses an open problem in visual motion analysis, the estimation of image motion in the vicinity of occlusion boundaries. With a Bayesian formulation, local image motion is explained in terms of multiple, competing, nonlinear models, including models for smooth (translational) motion and for motion boundaries. The generative model for motion boundaries explicitly encodes the orientation of the boundary, the velocities on either side, the motion of the occluding edge over time, and the appearance/disappearance of pixels at the boundary. We formulate the posterior probability distribution over the models and model parameters, conditioned on the image sequence. Approximate inference is achieved with a combination of tools: A Bayesian filter provides for online computation; factored sampling allows us to represent multimodal non-Gaussian distributions and to propagate beliefs with nonlinear dynamics from one time to the next; and mixture models are used to simplify the computation of joint prediction distributions in the Bayesian filter. To efficiently represent such a high-dimensional space, we also initialize samples using the responses of a low-level motion-discontinuity detector. The basic formulation and computational model provide a general probabilistic framework for motion estimation with multiple, nonlinear models.
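The combination of a Bayesian filter with factored sampling described above can be sketched, in a much simplified form, as one update step of a sample-based filter on a one-dimensional state. This is a toy illustration, not the chapter's motion-boundary model. The Gaussian motion noise, the Gaussian observation likelihood, and all names and parameter values are assumptions.

```python
import math
import random


def gaussian_likelihood(x, z, sigma):
    """Unnormalized likelihood of observing z when the state is x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)


def factored_sampling_step(samples, weights, z, rng,
                           motion_sigma=0.1, obs_sigma=0.5):
    """One filter update on a weighted sample set.

    1. Resample states in proportion to their current weights.
    2. Propagate each state through (assumed) noisy dynamics.
    3. Reweight by the observation likelihood and normalize.
    """
    n = len(samples)
    resampled = rng.choices(samples, weights=weights, k=n)
    predicted = [x + rng.gauss(0.0, motion_sigma) for x in resampled]
    new_weights = [gaussian_likelihood(x, z, obs_sigma) for x in predicted]
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
samples = [rng.uniform(-5.0, 5.0) for _ in range(n)]  # broad prior
weights = [1.0 / n] * n
samples, weights = factored_sampling_step(samples, weights, z=2.0, rng=rng)
posterior_mean = sum(w * x for w, x in zip(weights, samples))
```

Because the posterior is carried by weighted samples and not by a single Gaussian, the same step also works when the distribution has several modes, which is the property the abstract relies on.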

pdf [BibTex]


1998


Looking at people in action - An overview

Yacoob, Y., Davis, L. S., Black, M., Gavrila, D., Horprasert, T., Morimoto, C.

In Computer Vision for Human–Machine Interaction, (Editors: R. Cipolla and A. Pentland), Cambridge University Press, 1998 (incollection)

publisher site google books [BibTex]
