An expressive model of human motion is the key component of methods for action classification, motion prediction and synthesis. To that end, we are exploring several deep network architectures to predict human movement.
Current methods for motion prediction typically do not generalize across a wide range of actions and suffer from ``regression to the mean''. We show that, surprisingly, state-of-the-art performance can be achieved by a simple baseline that does not model motion at all. We investigate this and propose three changes to the standard RNN models typically used for human motion, resulting in a simple and scalable RNN architecture that obtains state-of-the-art performance on human motion prediction [ ].
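A central idea in this line of work is to have the recurrent network predict velocities (pose deltas) rather than absolute poses, with a residual connection adding the delta to the previous pose. The sketch below illustrates that residual decoding loop with a toy one-layer recurrent cell; the pose dimension, hidden size, and random weights are all illustrative assumptions, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
POSE_DIM = 54   # e.g. joint-angle vector; dimension is an assumption
HIDDEN = 64     # toy hidden-state size, also an assumption

# Random stand-in weights for a single recurrent cell (a trained GRU in practice).
W_h = rng.normal(scale=0.01, size=(HIDDEN, HIDDEN))
W_x = rng.normal(scale=0.01, size=(HIDDEN, POSE_DIM))
W_out = rng.normal(scale=0.01, size=(POSE_DIM, HIDDEN))

def step(h, pose):
    """One decoder step: the network outputs a velocity (delta),
    and the residual connection adds it to the current pose."""
    h = np.tanh(W_h @ h + W_x @ pose)
    delta = W_out @ h           # predicted change, not an absolute pose
    return h, pose + delta      # residual connection

h = np.zeros(HIDDEN)
pose = rng.normal(size=POSE_DIM)    # last observed pose
preds = []
for _ in range(10):                 # roll out 10 future frames
    h, pose = step(h, pose)
    preds.append(pose)
```

Because the network only has to model the change between frames, a zero output already reproduces the last observed pose, which is exactly the strong "do nothing" baseline mentioned above.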
We have also shown that a simple encoder/decoder architecture that takes a set of past poses and predicts a set of future poses works well and is simpler than RNN models. By forcing the encoding through a bottleneck, the approach learns features of human movement that are useful for action recognition. Our feed-forward networks outperform recurrent approaches for short- and long-term prediction and generalize to novel subjects and actions [ ].
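The feed-forward alternative can be sketched as a single encode-then-decode mapping: flatten a window of past poses, compress it through a low-dimensional bottleneck, and decode a window of future poses in one shot. All sizes and the random weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
POSE_DIM, PAST, FUTURE, BOTTLENECK = 54, 25, 25, 32  # assumed sizes

# Random stand-in weights; in practice these are learned end to end.
enc = rng.normal(scale=0.01, size=(BOTTLENECK, PAST * POSE_DIM))
dec = rng.normal(scale=0.01, size=(FUTURE * POSE_DIM, BOTTLENECK))

def predict(past_poses):
    """Map PAST observed poses to FUTURE predicted poses via a bottleneck."""
    z = np.tanh(enc @ past_poses.reshape(-1))  # low-dimensional motion code
    return (dec @ z).reshape(FUTURE, POSE_DIM)

future = predict(rng.normal(size=(PAST, POSE_DIM)))
```

The bottleneck code `z` is the part that doubles as a motion feature: because the whole future must be decoded from it, it is forced to summarize the movement, which is why it transfers to action recognition.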
We have worked on several methods to estimate 3D pose from 2D joints. We show that this can actually be solved with a very simple network that outperforms previous, more complex, methods by a substantial margin. This suggests that ``lifting'' from 2D to 3D is not really the hard problem; rather, extracting the relevant information from the 2D image is the key [ ].
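The "simple network" in question is essentially a small multilayer perceptron that maps flattened 2D joint coordinates to 3D joint coordinates. The sketch below shows that structure with one residual hidden block; the joint count, layer widths, and random weights are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
J = 17  # number of body joints, an assumption

# Random stand-in weights for a trained lifting network.
W1 = rng.normal(scale=0.01, size=(1024, 2 * J))
W2 = rng.normal(scale=0.01, size=(1024, 1024))
W3 = rng.normal(scale=0.01, size=(3 * J, 1024))

def lift(joints_2d):
    """Lift 2D joint positions (J x 2) to 3D joint positions (J x 3)."""
    x = np.maximum(0, W1 @ joints_2d.reshape(-1))  # ReLU input layer
    x = x + np.maximum(0, W2 @ x)                  # one residual block
    return (W3 @ x).reshape(J, 3)

pose_3d = lift(rng.normal(size=(J, 2)))
```

That such a shallow mapping suffices is the point of the result: once accurate 2D joints are available, recovering depth is comparatively easy.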
Neural networks may not generalize to scenarios that they have never seen -- imagine someone floating in zero gravity. In contrast, physics-based models can capture fundamental aspects of body movement and how it varies with different loads. We exploit our 3D body models, which give us information about how mass is distributed on the body. We develop a novel spacetime optimization approach that learns and robustly adapts physical controllers to new bodies and constraints [ ].