University of Tübingen, December 2019 (phdthesis)
Motion in the world depends inherently on the world's spatial structure and geometry. Classical optical flow methods therefore try to model this geometry in order to solve for the motion. Recent deep learning methods take a completely different approach: they try to predict optical flow by learning from labelled data. Although deep networks have shown state-of-the-art performance on classification problems in computer vision, they have not been as effective at estimating optical flow. The key reason is that deep learning methods do not explicitly model the structure of the world in the network; instead, they expect the network to learn this structure from data. We hypothesize that it is difficult for a network to learn about motion without any constraint on the structure of the world. We therefore explore several approaches to explicitly model the geometry of the world and its spatial structure in deep neural networks.
The spatial structure in images can be captured by representing images at multiple scales. To represent multiple image scales in deep neural networks, we introduce a Spatial Pyramid Network (SPyNet). Such a network can leverage global information for estimating large motions and local information for estimating small motions. We show that SPyNet significantly improves over previous optical flow networks while also being the smallest and fastest neural network for motion estimation. SPyNet achieves a 97% reduction in model parameters relative to previous methods and is more accurate.
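The coarse-to-fine idea behind a spatial pyramid can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not SPyNet itself: the pyramid uses simple 2x2 averaging, warping uses nearest-neighbour sampling, and the trained per-level network is replaced by a hypothetical `predict_residual` callable (the default `zero_residual` predicts no correction). At each finer level the previous flow is upsampled, its vectors are doubled because pixel distances double, the second image is warped toward the first, and only a residual correction is added.

```python
import numpy as np


def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def build_pyramid(img, levels):
    """Return the image pyramid ordered from coarsest to finest level."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


def upsample_flow(flow):
    """Nearest-neighbour 2x upsampling; vectors are doubled to stay in pixel units."""
    return 2.0 * np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)


def warp(img, flow):
    """Backward-warp img: sample img at (x + u, y + v), clamped to the image border."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[y2, x2]


def zero_residual(img1, warped2, flow):
    """Hypothetical stand-in for a trained per-level network: predicts no correction."""
    return np.zeros_like(flow)


def coarse_to_fine_flow(img1, img2, levels=3, predict_residual=zero_residual):
    """Estimate flow from img1 to img2 by adding a residual at each pyramid level."""
    h, w = img1.shape
    assert h % 2 ** (levels - 1) == 0 and w % 2 ** (levels - 1) == 0, "size must divide evenly"
    pyr1, pyr2 = build_pyramid(img1, levels), build_pyramid(img2, levels)
    flow = np.zeros(pyr1[0].shape + (2,))
    for k in range(levels):
        if k > 0:
            flow = upsample_flow(flow)  # carry the coarser estimate up one level
        warped = warp(pyr2[k], flow)  # align img2 to img1 with the current flow
        flow = flow + predict_residual(pyr1[k], warped, flow)
    return flow
```

With a residual predictor that always returns 1, three levels give 1, then 2 * 1 + 1 = 3, then 2 * 3 + 1 = 7, which shows how a coarse estimate is rescaled and refined at each finer level while the per-level predictor only has to handle a small correction.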
The spatial structure of the world extends to people and their motion. Humans have a very well-defined structure, and this information is useful in estimating optical flow for humans. To leverage this information, we create a synthetic dataset for human optical flow using a statistical human body model and motion capture sequences. We use this dataset to train deep networks and observe a significant improvement in their ability to estimate human optical flow.
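Synthetic data makes exact ground-truth flow possible because the correspondence between frames is known: the same surface point of the body model is projected into both frames, and its image displacement is the flow. The sketch below illustrates this for mesh vertices under an assumed pinhole camera with focal length `f` and principal point `(cx, cy)`; the function names are hypothetical, and a real rendering pipeline would additionally rasterize the per-vertex flow into a dense image and handle occlusion, which is omitted here.

```python
import numpy as np


def project(points, f, cx, cy):
    """Pinhole projection of N x 3 camera-space points to N x 2 pixel coordinates."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)


def vertex_flow(verts_t, verts_t1, f, cx, cy):
    """Per-vertex flow: pixel position at frame t+1 minus pixel position at frame t."""
    return project(verts_t1, f, cx, cy) - project(verts_t, f, cx, cy)


# A vertex 2 units in front of the camera moving 0.1 units to the right.
verts_t = np.array([[0.0, 0.0, 2.0]])
verts_t1 = np.array([[0.1, 0.0, 2.0]])
flow = vertex_flow(verts_t, verts_t1, f=500.0, cx=320.0, cy=240.0)
```

Here the vertex moves 500 * 0.1 / 2 = 25 pixels horizontally and not at all vertically; applying the same computation to every vertex of a posed body mesh from consecutive motion capture frames yields sparse ground-truth flow for the visible surface.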
The structure and geometry of the world affect the motion. Learning about the structure of the scene together with the motion can therefore benefit both problems. To facilitate this, we introduce Competitive Collaboration, a framework in which several neural networks are constrained by geometry and jointly learn about structure and motion in the scene without any labels. We show that jointly learning single-view depth prediction, camera motion, optical flow and motion segmentation using Competitive Collaboration achieves state-of-the-art results among unsupervised approaches.
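One common way to express the geometric link between depth, camera motion and optical flow is the rigid flow of a static scene: each pixel is back-projected with its depth, moved by the camera's rotation `R` and translation `t`, and re-projected with intrinsics `K`. A motion segmentation mask can then decide, per pixel, whether this rigid flow or a generic flow estimate explains the observed motion. The sketch below is a simplified NumPy illustration under these assumptions; the names `rigid_flow` and `composite_flow` are hypothetical, and it is not the Competitive Collaboration training procedure itself.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow induced by camera motion (R, t) on a static scene with per-pixel depth."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)  # homogeneous pixels
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)  # back-project to 3D camera space
    pts2 = R @ pts + t.reshape(3, 1)  # express points in the second camera frame
    proj = K @ pts2
    uv2 = proj[:2] / proj[2:3]  # perspective division
    return (uv2 - pix[:2]).T.reshape(h, w, 2)


def composite_flow(mask, flow_rigid, flow_free):
    """Blend per pixel: mask near 1 uses the rigid (static-scene) explanation."""
    m = mask[..., None]
    return m * flow_rigid + (1.0 - m) * flow_free
```

With an identity pose the rigid flow vanishes; with a pure sideways translation `t = (0.1, 0, 0)`, constant depth 2 and focal length 100, every pixel moves 100 * 0.1 / 2 = 5 pixels horizontally. Pixels on independently moving objects violate this static-scene model, which is why a segmentation mask is needed to route them to the generic flow estimate.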
Our findings provide support for our hypothesis that explicit constraints on structure and geometry of the world lead to better methods for motion estimation.
NeuroImage, 202(15):116085, November 2019 (article), Zhao, M., Mohler, B. J., Bartels, A., Bülthoff, I.
IEEE Robotics and Automation Letters, 4(4):4491-4498, IEEE, October 2019 (article), Karlapalem, K., Bülthoff, H. H.
Hesse, N., Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019 (article), Arens, M., Hofmann, U., Schroeder, S.
Kenny, S., ACM Trans. Appl. Percept., 16(1):2:1-2:18, February 2019 (article), Honda, C.
IEEE Transactions on Visualization and Computer Graphics, 25, pages: 1887-1897, IEEE, 2019 (article), Hesse, N., Bülthoff, H. H.
van der Veer, A. H., Longo, M. R., Alsmith, A. J. T., Wong, H. Y., Frontiers in Robotics and AI, 6(33), 2019 (article)
IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), December 2015 (article), Feragen, A.
ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1-248:16, ACM, New York, NY, October 2015 (article)
ACM Transactions on Graphics (Proc. SIGGRAPH), 34(4):120:1-120:14, ACM, August 2015 (article)
Vargas-Irwin, C. E., Franquemont, L., Journal of Neuroscience, 35(30):10888-10897, July 2015 (article), Donoghue, J. P.
Brown University, May 2015 (phdthesis)
Pepik, B., Stark, M., Pattern Analysis and Machine Intelligence, 37(11):14, IEEE, March 2015 (article), Schiele, B.
University of Padova, March 2015 (phdthesis)
Long Range Motion Estimation and Applications, University of Massachusetts Amherst, February 2015 (phdthesis)
Vargas-Irwin, C. E., Brandman, D. M., Zimmermann, J. B., Donoghue, J. P., Neural Computation, 27(1):1-31, MIT Press, January 2015 (article)
International Journal of Computer Vision, pages: 1-13, 2015 (article), Taylor, J., Shotton, J., Hertzmann, A., Fitzgibbon, A.
Lima, P., Robotics and Autonomous Systems, 63(1):68-79, 2015 (article), Dias, A., Conceição, A., Moreira, A., Silva, E., Almeida, L., Oliveira, L., Nascimento, T.
IEEE Transactions on Image Processing, 20(12):3393-3405, IEEE Signal Processing Society, December 2011 (article), Tai, Y., Shin, S. Y.
Journal of Vision, 11(11):507-507, ARVO, September 2011 (article), Sinha, P.
Dhandhania, K., Journal of Vision, 11(11):800-800, ARVO, September 2011 (article), Sinha, P.
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(7):1442-1456, IEEE, July 2011 (article), Sheikh, Y., Khan, S., Kanade, T.
Sigal, L., Isard, M., Haussecker, H., International Journal of Computer Vision, 98(1):15-48, Springer Netherlands, May 2011 (article)
Kim, S., Simeral, J. D., Hochberg, L. R., Donoghue, J. P., Friehs, G. M., IEEE Transactions on Neural Systems and Rehabilitation Engineering, 19(2):193-203, April 2011 (article)
Baker, S., Scharstein, D., Lewis, J. P., Roth, S., International Journal of Computer Vision, 92(1):1-31, March 2011 (article), Szeliski, R.
(J. Neural Engineering Highlights of 2011 Collection. JNE top 10 cited papers of 2010-2011.)
Simeral, J. D., Kim, S., J. of Neural Engineering, 8(2):025027, 2011 (article), Donoghue, J. P., Hochberg, L. R.
Andriluka, M., Sigal, L., In Visual Analysis of Humans: Looking at People, pages: 253-274, (Editors: Moeslund and Hilton and Krüger and Sigal), Springer-Verlag, London, 2011 (incollection)
Roth, S., In Markov Random Fields for Vision and Image Processing, pages: 297-310, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)
Roth, S., In Markov Random Fields for Vision and Image Processing, pages: 377-387, (Editors: Blake, A. and Kohli, P. and Rother, C.), MIT Press, 2011 (incollection)
Soren Hauberg, University of Copenhagen, 2011 (phdthesis)
Igor Sazonov, Si Yong Yeo, Rhodri Bevan, Xianghua Xie, Raoul van Loon, Perumal Nithiarasu, International Journal for Numerical Methods in Biomedical Engineering, 27(12):1868-1910, 2011 (article)
Si Yong Yeo, Xianghua Xie, Igor Sazonov, Perumal Nithiarasu, IEEE Transactions on Image Processing, 20(5):1373-1387, 2011 (article)
In Visual Analysis of Humans: Looking at People, pages: 139-170, 9, Rosenhahn, B. (Editors: T. Moeslund, A. Hilton, V. Krueger, L. Sigal), Springer, 2011 (inbook)
Soren Hauberg, Kim S. Pedersen, International Journal of Computer Vision, 94, pages: 317-334, Springer Netherlands, 2011 (article)
Prihambodo Saksono, Perumal Nithiarasu, Igor Sazonov, Si Yong Yeo, International Journal for Numerical Methods in Biomedical Engineering, 87(1-5):96-114, 2011 (article)
Roth, S., International Journal of Computer Vision (IJCV), 82(2):205-229, April 2009 (article)
In Kernel Methods for Remote Sensing Data Analysis, pages: 25-48, 2, (Editors: Gustavo Camps-Valls and Lorenzo Bruzzone), Wiley, New York, NY, USA, 2009 (inbook)
Sinha, P., Balas, B., Ostrovsky, Y., In Object Categorization: Computer and Human Vision Perspectives, pages: 301-323, (Editors: S. J. Dickinson, A. Leonardis, B. Schiele, M.J. Tarr), Cambridge University Press, 2009 (inbook)
Liang Zhong, Yi Su, Si Yong Yeo, Ru San Tan, Dhanjoo Ghista, Ghassan Kassab, American Journal of Physiology – Heart and Circulatory Physiology, 296(3):H573-84, 2009 (article)
Si Yong Yeo, Liang Zhong, Yi Su, Ru San Tan, Dhanjoo Ghista, Medical & Biological Engineering & Computing, 47(3):313-322, 2009 (article)
Ormoneit, D., Image and Vision Computing, 23(14):1264-1276, December 2005 (article), Hastie, T., Kjellström, H.
Jepson, A. D., Fleet, D. J., US Pat. 6,954,544, October 2005 (patent)
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Video Proceedings, pages: 1202, 2005 (patent)
Int. J. of Computer Vision, 38(3):231-245, July 2000 (article), Fleet, D. J.
Fleet, D. J., Int. J. of Computer Vision, 36(3):171-193, 2000 (article), Yacoob, Y., Jepson, A. D.
Computer Vision and Image Understanding, 78(1):8-31, 2000 (article), Fleet, D. J., Yacoob, Y.
Ju, S. X., IEEE Trans. on Circuits and Systems for Video Technology, 8(5):686-696, September 1998 (article), Minneman, S., Kimber, D.
IEEE Transactions on Image Processing, 7(3):421-432, March 1998 (article), Sapiro, G., Marimont, D., Heeger, D.
Tsotsos, J. K., Verghese, G., Dickinson, S., Jenkin, M., Jepson, A., Milios, E., Nuflo, F., Stevenson, S., Image & Vision Computing, Special Issue on Vision for the Disabled, 16(4):275-292, 1998 (article), Metaxas, D., Culhane, S., Ye, Y., Mann, R.
International Journal of Computer Vision, 26(1):63-84, 1998 (article), Jepson, A.