I lead the Holistic Vision Group (HVG) in the Department of Perceiving Systems at the Max Planck Institute for Intelligent Systems. My group is funded by the DFG through the CRC 1233 on Robust Vision.
I am interested in the intersection of computer vision and machine learning, with a focus on holistic visual scene understanding. In particular, I am interested in analyzing and modeling people in complex visual scenes.
Offers: I am looking for highly motivated PhD students and PhD interns. I also have projects for bachelor's and master's theses. If you are interested, please contact me directly or send your application to firstname.lastname@example.org.
New! We have one paper accepted to ACCV 2018 as an oral presentation.
One paper accepted to ECCV 2018.
One paper accepted to BMVC 2018.
Our work on part-aligned bilinear representations for person re-identification is online.
Our work on human action segmentation in real time is online, and the code is available.
I will be an area chair for ACCV 2018.
I received an Early Career Research Grant to start my own research group at the Max Planck Institute for Intelligent Systems and the University of Tübingen; details coming soon. I am looking for highly motivated PhD students and PhD interns!
I successfully defended my PhD thesis "People Detection and Tracking in Crowded Scenes" on 29 September 2017 at the Max Planck Institute for Informatics. Thesis Committee: Prof. Bernt Schiele, Prof. Michael Black, Prof. Luc Van Gool.
Winner of the CVPR 2017 Multi-Object Tracking Challenge (MOT17).
Four papers accepted at CVPR 2017!
DAGM MVTec Dissertation Award, 2018
Winner of the Multi-Object Tracking Challenge at CVPR 2017
Winner of the Multi-Object Tracking Challenge at ECCV 2016
BMVC Best Paper Award, 2012
Scholarship for excellence in academic performance, RWTH Aachen, 2009 and 2010
SS 2016: High-Level Computer Vision, Saarland University, teaching assistant
SS 2015: High-Level Computer Vision, Saarland University, teaching assistant
SS 2013: High-Level Computer Vision, Saarland University, teaching assistant
In Proceedings of the British Machine Vision Conference (BMVC), pages: 269, BMVA Press, September 2018 (inproceedings)
Parsing continuous human motion into meaningful segments plays an essential role in various applications. In this work, we propose a hierarchical dynamic clustering framework to derive action clusters from a sequence of local features in an unsupervised bottom-up manner. We systematically investigate the modules in this framework and, in particular, propose diverse temporal pooling schemes to realize accurate temporal action localization. We demonstrate our method on two motion parsing tasks: temporal action segmentation and abnormal behavior detection. The experimental results indicate that the proposed framework is significantly more effective than other related state-of-the-art methods on several datasets.
In European Conference on Computer Vision (ECCV), 11218, pages: 418-437, Springer, Cham, September 2018 (inproceedings)
Comparing the appearance of corresponding body parts is essential for person re-identification. However, body parts are frequently misaligned between detected boxes, due to the detection errors and the pose/viewpoint changes. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear-pooling layer that fuses two feature maps to an image descriptor. We show that it results in a compact descriptor, where the inner product between two image descriptors is equivalent to an aggregation of the local appearance similarities of the corresponding body parts, and thereby significantly reduces the part misalignment problem. Our approach is advantageous over other pose-guided representations by learning part descriptors optimal for person re-identification. Training the network does not require any part annotation on the person re-identification dataset. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over the state-of-the-art methods on the standard benchmark datasets including Market-1501, CUHK03, CUHK01 and DukeMTMC, and standard video dataset MARS.
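The inner-product property mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes the descriptor is the sum over spatial locations of the outer product of an appearance vector and a part vector, and it omits any normalization the actual network may apply. The function names (`bilinear_pool`, `aggregated_similarity`, `random_map`) and the feature sizes are hypothetical and chosen only for illustration.

```python
import random

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def bilinear_pool(app, part):
    """Sum, over locations, of the flattened outer product of the
    appearance vector and the part vector at that location."""
    ca, cp = len(app[0]), len(part[0])
    desc = [0.0] * (ca * cp)
    for a, p in zip(app, part):
        for i in range(ca):
            for j in range(cp):
                desc[i * cp + j] += a[i] * p[j]
    return desc

def aggregated_similarity(app1, part1, app2, part2):
    """Sum, over all pairs of locations from the two images, of
    (appearance similarity) * (part similarity)."""
    return sum(
        dot(a1, a2) * dot(p1, p2)
        for a1, p1 in zip(app1, part1)
        for a2, p2 in zip(app2, part2)
    )

def random_map(n_loc, n_ch, rng):
    """Hypothetical stand-in for a feature map: n_loc locations, n_ch channels."""
    return [[rng.uniform(-1, 1) for _ in range(n_ch)] for _ in range(n_loc)]

rng = random.Random(0)
# Two "images", each with 6 locations, 4 appearance channels, 3 part channels.
app1, part1 = random_map(6, 4, rng), random_map(6, 3, rng)
app2, part2 = random_map(6, 4, rng), random_map(6, 3, rng)

# Inner product of the two pooled descriptors ...
lhs = dot(bilinear_pool(app1, part1), bilinear_pool(app2, part2))
# ... versus the aggregation of local similarities weighted by part similarity.
rhs = aggregated_similarity(app1, part1, app2, part2)
print(abs(lhs - rhs) < 1e-9)
```

Under these assumptions, appearance similarities between two locations contribute strongly only when the part vectors at those locations are also similar, which is one way to read the claim that the descriptor reduces the effect of part misalignment.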
We present an effective dynamic clustering algorithm for the task of temporal human action segmentation, which has comprehensive applications such as robotics, motion analysis, and patient monitoring. Our proposed algorithm is unsupervised, fast, generic to process various types of features, and applicable in both the online and offline settings. We perform extensive experiments of processing data streams, and show that our algorithm achieves the state-of-the-art results for both online and offline settings.
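To make the online setting concrete, here is a generic single-pass clustering sketch. It is not the algorithm from the paper: the distance threshold rule, the running-mean update, and the names `online_segment` and `to_segments` are assumptions chosen only to illustrate how per-frame features could be grouped into temporal segments without supervision as they arrive.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def online_segment(frames, threshold):
    """Assign a cluster id to each frame in a single pass over the stream."""
    centers, counts, labels = [], [], []
    for x in frames:
        # Find the nearest existing cluster center.
        best, best_d = None, math.inf
        for k, c in enumerate(centers):
            d = distance(x, c)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > threshold:
            # No cluster is close enough: open a new one at this frame.
            centers.append(list(x))
            counts.append(1)
            best = len(centers) - 1
        else:
            # Update the running mean of the matched cluster.
            counts[best] += 1
            n = counts[best]
            centers[best] = [c + (xi - c) / n for c, xi in zip(centers[best], x)]
        labels.append(best)
    return labels

def to_segments(labels):
    """Collapse per-frame labels into (start, end, label) runs, end exclusive."""
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((start, t, labels[start]))
            start = t
    return segments

# Toy 2-D "features": three frames near the origin, two near (5, 5), one back.
frames = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 4.9), (0.05, 0.05)]
labels = online_segment(frames, threshold=1.0)
print(labels)               # [0, 0, 0, 1, 1, 0]
print(to_segments(labels))  # [(0, 3, 0), (3, 5, 1), (5, 6, 0)]
```

Because each frame is labeled as soon as it is seen, the same routine can be run over a live stream or over a recorded sequence; the threshold here is a hypothetical tuning parameter, not a value from the paper.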
Insafutdinov, E., Andriluka, M., Pishchulin, L., Tang, S., Levinkov, E., Andres, B., Schiele, B.
Articulated Multi-person Tracking in the Wild
In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 1293-1301, IEEE, July 2017, Oral (inproceedings)
Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., Schiele, B.
In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages: 4929-4937, IEEE, June 2016 (inproceedings)
This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity to each other.
This joint formulation is in contrast to previous strategies that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single-person and multi-person pose estimation.