Classical optical flow methods compare image pixels using a photometric loss, with no knowledge of the world and its geometry, and no model of how the statistics of the scene affect optical flow estimation. We developed several methods that explicitly exploit the structure of the scene at different levels: from semantic segmentation, to reasoning about the geometry of the scene, to unsupervised learning that reasons jointly about the structure of the scene and the motion of objects.
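To make the starting point concrete, a minimal sketch of such a photometric loss: warp the second image toward the first using a candidate flow field and penalize the remaining brightness difference. The function names and the nearest-neighbor sampling are illustrative choices, not taken from any of the papers discussed here.

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by flow using nearest-neighbor sampling (illustrative;
    real methods use differentiable bilinear sampling)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[y2, x2]

def photometric_loss(img1, img2, flow):
    """Mean absolute brightness difference after warping img2 toward img1.
    This loss knows nothing about scene structure -- only pixel values."""
    return np.mean(np.abs(img1 - warp(img2, flow)))
```

A flow field that correctly aligns the two images drives this loss toward zero; everything the methods below add (semantics, geometry, learned structure) acts as a prior on top of such a data term.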
In SOF [ ], we use a semantic segmentation of the scene to split the image into meaningful parts such as cars, people, and road segments. The semantic class of each part then imposes a strong prior on the motion of that part. Taking this prior into account helps resolve ambiguities in the motion, which in turn can be used to achieve high-quality motion segmentation.
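One simple way such a per-part motion prior can be realized is to fit a low-parameter motion model to each segment and use it to regularize the flow. The sketch below fits an affine model per segment; this is a toy illustration of the idea, not the actual SOF formulation, and all function names are made up here.

```python
import numpy as np

def fit_affine_flow(flow, mask):
    """Least-squares fit of u = a0 + a1*x + a2*y (and likewise v)
    over the pixels selected by mask."""
    ys, xs = np.nonzero(mask)
    A = np.stack([np.ones_like(xs), xs, ys], axis=1).astype(float)
    pu, *_ = np.linalg.lstsq(A, flow[ys, xs, 0], rcond=None)
    pv, *_ = np.linalg.lstsq(A, flow[ys, xs, 1], rcond=None)
    return pu, pv

def layered_flow(flow, seg):
    """Replace each segment's flow with its fitted affine model,
    i.e. project the flow onto a per-segment motion prior."""
    out = np.zeros_like(flow)
    h, w = seg.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for label in np.unique(seg):
        m = seg == label
        pu, pv = fit_affine_flow(flow, m)
        out[m, 0] = (pu[0] + pu[1] * xs + pu[2] * ys)[m]
        out[m, 1] = (pv[0] + pv[1] * xs + pv[2] * ys)[m]
    return out
```

The semantic class would choose how expressive the per-segment model is (e.g. a planar model for road, an articulated layered model for people), which is where the class-specific prior enters.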
In contrast, MR-Flow [ ] uses geometric constraints of the world as priors on optical flow estimation. Our method separates an image into a static background and moving objects. For the static background, flow estimation is simplified by strong geometric constraints; furthermore, the absence of independent motion allows us to take more than two frames into account. This geometrically constrained background flow is then combined with the flow of the moving objects, yielding a full, accurate flow field.
These methods are based on convex optimization and tend to be slow. Furthermore, they rely on manually defined constraints, which are often strong simplifications of the real world. To overcome this, we present the Competitive Collaboration (CC) framework [ ]. CC reasons about the whole scene in a joint, data-driven fashion, and learns, without explicit supervision, to compute the segmentation and geometry of the scene as well as the motion of both objects and background.
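The core alternation behind this kind of framework can be illustrated with a toy: two simple models compete to explain the data (standing in for the static-scene and moving-object reasoners), while a moderator assigns each sample to whichever model explains it better. This is only a schematic analogy in NumPy, with linear models replacing the neural networks and a hard assignment replacing the learned moderator; it is not the actual CC training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
# two "regimes" stand in for static background vs. independently moving objects
y = np.where(x < 0, 2.0 * x, -3.0 * x)

w_a, w_b = 0.1, -0.1          # two competing one-parameter models: y ~ w * x
for _ in range(20):
    # collaboration step: the moderator assigns each sample
    # to the model that currently explains it better
    err_a = (y - w_a * x) ** 2
    err_b = (y - w_b * x) ** 2
    gate = (err_a < err_b).astype(float)
    # competition step: each model refits only its assigned samples
    if gate.sum() > 0:
        w_a = np.sum(gate * x * y) / np.sum(gate * x * x)
    if (1 - gate).sum() > 0:
        w_b = np.sum((1 - gate) * x * y) / np.sum((1 - gate) * x * x)
```

After a few iterations each model specializes in one regime and the assignment mask converges, which mirrors how the segmentation, geometry, and motion estimates can bootstrap one another without labels.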