Unifying Flow, Stereo and Depth Estimation

Haofei Xu1,2     Jing Zhang3     Jianfei Cai4     Hamid Rezatofighi4     Fisher Yu1    
Dacheng Tao5     Andreas Geiger2,6

1ETH Zurich     2University of Tübingen     3The University of Sydney     4Monash University     5JD Explore Academy     6MPI for Intelligent Systems, Tübingen

A unified model for three motion and 3D perception tasks.

Results on unseen videos.


  • A unified dense correspondence matching formulation and model for three tasks.
  • Our unified model naturally enables cross-task transfer (flow → stereo, flow → depth) since the model architecture and parameters are shared across tasks.
  • State-of-the-art or competitive performance on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.


Strong features + parameter-free matching layers ⇒ a unified model for flow/stereo/depth.
An additional self-attention layer propagates the high-quality predictions to unmatched regions.
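The parameter-free matching idea can be sketched in a few lines: compare dense features of the two views with a global correlation, turn it into a matching distribution with a softmax, and read off flow as the expected displacement. This is an illustrative NumPy sketch of the softmax-matching formulation, not the paper's actual implementation (function and variable names are ours):

```python
import numpy as np

def global_matching_flow(feat1, feat2, coords):
    """Parameter-free global matching (illustrative sketch).

    feat1, feat2: [N, C] dense features for the N pixels of two views.
    coords:       [N, 2] pixel coordinates shared by both views.
    Returns a [N, 2] flow field: expected matching coordinate minus
    the source coordinate.
    """
    c = feat1.shape[1]
    # Global correlation between every pixel pair of the two views.
    corr = feat1 @ feat2.T / np.sqrt(c)            # [N, N]
    # Softmax over the second view yields a matching distribution.
    prob = np.exp(corr - corr.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)        # [N, N]
    # Flow = expected matched coordinate - source coordinate.
    return prob @ coords - coords                  # [N, 2]
```

With sufficiently discriminative features the softmax is nearly one-hot, so the expected coordinate coincides with the true match; stereo and depth reuse the same formulation with the matching restricted to epipolar lines or depth candidates.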

Cross-Task Transfer

Flow-to-depth transfer. We use an optical flow model pretrained on the Chairs and Things datasets to directly predict depth on the ScanNet dataset, without any finetuning (no prior work supports such cross-task experiments). Performance can be further improved by finetuning for the depth task.
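One way to see why a flow model transfers: for a stereo-rectified image pair, the horizontal component of optical flow is (up to sign) the disparity, and depth follows from the standard relation depth = focal × baseline / disparity. A minimal sketch of this conversion, assuming a rectified left→right pair (variable names are ours, not the paper's code):

```python
import numpy as np

def depth_from_horizontal_flow(flow_x, focal, baseline, eps=1e-6):
    """Convert horizontal left->right flow to depth for a rectified pair.

    flow_x:   [H, W] horizontal flow in pixels (negative for valid matches,
              since right-image correspondences shift left).
    focal:    focal length in pixels.
    baseline: camera baseline in meters.
    Returns depth in meters via depth = focal * baseline / disparity.
    """
    disparity = np.clip(-flow_x, eps, None)  # disparity = -flow_x, kept positive
    return focal * baseline / disparity
```

Because the unified model shares architecture and parameters across tasks, no such hand conversion is needed in practice; the sketch only illustrates the geometric link that makes flow→stereo and flow→depth transfer natural.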

When finetuning with a pretrained flow model as initialization, we not only train faster for stereo and depth, but also achieve better performance.


Our GMFlow with only one refinement step outperforms RAFT with 31 refinement steps on the Sintel dataset.

We rank 1st on the Sintel (clean), Middlebury (RMS metric) and Argoverse benchmarks.

Our GMFlow captures fast-moving small objects better than RAFT.

Our GMStereo produces sharper object structures than RAFT-Stereo and CREStereo.


@article{xu2022unifying,
      title={Unifying Flow, Stereo and Depth Estimation},
      author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
      journal={arXiv preprint arXiv:2211.05783},
      year={2022}
}

This work is a substantial extension of our previous conference paper GMFlow (CVPR 2022, Oral); please consider citing GMFlow as well if you find this work useful in your research.

@inproceedings{xu2022gmflow,
      title={GMFlow: Learning Optical Flow via Global Matching},
      author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Tao, Dacheng},
      booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      year={2022}
}