Unifying Flow, Stereo and Depth Estimation

Haofei Xu¹,² Jing Zhang³ Jianfei Cai⁴ Hamid Rezatofighi⁴ Fisher Yu¹
Dacheng Tao³ Andreas Geiger²,⁵

TPAMI 2023

¹ETH Zurich ²University of Tübingen ³The University of Sydney ⁴Monash University ⁵MPI for Intelligent Systems, Tübingen

A unified model for three motion and 3D perception tasks.

Results on unseen videos.

Highlights

A unified dense correspondence matching formulation and model for three tasks.
Our unified model naturally enables cross-task transfer (flow → stereo, flow → depth) since the model architecture and parameters are shared across tasks.
State-of-the-art or competitive performance on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.

Overview

Strong features + parameter-free matching layers ⇒ a unified model for flow/stereo/depth.
An additional self-attention layer to propagate the high-quality predictions to unmatched regions.
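
The two ideas above can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the released implementation: the function names `global_match_flow` and `propagate` are hypothetical, features are given as plain lists (one vector per pixel), and the propagation step uses identity query/key projections where a trained model would learn them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def global_match_flow(feat0, feat1, coords):
    """Parameter-free matching: softmax over all-pairs feature similarity.

    feat0, feat1: one feature vector per pixel of image 0 / image 1.
    coords: the (x, y) position of each pixel, in the same order.
    Returns one (dx, dy) flow vector per pixel of image 0.
    """
    scale = 1.0 / math.sqrt(len(feat0[0]))
    flow = []
    for f0, (x0, y0) in zip(feat0, coords):
        probs = softmax([scale * dot(f0, f1) for f1 in feat1])
        # Expected matching position under the softmax distribution.
        x1 = sum(p * x for p, (x, _) in zip(probs, coords))
        y1 = sum(p * y for p, (_, y) in zip(probs, coords))
        flow.append((x1 - x0, y1 - y0))
    return flow


def propagate(feat0, flow):
    """Self-attention over image-0 features, applied to the flow field.

    Assumption: identity query/key projections (hypothetical simplification).
    """
    scale = 1.0 / math.sqrt(len(feat0[0]))
    out = []
    for f in feat0:
        w = softmax([scale * dot(f, g) for g in feat0])
        out.append((sum(wi * dx for wi, (dx, _) in zip(w, flow)),
                    sum(wi * dy for wi, (_, dy) in zip(w, flow))))
    return out


# Toy 2x2 image: each pixel has a distinctive feature, and image 1 is
# image 0 with its two columns swapped.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
feat0 = [[10.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
feat1 = [feat0[1], feat0[0], feat0[3], feat0[2]]
flow = global_match_flow(feat0, feat1, coords)
```

In this toy case the left-column pixels receive a flow of roughly (+1, 0) and the right-column pixels roughly (-1, 0). Neither function has learnable weights of its own, so in a sketch like this the prediction quality depends entirely on the features fed into the matching.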

Cross-Task Transfer

Flow to depth transfer. We use an optical flow model pretrained on the Chairs and Things datasets to directly predict depth on the ScanNet dataset, without any finetuning. The performance can be further improved by finetuning for the depth task.

Initializing from a pretrained flow model not only speeds up training for stereo and depth, but also yields better performance.
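
One way to see why such transfer is possible: if the matching step has no learnable parameters, switching from flow to stereo only changes which candidates are compared. The sketch below is a hypothetical illustration, not the authors' code. The name `match_disparity_row` is invented here, and it assumes rectified stereo, where a left-image pixel at column x can only match right-image columns at or to the left of x on the same scanline.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_disparity_row(row0, row1):
    """Softmax matching along one rectified scanline.

    row0, row1: feature vectors of the left / right image along the same row.
    Returns one non-negative disparity per left-image pixel.
    """
    scale = 1.0 / math.sqrt(len(row0[0]))
    disparities = []
    for x0, f0 in enumerate(row0):
        # Assumption: valid candidates are right-image columns 0..x0.
        candidates = range(x0 + 1)
        scores = [scale * sum(a * b for a, b in zip(f0, row1[x]))
                  for x in candidates]
        probs = softmax(scores)
        x1 = sum(p * x for p, x in zip(probs, candidates))
        disparities.append(x0 - x1)
    return disparities


# Toy scanline: the right view shows the same content shifted left by one
# pixel, plus one new feature entering at the right border.
basis = [[10.0 if j == i else 0.0 for j in range(5)] for i in range(5)]
left_row = basis[0:4]
right_row = basis[1:5]
disparity = match_disparity_row(left_row, right_row)
```

Here the last three left-image pixels get a disparity close to 1. The first pixel has no true match in the right view and falls back to its only candidate, which is the kind of unmatched region a propagation step is meant to handle.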

Results

Our GMFlow with only one refinement outperforms RAFT with 31 refinements on the Sintel dataset.

We rank 1st on the Sintel (clean), Middlebury (rms metric) and Argoverse benchmarks.

Our GMFlow captures fast-moving small objects better than RAFT.

Our GMStereo produces sharper object structures than RAFT-Stereo and CREStereo.

BibTeX

@article{xu2023unifying,
  title={Unifying Flow, Stereo and Depth Estimation},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Yu, Fisher and Tao, Dacheng and Geiger, Andreas},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023}
}

This work is a substantial extension of our previous conference paper GMFlow (CVPR 2022, Oral). Please consider citing GMFlow as well if you find this work useful in your research.

@inproceedings{xu2022gmflow,
  title={GMFlow: Learning Optical Flow via Global Matching},
  author={Xu, Haofei and Zhang, Jing and Cai, Jianfei and Rezatofighi, Hamid and Tao, Dacheng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8121-8130},
  year={2022}
}
