
DepthSplat

Connecting Gaussian Splatting and Depth

Haofei Xu1,2     Songyou Peng1     Fangjinhua Wang1     Hermann Blum1     Daniel Barath1    
Andreas Geiger2     Marc Pollefeys1,3
1ETH Zurich     2University of Tübingen, Tübingen AI Center     3Microsoft

DepthSplat enables cross-task interactions between Gaussian splatting and depth estimation.

Left: Better depth leads to improved novel view synthesis with Gaussian splatting.

Right: Unsupervised depth pre-training with Gaussian splatting leads to reduced depth prediction error.

Feed-forward novel view synthesis (512x960) on unseen scenes from 6 input views.

Summary

  • We present DepthSplat to connect Gaussian splatting and single/multi-view depth estimation and study their interactions.
  • We contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality 3D Gaussian reconstructions.
  • We show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabelled datasets.
  • State-of-the-art performance on the ScanNet, RealEstate10K, and DL3DV datasets for both depth estimation and novel view synthesis.

Architecture

Multi-view feature matching + monocular depth features ⇒ high-quality depth and rendering
Unsupervised depth pre-training with Gaussian splatting ⇒ improved depth accuracy
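
Below is a minimal PyTorch sketch of this design under simplifying assumptions: a matching branch and a monocular branch are fused to regress per-pixel depth and Gaussian parameters. All module names, layer choices, and shapes are illustrative stand-ins, not the official DepthSplat implementation, which builds the matching branch from a multi-view cost volume and the monocular branch from a pre-trained monocular depth backbone.

import torch
import torch.nn as nn

class DepthSplatSketch(nn.Module):
    """Illustrative stand-in, not the official architecture."""
    def __init__(self, feat_dim=64, num_depth_candidates=32):
        super().__init__()
        # Stand-in for the multi-view matching branch (a cost volume over depth candidates in the paper).
        self.matching_encoder = nn.Conv2d(3, num_depth_candidates, 3, padding=1)
        # Stand-in for a pre-trained monocular depth feature backbone.
        self.mono_encoder = nn.Conv2d(3, feat_dim, 3, padding=1)
        # Fuse both feature sources, then regress depth and per-pixel Gaussian parameters.
        fused_dim = num_depth_candidates + feat_dim
        self.depth_head = nn.Conv2d(fused_dim, 1, 3, padding=1)
        self.gaussian_head = nn.Conv2d(fused_dim, 3 + 4 + 3 + 1, 3, padding=1)  # scale, rotation, color, opacity

    def forward(self, images):  # images: (V, 3, H, W), one entry per input view
        matching = self.matching_encoder(images)   # multi-view matching cues
        mono = self.mono_encoder(images)           # monocular depth cues
        fused = torch.cat([matching, mono], dim=1)
        depth = self.depth_head(fused).relu()      # per-pixel depth, one Gaussian per pixel
        gaussians = self.gaussian_head(fused)      # packed Gaussian parameters
        return depth, gaussians

views = torch.rand(6, 3, 64, 96)                   # 6 input views at a toy resolution
depth, gaussians = DepthSplatSketch()(views)
print(depth.shape, gaussians.shape)                # (6, 1, 64, 96), (6, 11, 64, 96)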

Unsupervised Pre-Training with Gaussian Splatting

Unsupervised pre-training consistently helps for both Abs Rel (left) and Delta1 (right) metrics.
The improvements are especially significant on challenging datasets such as TartanAir and KITTI.
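
The sketch below illustrates this pre-training signal under simplifying assumptions: a predictor maps posed but unlabelled frames to per-pixel Gaussian parameters, a toy stand-in for a differentiable Gaussian splatting renderer produces a held-out target view, and the only supervision is a photometric loss against that view's RGB, so no depth labels are needed. predictor and toy_renderer are hypothetical placeholders, not the paper's actual networks.

import torch
import torch.nn as nn
import torch.nn.functional as F

predictor = nn.Conv2d(3, 12, 3, padding=1)       # stand-in: images -> per-pixel Gaussian parameters
toy_renderer = nn.Conv2d(12, 3, 3, padding=1)    # stand-in for differentiable Gaussian splatting
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

context_views = torch.rand(2, 3, 64, 96)         # posed input frames from unlabelled video
target_view = torch.rand(1, 3, 64, 96)           # held-out frame, used only as RGB supervision

optimizer.zero_grad()
gaussians = predictor(context_views)             # predict Gaussians from the context views
rendered = toy_renderer(gaussians).mean(0, keepdim=True)  # "splat" into the target camera
loss = F.mse_loss(rendered, target_view)         # photometric loss; no ground-truth depth involved
loss.backward()
optimizer.step()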

Scale-Consistent Depth Prediction

Our model predicts scale-consistent depth, aligned with the scale of the camera poses' translations.
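
As a rough illustration of what scale consistency buys, the sketch below unprojects a predicted depth map into world space using standard pinhole intrinsics K and a camera-to-world pose c2w (hypothetical variable names). When the depth shares the metric scale of the pose translations, the per-view point clouds of the same surface align in a single world frame.

import torch

def unproject(depth, K, c2w):
    """depth: (H, W), K: (3, 3) intrinsics, c2w: (4, 4) camera-to-world pose -> (H*W, 3) world points."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pixels = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # homogeneous pixel coordinates
    rays = pixels.reshape(-1, 3) @ torch.linalg.inv(K).T              # camera-space ray directions
    points_cam = rays * depth.reshape(-1, 1)                          # scale rays by predicted depth
    points_world = points_cam @ c2w[:3, :3].T + c2w[:3, 3]            # apply the camera-to-world pose
    return points_world

depth = torch.rand(4, 6)
print(unproject(depth, torch.eye(3), torch.eye(4)).shape)             # torch.Size([24, 3])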

Different Number of Input Views

Our DepthSplat consistently outperforms MVSplat across different numbers of input views.

Comparisons

Feed-forward view synthesis results from 6 input views on the DL3DV dataset.

Large-Scale Scene Synthesis

Feed-forward large-scale scene synthesis from 12 input views (512x960) on the DL3DV dataset.
(MVSplat runs out of memory in this setting.)

BibTeX

@article{xu2024depthsplat,
  title   = {DepthSplat: Connecting Gaussian Splatting and Depth},
  author  = {Xu, Haofei and Peng, Songyou and Wang, Fangjinhua and Blum, Hermann and Barath, Daniel and Geiger, Andreas and Pollefeys, Marc},
  journal = {arXiv preprint arXiv:2410.13862},
  year    = {2024}
}
