This is essentially a simplified version of "Monocular Depth Estimation Based on Deep Learning: An Overview" by Zhao et al., with some comments.
- Eigen. Depth map prediction from a single image using a multi-scale deep network
- Eigen. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture
- Mayer. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation
- Shelhamer. Scene intrinsics and depth from a single image
- Laina. Deeper depth prediction with fully convolutional residual networks
- residual learning
- reverse Huber loss (berHu), which gives better results than L2
- Mancini. Fast robust monocular depth estimation for obstacle detection with fully convolutional networks
- use image and optical flow to estimate depth
- Chen. Single image depth prediction in the wild (2016, DIW)
- new DIW dataset and relative depth annotations
- Fu. Deep ordinal regression network for monocular depth estimation (DORN)
- Li. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs
- conditional random fields
- superpixels?
- Liu. Learning depth from single monocular images using deep convolutional neural fields
- Wang. Towards unified depth and semantic prediction from a single image
- utilizing semantic consistency between depth and semantic labels
- Zhang. Joint task-recursive learning for semantic segmentation and depth estimation
- Xu. Structured attention guided convolutional neural fields for monocular depth estimation
- Feng. SGANVO: Unsupervised deep visual odometry and depth estimation with stacked generative adversarial networks (IEEE, 2019)
- Jung. Depth prediction from a single image with conditional adversarial networks (ICIP, 2017)
- Gwn Lore. Generative adversarial networks for depth map estimation from RGB video (CVPR, 2018)
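
The reverse Huber (berHu) loss noted under Laina et al. above behaves like L1 for small residuals and like a scaled L2 for large ones. A minimal sketch in plain Python follows. The function name `berhu_loss` and the parameter `c_ratio` are illustrative names, not taken from the paper's code, and the threshold is assumed to be 20% of the largest absolute residual in the batch.

```python
def berhu_loss(pred, target, c_ratio=0.2):
    """Mean reverse Huber (berHu) loss over two equal-length sequences.

    Assumption: the threshold c is c_ratio times the largest absolute
    residual in the batch (c_ratio is an illustrative parameter name).
    """
    residuals = [abs(p - t) for p, t in zip(pred, target)]
    if not residuals:
        raise ValueError("pred and target must be non-empty")

    c = c_ratio * max(residuals)
    if c == 0.0:
        # All predictions are exact, so the loss is zero.
        return 0.0

    total = 0.0
    for r in residuals:
        if r <= c:
            total += r                            # L1 branch for small errors
        else:
            total += (r * r + c * c) / (2.0 * c)  # scaled L2 branch for large errors
    return total / len(residuals)


loss = berhu_loss([1.0, 2.0, 4.0], [1.0, 1.9, 3.0])
```

The two branches meet at `r == c`, so the loss is continuous. Large residuals are penalized quadratically, while small ones keep the stronger gradient of L1.
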
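Trained on monocular video sequences, these methods use the predicted depth and the relative camera pose to project (warp) one frame onto an adjacent frame; the photometric difference between the warped frame and the real frame serves as the training signal. The camera intrinsics must be known.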
- Zhou. Unsupervised learning of depth and ego-motion from video
- Godard. Unsupervised monocular depth estimation with left-right consistency (CVPR, 2017)
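
The frame-to-frame projection described above can be illustrated for a single pixel. This is a sketch under stated assumptions rather than any paper's implementation: it assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, and a relative pose given as a 3x3 rotation and a translation that map target-camera coordinates into source-camera coordinates. The function name `reproject_pixel` is hypothetical.

```python
def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Project pixel (u, v) of the target frame into the source frame.

    intrinsics: (fx, fy, cx, cy) of an assumed pinhole camera.
    rotation: 3x3 nested list, translation: length-3 list (target -> source).
    Returns (u_src, v_src), or None if the point falls behind the source camera.
    """
    fx, fy, cx, cy = intrinsics

    # Back-project the pixel to a 3D point using the predicted depth.
    point = (
        (u - cx) / fx * depth,
        (v - cy) / fy * depth,
        depth,
    )

    # Apply the rigid transform from the target camera to the source camera.
    xs, ys, zs = (
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zs <= 0.0:
        return None

    # Project the transformed point back onto the image plane.
    return (fx * xs / zs + cx, fy * ys / zs + cy)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (500.0, 500.0, 320.0, 240.0)
shifted = reproject_pixel(400.0, 300.0, 2.0, K, identity, [0.1, 0.0, 0.0])
```

In training, the source frame would be sampled at the reprojected coordinates and compared with the target frame. The intrinsics appear in both the back-projection and the projection step, which is why they have to be known.
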
The aforementioned models rely on the static-scene assumption. Because this assumption often does not hold in the real world (for example, when objects move), explainability masks are proposed to down-weight the regions that violate it, so that the loss is driven mainly by static regions.
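
One way such a mask can enter the loss is as a per-pixel weight on the photometric error, with a regularizer that keeps the mask from collapsing to zero. The sketch below works on flattened lists of intensities and is only an illustration: `masked_photometric_loss` and `reg_weight` are hypothetical names, and the regularizer is assumed to be a cross-entropy term that pulls mask values towards 1.

```python
import math


def masked_photometric_loss(target, warped, mask, reg_weight=0.2):
    """Photometric L1 loss weighted by an explainability mask.

    target, warped: flattened pixel intensities of the real and warped frame.
    mask: per-pixel weights in (0, 1]; low values mark pixels that the
          static-scene model cannot explain.
    reg_weight: illustrative weight of the regularizer (an assumption).
    """
    n = len(target)
    if n == 0 or len(warped) != n or len(mask) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    eps = 1e-7  # avoids log(0) for fully suppressed pixels

    # Mask-weighted photometric error.
    photometric = sum(m * abs(t - w) for t, w, m in zip(target, warped, mask)) / n

    # Cross-entropy against a constant label of 1, which penalizes small masks.
    regularizer = -sum(math.log(max(m, eps)) for m in mask) / n

    return photometric + reg_weight * regularizer


full = masked_photometric_loss([0.5, 0.5], [0.5, 0.9], [1.0, 1.0])
down_weighted = masked_photometric_loss([0.5, 0.5], [0.5, 0.9], [1.0, 0.5])
```

Without the regularizer, setting the mask to zero everywhere would trivially minimize the loss, so the second term is what makes the mask informative.
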
TBA
TBA
TBA
Trained on stereo image pairs, semi-supervised methods reconstruct one view from the other by inverse warping guided by the predicted disparity; the reconstruction error supervises training.
- Luo. Single view stereo matching (CVPR, 2018)
- Pilzer. Refine and distill: Exploiting cycle-inconsistency and knowledge distillation for unsupervised monocular depth estimation
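
The inverse warping used by these stereo-trained methods can be sketched for a single scanline. The example assumes a rectified pair in which the left view is reconstructed by sampling the right view at `x - d`, with linear interpolation and border clamping. The function name `warp_row` is hypothetical, and real implementations operate on whole image tensors with a differentiable sampler.

```python
def warp_row(source_row, disparity_row):
    """Reconstruct one scanline by inverse warping with per-pixel disparity.

    source_row: intensities of the other view (assumed here: the right image).
    disparity_row: predicted disparity in pixels for each target pixel.
    Each target pixel x samples the source at x - d, linearly interpolated.
    """
    width = len(source_row)
    if width == 0 or len(disparity_row) != width:
        raise ValueError("rows must be non-empty and of equal length")

    reconstructed = []
    for x, d in enumerate(disparity_row):
        # Sampling position in the source row, clamped to the image border.
        xs = min(max(x - d, 0.0), width - 1.0)
        x0 = int(xs)                 # left neighbour (xs is non-negative)
        x1 = min(x0 + 1, width - 1)  # right neighbour
        w = xs - x0                  # interpolation weight
        reconstructed.append((1.0 - w) * source_row[x0] + w * source_row[x1])
    return reconstructed


row = warp_row([0.0, 10.0, 20.0, 30.0], [0.0, 1.0, 1.0, 1.0])
```

Because the sampling position depends smoothly on the disparity, the reconstruction error can be back-propagated to the network that predicts the disparity.
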