Monocular depth estimation models often fail when dealing with transparent, translucent, or mirrored objects — surfaces that violate normal visual depth cues.
This project proposes a Domain-Adaptive Depth Estimation framework that fine-tunes the Depth Anything v2 model to handle these complex visual conditions.
Our goal is to make monocular depth estimation more reliable for environments where robots, drones, or autonomous systems encounter materials like glass, metal, and reflective plastics.
Specifically, we aim to train a depth estimation model that:
- Adapts to transparent and reflective domains without requiring ground-truth depth labels.
- Maintains consistent depth prediction under style and lighting variations.
- Outperforms general-purpose models, including the off-the-shelf Depth Anything v2, in this specialized domain.
We introduce a Style Consistency Loss that encourages the model to predict the same depth map for an image before and after stylistic or illumination transformations.
This enforces geometry-focused learning, making the model robust to misleading visual textures such as reflections or glare.
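Below is a minimal PyTorch sketch of how this loss could be implemented. The names `depth_model` and `style_aug`, the choice of photometric augmentation, and the per-image normalization are illustrative assumptions rather than the exact training recipe.

```python
import torch
import torch.nn.functional as F
from torchvision.transforms import ColorJitter

# Hypothetical photometric augmentation: changes appearance but not geometry.
style_aug = ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1)

def style_consistency_loss(depth_model, image):
    """Penalize differences between depth predicted from an image and from a
    photometrically perturbed copy of the same image (geometry is identical)."""
    styled = style_aug(image)
    depth_pred = depth_model(image)          # (B, 1, H, W) predicted depth
    with torch.no_grad():                    # stop-gradient on the reference branch
        depth_ref = depth_model(styled)      # (one possible design choice)

    def normalize(d):
        # Per-image median/scale normalization to sidestep global scale-shift ambiguity.
        d = d.flatten(1)
        return (d - d.median(dim=1, keepdim=True).values) / (d.std(dim=1, keepdim=True) + 1e-6)

    return F.l1_loss(normalize(depth_pred), normalize(depth_ref))
```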
- Use Depth Anything v2 (Small) as a pretrained base model.
- Perform unsupervised domain adaptation with unlabeled images of transparent/mirrored scenes.
- Train a lightweight adapter and decoder (sketches of the adapter setup and the smoothness loss follow this list) using:
  - Style Consistency Loss
  - Edge-Aware Smoothness Loss
  - (Optional) Domain Adversarial Loss
- Evaluate on datasets like ClearGrasp.
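One possible way to wire up the frozen base model and a lightweight adapter is sketched below, assuming the Hugging Face `transformers` port of Depth Anything V2 (Small). The checkpoint name, the residual refinement head, and its size are assumptions for illustration, not the project's final architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

# Assumed checkpoint: the Hugging Face port of Depth Anything V2 (Small).
CKPT = "depth-anything/Depth-Anything-V2-Small-hf"
processor = AutoImageProcessor.from_pretrained(CKPT)
base = AutoModelForDepthEstimation.from_pretrained(CKPT)
for p in base.parameters():                 # keep the pretrained model frozen
    p.requires_grad = False

class AdapterHead(nn.Module):
    """Hypothetical lightweight adapter: a small residual refinement head
    applied on top of the frozen base prediction."""
    def __init__(self, base_model, hidden=32):
        super().__init__()
        self.base = base_model
        self.refine = nn.Sequential(
            nn.Conv2d(1, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1),
        )

    def forward(self, pixel_values):
        with torch.no_grad():
            depth = self.base(pixel_values=pixel_values).predicted_depth.unsqueeze(1)
        return depth + self.refine(depth)   # only the refinement head receives gradients
```

Only the `refine` parameters would be handed to the optimizer, which keeps the adaptation lightweight compared with full fine-tuning.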
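The Edge-Aware Smoothness Loss listed above is commonly implemented with the image-gradient-weighted formulation from the monocular depth literature; the sketch below follows that standard form, and the exact weighting and normalization may differ in the final implementation.

```python
import torch

def edge_aware_smoothness_loss(depth, image):
    """depth: (B, 1, H, W) prediction; image: (B, 3, H, W) input in [0, 1]."""
    depth = depth / (depth.mean(dim=(2, 3), keepdim=True) + 1e-7)   # normalize scale

    # First-order finite differences of depth and image along x and y.
    d_dx = torch.abs(depth[:, :, :, :-1] - depth[:, :, :, 1:])
    d_dy = torch.abs(depth[:, :, :-1, :] - depth[:, :, 1:, :])
    i_dx = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]), dim=1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]), dim=1, keepdim=True)

    # Penalize depth gradients only where the image itself is smooth,
    # so true object boundaries are not over-smoothed.
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()
```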
- Domain adaptation without labeled depth data
- Style-invariant depth learning using Style Consistency Loss
- Improved performance on reflective and transparent scenes relative to the general-purpose baseline
- Add uncertainty estimation to handle ambiguous surfaces.
- Integrate physics-aware priors (refraction modeling).
- Expand to multi-view or stereo adaptation.