A personal reading list. With research in this area advancing quickly and branching out widely, I will only add papers that fit my own needs from here on.
- Cross View
- Pose Estimation
- 3D Reconstruction
- Generation
- Semantic
- Depth
- Dynamic
- SLAM
- Novel View Synthesis
- CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion [NeurIPS 2022] [croco]
- CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow [ICCV 2023] [croco]
- 3D-Consistent Image Inpainting with Diffusion Models [arXiv 2024] [croco-diff]
- Alligat0R: Pre-Training through Co-Visibility Segmentation for Relative Camera Pose Regression [arXiv 2025] []
- Cameras as Rays: Pose Estimation via Ray Diffusion [ICLR 2024] [RayDiffusion]
- Cameras as Relative Positional Encoding [arXiv 2025] [prope]
- Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization [CVPR 2025] [reloc3r]
- Visual Geometry Grounded Deep Structure From Motion [CVPR 2024] [vggsfm]
- Grounding Image Matching in 3D with MASt3R [ECCV 2024] [mast3r]
- MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion [arXiv 2024] [mast3r]
- MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds [CVPR 2025] [mv-dust3rp]
- Continuous 3D Perception Model with Persistent State [CVPR 2025] [cut3r]
- Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass [CVPR 2025] [fast3r-3d]
- Light3R-SfM: Towards Feed-forward Structure-from-Motion [arXiv 2025] []
- MUSt3R: Multi-view Network for Stereo 3D Reconstruction [arXiv 2025] [must3r]
- PE3R: Perception-Efficient 3D Reconstruction [arXiv 2025] [pe3r]
- VGGT: Visual Geometry Grounded Transformer [CVPR 2025] [vggt]
- Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors [CVPR 2025] []
- Matrix3D: Large Photogrammetry Model All-in-One [CVPR 2025] [ml-matrix3d]
- DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [CVPR 2025] [DiffusionSfM]
- Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory [arXiv 2025] [Point3R]
- π³: Scalable Permutation-Equivariant Visual Geometry Learning [arXiv 2025] [Pi3]
- StreamVGGT: Streaming 4D Visual Geometry Transformer [arXiv 2025] [StreamVGGT]
- Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers [arXiv 2025]
- Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images [arXiv 2025] [Uni3R]
- Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds [arXiv 2025] []
- STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer [arXiv 2025] [STream3R]
- WinT3R: Window-Based Streaming Reconstruction With Camera Token Pool [arXiv 2025] [WinT3R]
- SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization [arXiv 2025] [sail-recon]
- FastVGGT: Training-Free Acceleration of Visual Geometry Transformer [arXiv 2025] [FastVGGT]
- Faster VGGT with Block-Sparse Global Attention [arXiv 2025] [sparse-vggt]
- Quantized Visual Geometry Grounded Transformer [arXiv 2025] [QuantVGGT]
- MapAnything: Universal Feed-Forward Metric 3D Reconstruction [arXiv 2025] [map-anything]
- TTT3R: 3D Reconstruction as Test-Time Training [arXiv 2025] [TTT3R]
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting [arXiv 2025] [HunyuanWorld-Mirror]
- OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer [arXiv 2025] [OmniVGGT]
- Depth Anything 3: Recovering the Visual Space from Any Views [arXiv 2025] [DA3]
- ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [arXiv 2025] [ReconViaGen]
- PanSt3R: Multi-view Consistent Panoptic Segmentation [arXiv 2025] []
- IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [arXiv 2025] []
- MoGe: Accurate Monocular Geometry Estimation [CVPR 2025] [MoGe]
- DA2: Depth Anything in Any Direction [arXiv 2025] [DA2]
- FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention [arXiv 2025] [FastVidar]
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [ICLR 2025] [monst3r]
- MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [CVPR 2025] [mega-sam]
- Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving [arXiv 2024] [Driv3R]
- Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction [arXiv 2025] [Geo4D]
- ViPE: Video Pose Engine for Geometric 3D Perception [arXiv 2025] [vipe]
- VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction [arXiv 2025] [vggt4d]
- SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos [CVPR 2025] [SLAM3R]
- MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors [CVPR 2025] [mast3r-slam]
- VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold [arXiv 2025] [VGGT-SLAM]
- EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction [arXiv 2025] [EC3R-SLAM]
- Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs [arXiv 2024] [splatt3r]
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [ICLR 2025] [NoPoSplat]
- PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [arXiv 2024] [PreF3R]
- SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction [CVPR 2025] [SPARS3R]
- LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias [ICLR 2025] [LVSM]
- FlowR: Flowing from Sparse to Dense 3D Reconstructions [arXiv 2025] []
- RayZer: A Self-supervised Large View Synthesis Model [arXiv 2025] [RayZer]
- AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [arXiv 2025] [AnySplat]
- VGGT-X: When VGGT Meets Dense Novel View Synthesis [arXiv 2025] [VGGT-X]
- YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting [arXiv 2025] [yonosplat]
- E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training [arXiv 2025] [E-RayZer]
- Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting [arXiv 2025] [OffTheGrid]
- Sharp Monocular View Synthesis in Less Than a Second [arXiv 2025] [ml-sharp]
- EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images [arXiv 2025] [ecosplat-site]
- From Rays to Projections: Better Inputs for Feed-Forward View Synthesis [arXiv 2026] [pvsm-web]