
End-to-End-3D-Reconstruction-Paper-List

A personal list. With relevant research advancing fast and branching out widely, I will only add papers that fit my own needs from now on.



Cross View

  • CroCo: Self-Supervised Pre-training for 3D Vision Tasks by Cross-View Completion [NeurIPS 2022] [croco]

  • CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow [ICCV 2023] [croco]

  • 3D-Consistent Image Inpainting with Diffusion Models [arXiv 2024] [croco-diff]

  • Alligat0R: Pre-Training through Co-Visibility Segmentation for Relative Camera Pose Regression [arXiv 2025]

Pose Estimation

  • Cameras as Rays: Pose Estimation via Ray Diffusion [ICLR 2024] [RayDiffusion]

  • Cameras as Relative Positional Encoding [arXiv 2025] [prope]

  • Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization [CVPR 2025] [reloc3r]

3D Reconstruction

  • Visual Geometry Grounded Deep Structure From Motion [CVPR 2024] [vggsfm]

  • DUSt3R: Geometric 3D Vision Made Easy [CVPR 2024] [dust3r]

  • Grounding Image Matching in 3D with MASt3R [ECCV 2024] [mast3r]

  • MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion [arXiv 2024] [mast3r]

  • 3D Reconstruction with Spatial Memory [3DV 2025] [spann3r]

  • MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds [CVPR 2025] [mv-dust3rp]

  • Continuous 3D Perception Model with Persistent State [CVPR 2025] [cut3r]

  • Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass [CVPR 2025] [fast3r-3d]

  • Light3R-SfM: Towards Feed-forward Structure-from-Motion [arXiv 2025]

  • MUSt3R: Multi-view Network for Stereo 3D Reconstruction [arXiv 2025] [must3r]

  • PE3R: Perception-Efficient 3D Reconstruction [arXiv 2025] [pe3r]

  • VGGT: Visual Geometry Grounded Transformer [CVPR 2025] [vggt]

  • Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors [CVPR 2025]

  • Matrix3D: Large Photogrammetry Model All-in-One [CVPR 2025] [ml-matrix3d]

  • DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion [CVPR 2025] [DiffusionSfM]

  • Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory [arXiv 2025] [Point3R]

  • π³: Scalable Permutation-Equivariant Visual Geometry Learning [arXiv 2025] [Pi3]

  • StreamVGGT: Streaming 4D Visual Geometry Transformer [arXiv 2025] [StreamVGGT]

  • Evict3R: Training-Free Token Eviction for Memory-Bounded Streaming Visual Geometry Transformers [arXiv 2025]

  • Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images [arXiv 2025] [Uni3R]

  • Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds [arXiv 2025]

  • STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer [arXiv 2025] [STream3R]

  • WinT3R: Window-Based Streaming Reconstruction With Camera Token Pool [arXiv 2025] [WinT3R]

  • SAIL-Recon: Large SfM by Augmenting Scene Regression with Localization [arXiv 2025] [sail-recon]

  • FastVGGT: Training-Free Acceleration of Visual Geometry Transformer [arXiv 2025] [FastVGGT]

  • Faster VGGT with Block-Sparse Global Attention [arXiv 2025] [sparse-vggt]

  • Quantized Visual Geometry Grounded Transformer [arXiv 2025] [QuantVGGT]

  • MapAnything: Universal Feed-Forward Metric 3D Reconstruction [arXiv 2025] [map-anything]

  • TTT3R: 3D Reconstruction as Test-Time Training [arXiv 2025] [TTT3R]

  • WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting [arXiv 2025] [HunyuanWorld-Mirror]

  • OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer [arXiv 2025] [OmniVGGT]

  • Depth Anything 3: Recovering the Visual Space from Any Views [arXiv 2025] [DA3]

Generation

  • ReconViaGen: Towards Accurate Multi-view 3D Object Reconstruction via Generation [arXiv 2025] [ReconViaGen]

Semantic

  • PanSt3R: Multi-view Consistent Panoptic Segmentation [arXiv 2025]

  • IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [arXiv 2025]

Depth

  • MoGe: Accurate Monocular Geometry Estimation [CVPR 2025] [MoGe]

  • DA2: Depth Anything in Any Direction [arXiv 2025] [DA2]

  • FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention [arXiv 2025] [FastVidar]

Dynamic

  • MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [ICLR 2025] [monst3r]

  • MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos [CVPR 2025] [mega-sam]

  • Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving [arXiv 2024] [Driv3R]

  • Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction [arXiv 2025] [Geo4D]

  • ViPE: Video Pose Engine for Geometric 3D Perception [arXiv 2025] [vipe]

  • VGGT4D: Mining Motion Cues in Visual Geometry Transformers for 4D Scene Reconstruction [arXiv 2025] [vggt4d]

SLAM

  • SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos [CVPR 2025] [SLAM3R]

  • MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors [CVPR 2025] [mast3r-slam]

  • VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold [arXiv 2025] [VGGT-SLAM]

  • EC3R-SLAM: Efficient and Consistent Monocular Dense SLAM with Feed-Forward 3D Reconstruction [arXiv 2025] [EC3R-SLAM]

Novel View Synthesis

  • Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs [arXiv 2024] [splatt3r]

  • No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [ICLR 2025] [NoPoSplat]

  • PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence [arXiv 2024] [PreF3R]

  • SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction [CVPR 2025] [SPARS3R]

  • LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias [ICLR 2025] [LVSM]

  • FlowR: Flowing from Sparse to Dense 3D Reconstructions [arXiv 2025]

  • RayZer: A Self-supervised Large View Synthesis Model [arXiv 2025] [RayZer]

  • AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [arXiv 2025] [AnySplat]

  • VGGT-X: When VGGT Meets Dense Novel View Synthesis [arXiv 2025] [VGGT-X]

  • YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting [arXiv 2025] [yonosplat]

  • E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training [arXiv 2025] [E-RayZer]

  • Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting [arXiv 2025] [OffTheGrid]

  • Sharp Monocular View Synthesis in Less Than a Second [arXiv 2025] [ml-sharp]

  • EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images [arXiv 2025] [ecosplat-site]

  • From Rays to Projections: Better Inputs for Feed-Forward View Synthesis [arXiv 2026] [pvsm-web]
