
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

(Teaser video: teaser_crop_3fps_3x.mp4)
We leverage a diffusion model and a depth predictor to generate high-quality scene geometry from a single image. Then, we distill a feed-forward scene reconstruction model, which performs on par with methods trained via multi-view supervision.

This is the official implementation of the paper:

Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images

Philipp Wulff¹, Felix Wimbauer¹,², Dominik Muhle¹,²,³ and Daniel Cremers¹,²
¹Technical University of Munich, ²MCML, ³SE3 Labs

ICCV 2025

If you find our work useful, please consider citing our paper:

@inproceedings{wulff2025dreamtorecon,
  title     = {Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images},
  author    = {Wulff, Philipp and Wimbauer, Felix and Muhle, Dominik and Cremers, Daniel},
  booktitle = {IEEE International Conference on Computer Vision (ICCV)},
  year      = {2025}
}

🪧 Overview

(Method overview figure: method_v2)
Dream-to-Recon comprises three steps: a) We train a view completion model (VCM) that inpaints occluded areas and refines warped views. Training uses only a single view per scene and leverages forward-backward warping for data generation. b) The VCM is applied iteratively alongside a depth prediction network to synthesize virtual novel views, enabling progressive refinement of the 3D geometry. c) The synthesized scene geometries are then used to distill a feed-forward scene reconstruction model by supervising occupancy and virtual depth.

🏗️ Setup

git clone https://github.com/philippwulff/dream-to-recon
cd dream-to-recon

# We use Conda to manage our Python 🐍 environment
conda env create -f environment.yml
conda activate dtr
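
After activating the environment, a quick sanity check can save time later. This assumes the Conda environment installs PyTorch with CUDA support (which the .pth checkpoints and GPU training instructions below suggest):

# Verify that PyTorch is importable and sees your GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"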

💾 Datasets

All data should be placed under the data/ folder (or symlinked there, e.g. with ln -s /path/to/KITTI-360/* data/KITTI-360; see the sketch after the tree below) so that it matches our config files for the different datasets. The folder structure should look like this:

data
├── KITTI-360
│   ├── calibration
│   ├── data_2d_raw
│   ├── data_3d_raw
│   └── data_poses
└── waymo
    ├── testing
    │   ├── 39847154216997509_6440_000_6460_000
    │   │   ├── frames
    │   │   ├── lidar
    │   │   └── poses.npy
    │   └── ...
    ├── training
    └── validation
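
If the raw datasets already live elsewhere on your machine, symlinks are enough to reproduce this layout. The source paths below are placeholders; adjust them to your local dataset copies:

# Link existing local dataset copies into data/ (source paths are placeholders)
mkdir -p data/KITTI-360 data/waymo
ln -s /path/to/KITTI-360/* data/KITTI-360
ln -s /path/to/waymo/* data/waymo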

Run this to get the pre-processed evaluation ground truth data (LiDAR occupancy volumes):

bash fetch_assets.sh GT

All non-standard data (such as precomputed poses and data splits) ships with this repository and can be found in the datasets/ folder.

KITTI-360

To download KITTI-360, go to https://www.cvlibs.net/datasets/kitti-360/index.php and create an account. We require the perspective images, raw Velodyne scans, calibrations, and vehicle poses.

Waymo

The Waymo dataset format differs between versions. We used a copy of the Waymo Open Dataset shared within our chair, which we believe corresponds to V2.0.1. To download this version, install gsutil and run gsutil -m cp -r gs://waymo_open_dataset_v_2_0_1 data/waymo.
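
Once data/waymo matches the layout shown in the tree above, a rough sanity check over the segment folders could look like this (split and file names are taken from that tree; adjust as needed):

# Report segment folders that are missing frames/, lidar/ or poses.npy
for d in data/waymo/training/*/; do
    [ -d "${d}frames" ] && [ -d "${d}lidar" ] && [ -f "${d}poses.npy" ] || echo "incomplete: $d"
done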

Other Dataset Implementations

This repository contains dataloader implementations for other datasets, too. These are not officially supported and are not guaranteed to work out of the box. However, they might be helpful when extending this codebase.

📸 Checkpoints

We provide download links for pretrained models for KITTI-360 and Waymo. Models will be stored under out/<dataset>/pretrained/<checkpoint-name>.pth.

# Run this from the root directory.
bash fetch_assets.sh checkpoints
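
Afterwards, you can list what was downloaded; the exact <dataset> folder names are created by the script:

# List the downloaded checkpoints (stored under out/<dataset>/pretrained/)
ls out/*/pretrained/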

🏃 Running the Pre-trained Models

We provide a script to run our pretrained models with custom data. The script can be found under scripts/images/gen_img_3d_vol_custom.py and takes the following flags:

  • --img <path> / -i <path>: Path to the input image. The image will be resized to match the model's default resolution. media/example/ contains two example images.
  • --model <model> / -m <model>: Which pretrained model to use. Options: KITTI-360 (default) and Waymo.

Note that we use the default projection matrices for the respective datasets.

# Save outputs to disk
python scripts/images/gen_img_3d_vol_custom.py --img media/example/0000.png --model KITTI-360
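
Running the Waymo model on your own image works the same way, assuming --model accepts the dataset names as listed above; the image path below is a placeholder:

# Example with the Waymo checkpoint and a custom image (path is a placeholder)
python scripts/images/gen_img_3d_vol_custom.py --img /path/to/your_image.png --model Waymo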

📽️ Reproducing Our Image and Video Results

We provide scripts to generate images and videos from the outputs of our models. In general, you can adapt the model and output configuration by changing the constants defined in the scripts. Generated files are stored under media/.

Generate Videos

  • 3D occupancy while driving: python scripts/videos/gen_vid_3d_vol.py
  • Novel views, occlusion maps, etc. for different trajectories and scenes: python scripts/videos/gen_vid_nvs.py
  • BEV density maps while driving: python scripts/videos/gen_vid_nvs.py

Generate Images

  • 3D occupancy: python scripts/images/gen_img_3d_vol.py
  • VCM inputs & outputs: python scripts/images/gen_img_vcm.py
  • Occlusion detection outputs: python scripts/images/gen_img_occlusion_detection.py

🏋 Training

View Completion Model (ControlNet)

We trained our view completion models on a single Nvidia A40 GPU with 48 GB of memory.

# On KITTI-360
accelerate launch train_controlnet.py -cn controlnet_full_512x768
# On Waymo
accelerate launch train_controlnet.py -cn controlnet_full_512x768_waymo
# To fine-tune the KITTI-360 checkpoint on Waymo
accelerate launch train_controlnet.py -cn controlnet_full_512x768_waymo CONTROLNET.TRAIN.RESUME_FROM_CHECKPOINT="path/to/kitti360_checkpoint.pt"
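
If Accelerate has not been configured on your machine yet, you can set the number of GPUs, mixed precision, etc. once before launching. This is the standard Hugging Face Accelerate workflow and not specific to this repository:

# One-time interactive setup for Hugging Face Accelerate (GPU count, precision, ...)
accelerate config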

Feed-forward Scene Reconstruction Model

Our scene reconstruction models were trained on 4 Nvidia A40 GPUs with 48 GB of memory each.

# On KITTI-360
python train.py -cn exp_recon_full NAME="my_recon_kitti_360"
# On Waymo
python train.py -cn waymo NAME="my_recon_waymo"

📊 Evaluation

We further provide configurations to reproduce the evaluation results from the paper. These compute NVS metrics for the view completion model and LiDAR occupancy reconstruction metrics for the scene reconstruction model.

# Make sure you have downloaded our checkpoints before running this.
# Evaluate all view completion models.
bash eval_view_completion.sh
# Evaluate all reconstruction models.
bash eval_reconstruction.sh

🗣️ Acknowledgements

This work was funded by the ERC Advanced Grant "SIMULACRON" (agreement #884679), the GNI Project "AI4Twinning", and the DFG project CR 250/26-1 "4D YouTube".

This repository is based on BehindTheScenes, PixelNeRF and Monodepth2.
