This is the official implementation of the paper:
Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images
Philipp Wulff¹, Felix Wimbauer¹,², Dominik Muhle¹,²,³ and Daniel Cremers¹,²
¹Technical University of Munich, ²MCML, ³SE3 Labs
ICCV 2025
If you find our work useful, please consider citing our paper:
```bibtex
@inproceedings{wulff2025dreamtorecon,
    title = {Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images},
    author = {Wulff, Philipp and Wimbauer, Felix and Muhle, Dominik and Cremers, Daniel},
    booktitle = {IEEE International Conference on Computer Vision (ICCV)},
    year = {2025}
}
```
```bash
git clone https://github.com/philippwulff/dream-to-recon
cd dream-to-recon

# We use Conda to manage our Python 🐍 environment
conda env create -f environment.yml
conda activate dtr
```

All data should be placed under the `data/` folder (or linked there, e.g. with `ln -s /path/to/KITTI-360/* data/KITTI-360`; see the sketch below the folder tree) in order to match our config files for the different datasets. The folder structure should look like:
```text
data
├── KITTI-360
│   ├── calibration
│   ├── data_2d_raw
│   ├── data_3d_raw
│   └── data_poses
└── waymo
    ├── testing
    │   ├── 39847154216997509_6440_000_6460_000
    │   │   ├── frames
    │   │   ├── lidar
    │   │   └── poses.npy
    │   └── ...
    ├── training
    └── validation
```
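If the datasets live elsewhere on your machine, symlinking them into `data/` avoids copying. A minimal sketch, assuming your copies sit under `/path/to/` (the waymo line mirrors the KITTI-360 one from above and is an assumption):

```bash
mkdir -p data/KITTI-360
# Link the KITTI-360 subfolders and the Waymo root into data/.
ln -s /path/to/KITTI-360/* data/KITTI-360
ln -s /path/to/waymo data/waymo
```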
Run this to get the pre-processed evaluation ground-truth data (LiDAR occupancy volumes):

```bash
bash fetch_assets.sh GT
```

All non-standard data (such as precomputed poses and data splits) comes with this repository and can be found in the `datasets/` folder.
KITTI-360
To download KITTI-360, go to https://www.cvlibs.net/datasets/kitti-360/index.php and create an account. We require the perspective images, raw velodyne scans, calibrations, and vehicle poses.
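The downloads arrive as several archives; after unpacking, arrange them to match the tree above. A minimal sketch with hypothetical archive names (substitute the actual file names from the download page):

```bash
# Archive names below are placeholders -- use the files you downloaded.
mkdir -p data/KITTI-360
unzip calibration.zip -d data/KITTI-360
unzip data_poses.zip -d data/KITTI-360
# Perspective images and velodyne scans are distributed per sequence and
# should unpack into data_2d_raw/ and data_3d_raw/ respectively.
```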
Waymo
The Waymo dataset format differs between versions. We used a copy of the dataset that is shared within our chair, which we believe is V2.0.1. To download this version, install gsutil and run `gsutil -m cp -r gs://waymo_open_dataset_v_2_0_1 data/waymo`.
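Access requires a Google account that has registered for the Waymo Open Dataset. Before starting the (very large) download, you can sanity-check your access by listing the bucket:

```bash
# Fails with an AccessDenied error if your account has not accepted the
# Waymo Open Dataset license.
gsutil ls gs://waymo_open_dataset_v_2_0_1
```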
Other Dataset Implementations
This repository contains dataloader implementations for other datasets, too. These are not officially supported and are not guaranteed to work out of the box. However, they might be helpful when extending this codebase.
We provide download links for pretrained models for KITTI-360 and Waymo.
Models will be stored under out/<dataset>/pretrained/<checkpoint-name>.pth.
```bash
# Run this from the root directory.
bash fetch_assets.sh checkpoints
```

We provide a script to run our pretrained models on custom data.
The script can be found under scripts/images/gen_img_3d_vol_custom.py and takes the following flags:
- `--img <path>` / `-i <path>`: Path to the input image. The image will be resized to match the model's default resolution. `media/example/` contains two example images.
- `--model <model>` / `-m <model>`: Which pretrained model to use (`KITTI-360` (default) or `Waymo`).
Note that we use the default projection matrices for the respective datasets.
```bash
# Save outputs to disk
python scripts/images/gen_img_3d_vol_custom.py --img media/example/0000.png --model KITTI-360
```
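The same script can target the Waymo model; the short flags are equivalent to the long ones listed above:

```bash
# Runs the example image through the Waymo model, which uses the Waymo
# projection matrix by default (see the note above).
python scripts/images/gen_img_3d_vol_custom.py -i media/example/0000.png -m Waymo
```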
We provide scripts to generate images and videos from the outputs of our models.
Generally, you can adapt the model and the output configuration by changing the constants defined in the scripts.
Generated files are stored under media/.
| Output | Command |
|---|---|
| Generate Videos | |
| 3D occupancy while driving | python scripts/videos/gen_vid_3d_vol.py |
| Novel views, occlusion maps, etc. for different trajectories and scenes | python scripts/videos/gen_vid_nvs.py |
| BEV density maps while driving | python scripts/videos/gen_vid_nvs.py |
| Generate Images | |
| 3D occupancy | python scripts/images/gen_img_3d_vol.py |
| VCM inputs & outputs | python scripts/images/gen_img_vcm.py |
| Occlusion detection outputs | python scripts/images/gen_img_occlusion_detection.py |
We trained our view completion models on a single Nvidia A40 GPU with 48GB memory.
```bash
# On KITTI-360
accelerate launch train_controlnet.py -cn controlnet_full_512x768
# On Waymo
accelerate launch train_controlnet.py -cn controlnet_full_512x768_waymo
# To fine-tune the KITTI-360 checkpoint on Waymo
accelerate launch train_controlnet.py -cn controlnet_full_512x768_waymo CONTROLNET.TRAIN.RESUME_FROM_CHECKPOINT="path/to/kitti360_checkpoint.pt"
```
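If you have not used Hugging Face Accelerate on your machine before, you typically need to create a launch configuration once. Our exact Accelerate settings are not listed here, so treat this as a generic setup step:

```bash
# One-time interactive setup: select machine type, number of GPUs,
# and mixed-precision options before running `accelerate launch`.
accelerate config
```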
Our scene reconstruction models were trained on four Nvidia A40 GPUs with 48GB of memory each.
```bash
# On KITTI-360
python train.py -cn exp_recon_full NAME="my_recon_kitti_360"
# On Waymo
python train.py -cn waymo NAME="my_recon_waymo"
```
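As in the ControlNet fine-tuning command above, further configuration values can be overridden on the command line in the same `KEY=VALUE` style. The key below is hypothetical and only illustrates the syntax; check the config files in this repository for the actual names:

```bash
# NAME comes from the commands above; TRAINING.BATCH_SIZE is a made-up key
# shown only to illustrate the override syntax.
python train.py -cn exp_recon_full NAME="my_recon_kitti_360" TRAINING.BATCH_SIZE=4
```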
We further provide configurations to reproduce the evaluation results from the paper. These produce NVS metrics for the view completion model and LiDAR occupancy reconstruction results for the scene reconstruction model.

```bash
# Make sure you have downloaded our checkpoints before running this.
# Evaluate all view completion models.
bash eval_view_completion.sh
# Evaluate all reconstruction models.
bash eval_reconstruction.sh
```

Acknowledgements: This work was funded by the ERC Advanced Grant "SIMULACRON" (agreement #884679), the GNI Project "AI4Twinning", and the DFG project CR 250/26-1 "4D YouTube".
This repository is based on BehindTheScenes, PixelNeRF and Monodepth2.