[NeurIPS 25 Spotlight] VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection

πŸ“Œ This is the official PyTorch implementation of the work:

VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
Wuyang Li¹, Zhu Yu², Alexandre Alahi¹
¹ École Polytechnique Fédérale de Lausanne (EPFL); ² Zhejiang University

[Figure: VoxDet overview]

πŸ“§ Contact: [email protected]

✨ Highlight

VoxDet addresses semantic occupancy prediction with an instance-centric formulation inspired by dense object detection: a Voxel-to-Instance (VoxNT) trick freely transfers voxel-level class labels to instance-level offset labels.

  • Versatile: Adaptable to various voxel-based scenarios, such as camera and LiDAR settings.
  • Powerful: Achieves state-of-the-art performance on both camera-based and LiDAR-based SSC benchmarks.
  • Efficient: Fast (~1.3× speed-up) and lightweight (~57.9% parameter reduction).
  • Leaderboard Topper: Achieves 63.0 IoU (single-frame model), securing 1st place on the SemanticKITTI leaderboard.

Note that VoxDet is a single-frame, single-model method without extra data or labels.
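The paper defines VoxNT precisely; as a rough intuition only, here is a minimal Python sketch (our own simplification, with assumed names and an assumed offset definition, not the repository's implementation) of how offset labels can be derived for free from voxel-level class labels along a single axis: each voxel's offset is its distance to the nearest class boundary.

import numpy as np

def offsets_along_axis(labels: np.ndarray) -> np.ndarray:
    """Distance (in voxels) from each voxel to the nearest class change
    along a 1D run of voxel class labels. Hypothetical simplification of
    the VoxNT label transfer, for illustration only."""
    n = labels.shape[0]
    bwd = np.zeros(n, dtype=np.int64)  # distance to the boundary behind
    fwd = np.zeros(n, dtype=np.int64)  # distance to the boundary ahead
    for i in range(1, n):
        bwd[i] = 0 if labels[i] != labels[i - 1] else bwd[i - 1] + 1
    for i in range(n - 2, -1, -1):
        fwd[i] = 0 if labels[i] != labels[i + 1] else fwd[i + 1] + 1
    return np.minimum(bwd, fwd)

labels = np.array([0, 0, 1, 1, 1, 2, 2, 0])  # class ids along one axis
print(offsets_along_axis(labels))            # -> [0 0 0 1 0 0 0 0]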


πŸ”§ Installation

Please refer to docs/install.md for detailed installation instructions. This work is built on the CGFormer codebase; installation, data preparation, training, and inference are consistent with CGFormer. If something is missing, you can check that codebase :)

πŸ“¦ Dataset Preparation

Please refer to docs/dataset.md for detailed dataset preparation instructions. Remember to update data_root, ann_file, and stereo_depth_root in every config file to your own data paths.
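For illustration, the relevant entries in a config look roughly like this; the paths below are placeholders to replace with your own, and only the key names come from the configs:

data_root = '/path/to/semantickitti'            # placeholder path
ann_file = '/path/to/semantickitti/labels'      # placeholder path
stereo_depth_root = '/path/to/semantickitti/depth'  # placeholder path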

πŸƒ Train VoxDet

Download the depth pre-training model (onedrive), and then change load_from in all config files accordingly. This pre-training is consistent with CGFormer and uses the config configs/pretrain.py.
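For illustration, this is a one-line edit per config; the checkpoint path below is a placeholder for wherever you saved the download:

load_from = './ckpts/pretrain-depth.ckpt'  # placeholder path to the downloaded depth model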

Please refer to the reproduced logs (generated after the code cleaning) in the logs folder to verify that every step is correct.

Camera-based SemanticKITTI

2× A100 (40GB)

CUDA_VISIBLE_DEVICES=0,1 python main.py \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam \
--seed 42 \
--log_every_n_steps 100

or with 4 GPUs (24GB memory each):

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--config_path configs/4gpu-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam \
--seed 42 \
--log_every_n_steps 100

LiDAR-based SemanticKITTI

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--config_path configs/voxdet-semnatickitti-lidar.py \
--log_folder voxdet-semnatickitti-lidar \
--seed 42 \
--log_every_n_steps 100

Camera-based KITTI-360

2× A100 (80GB)

CUDA_VISIBLE_DEVICES=0,1 python main.py \
--config_path configs/voxdet-kitt360-cam.py \
--log_folder voxdet-kitt360-cam \
--seed 42 \
--log_every_n_steps 100

πŸ“Š Evaluate VoxDet

Download the pretrained models and place them in the ckpts/ folder, then run:
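For instance (the checkpoint filenames follow the evaluation commands below):

mkdir -p ckpts
# place the downloaded checkpoints here, e.g.
#   ckpts/voxdet-semantickitti-cam.ckpt
#   ckpts/voxdet-semantickitti-lidar.ckpt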

Camera-based SemanticKITTI

python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam-eval \
--seed 42 \
--log_every_n_steps 100

LiDAR-based SemanticKITTI

python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-lidar.ckpt \
--config_path configs/voxdet-semnatickitti-lidar.py \
--log_folder voxdet-semantickitti-lidar-eval \
--seed 42 \
--log_every_n_steps 100

Save Predictions

Add --save_path pred to save prediction results:

python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam-eval \
--seed 42 \
--log_every_n_steps 100 \
--save_path pred

πŸ“‹ Generate Predictions for SemanticKITTI Submission

For official SemanticKITTI leaderboard submission:

python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam-submit.py \
--log_folder voxdet-semantickitti-cam-submission \
--seed 42 \
--log_every_n_steps 100 \
--save_path submission \
--test_mapping

🎯 Model Zoo

Note that with naive temporal fusion, VoxDet can achieve 20+ mIoU on the SemanticKITTI test set (see the logs folder).

We provide all reproduced artifacts (models, configs, and logs) after the code cleaning (onedrive). We did not evaluate them on the test set, so performance might be slightly higher or lower than the paper, but it should be very similar according to the TensorBoard logs.

We provide pretrained models for different configurations (test-set results).

Method | Dataset       | Modality | IoU   | mIoU  | Config
VoxDet | SemanticKITTI | Camera   | 47.81 | 18.67 | config
VoxDet | SemanticKITTI | LiDAR    | 63.0  | 26.0  | config
VoxDet | KITTI-360     | Camera   | 48.59 | 21.40 | config

🎨 Visualization

Please refer to docs/visualization.md.

πŸ“ Available Configurations

VoxDet provides multiple configuration files for different scenarios:

  • configs/voxdet-semantickitti-cam.py: Camera-based SemanticKITTI training
  • configs/voxdet-semnatickitti-lidar.py: LiDAR-based SemanticKITTI training
  • configs/voxdet-kitt360-cam.py: Camera-based KITTI-360 training
  • configs/4gpu-semantickitti-cam.py: 4-GPU optimized SemanticKITTI training
  • configs/baseline-dev-semantickitti-cam.py: Improved baseline with engineering tricks
  • configs/pretrain.py: first-stage depth pre-training. If you want to redo this step yourself, use organize_ckpt.py to process the checkpoint for model loading; the onedrive link is our trained model, which we suggest using directly.

πŸ“ˆ Training Logs

VoxDet (blue curve) is significantly more efficient and effective than the previous state-of-the-art method, CGFormer (gray curve).

[Figure: VoxDet training logs]

πŸ“‹ TODO List

  • Release the arXiv paper
  • Release the unified codebase, including both camera-based and LiDAR-based implementations
  • Release all models

πŸ“š Citation


If you find our work helpful for your research, please consider citing our paper:

@inproceedings{li2025voxdet,
  title={VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection},
  author={Li, Wuyang and Yu, Zhu and Alahi, Alexandre},
  booktitle={NeurIPS},
  year={2025}
}

πŸ™ Acknowledgement

We greatly appreciate the tremendous effort behind the projects this work builds on, especially CGFormer!
