This is the official PyTorch implementation of the work:
VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
Wuyang Li¹, Zhu Yu², Alexandre Alahi¹
¹École Polytechnique Fédérale de Lausanne (EPFL); ²Zhejiang University
Contact: [email protected]
VoxDet addresses semantic occupancy prediction with an instance-centric formulation inspired by dense object detection: a Voxel-to-Instance (VoxNT) trick freely transfers voxel-level class labels to instance-level offset labels (a conceptual sketch follows the highlights below).
- Versatile: Adaptable to various voxel-based scenarios, such as camera and LiDAR settings.
- Powerful: Achieves joint state-of-the-art performance on both camera-based and LiDAR-based SSC benchmarks.
- Efficient: Fast (~1.3× speed-up) and lightweight (~57.9% parameter reduction).
- Leaderboard Topper: Achieves 63.0 IoU (single-frame model), securing 1st place on the SemanticKITTI leaderboard.
Note that VoxDet is a single-frame single-model method without extra data and labels.
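To make the idea concrete, here is a minimal conceptual sketch of a voxel-to-instance label transfer. It is not the official VoxNT implementation: using per-class connected components as pseudo-instances and instance centers as offset targets are our assumptions for illustration only.

```python
# Conceptual sketch only -- NOT the official VoxNT code.
# Pseudo-instances are per-class connected components; every voxel is
# assigned the offset from itself to its pseudo-instance center.
import numpy as np
from scipy import ndimage

def voxel_to_instance_offsets(voxel_labels: np.ndarray, free_class: int = 0):
    """voxel_labels: (X, Y, Z) int class labels -> (X, Y, Z, 3) offset labels."""
    coords = np.stack(
        np.meshgrid(*[np.arange(s) for s in voxel_labels.shape], indexing="ij"),
        axis=-1,
    ).astype(np.float32)
    offsets = np.zeros(voxel_labels.shape + (3,), dtype=np.float32)
    for cls in np.unique(voxel_labels):
        if cls == free_class:
            continue  # skip free space
        components, num = ndimage.label(voxel_labels == cls)
        for inst in range(1, num + 1):
            mask = components == inst
            center = coords[mask].mean(axis=0)      # pseudo-instance center
            offsets[mask] = center - coords[mask]   # voxel-to-center offset
    return offsets
```

The point of such a transfer is that it needs no extra annotation: the offset labels are derived "for free" from the voxel-level class labels already present in the dataset.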
Please refer to docs/install.md for detailed installation instructions. This work is built on the CGFormer codebase, so installation, data preparation, training, and inference are consistent with CGFormer. If something is missing, you can check that codebase :)
Please refer to docs/dataset.md for detailed dataset preparation instructions. Remember to change `data_root`, `ann_file`, and `stereo_depth_root` in every config file to your data paths.
Download the depth pretraining model from OneDrive, then change `load_from` in all config files accordingly. This pretraining is consistent with CGFormer and uses the config configs/pretrain.py.
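For orientation, a config excerpt might look as follows; the key names come from the instructions above, while the values and surrounding structure are placeholders, not the actual config contents.

```python
# Hypothetical config excerpt -- replace the placeholder paths with yours.
data_root = "/path/to/semantickitti"            # dataset root
ann_file = "/path/to/semantickitti/labels.pkl"  # annotation file (placeholder name)
stereo_depth_root = "/path/to/stereo_depth"     # precomputed stereo depth
load_from = "ckpts/pretrain.ckpt"               # depth pretraining checkpoint (placeholder name)
```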
Please refer to the reproduced logs in the logs folder (regenerated after code cleaning) to verify that every step is correct.
Train VoxDet on SemanticKITTI (camera) with 2× A100 40G:
```bash
CUDA_VISIBLE_DEVICES=0,1 python main.py \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam \
--seed 42 \
--log_every_n_steps 100
```

or with 4 GPUs (24GB memory):
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--config_path configs/4gpu-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam \
--seed 42 \
--log_every_n_steps 100
```

For LiDAR-based SemanticKITTI training:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py \
--config_path configs/voxdet-semnatickitti-lidar.py \
--log_folder voxdet-semnatickitti-lidar \
--seed 42 \
--log_every_n_steps 100
```

Train VoxDet on KITTI-360 (camera) with 2× A100 80G:
```bash
CUDA_VISIBLE_DEVICES=0,1 python main.py \
--config_path configs/voxdet-kitt360-cam.py \
--log_folder voxdet-kitt360-cam \
--seed 42 \
--log_every_n_steps 100
```

To evaluate, download the pretrained models and place them in the ckpts/ folder, then run:
```bash
python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam-eval \
--seed 42 \
--log_every_n_steps 100
```

or, for the LiDAR model:

```bash
python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-lidar.ckpt \
--config_path configs/voxdet-semnatickitti-lidar.py \
--log_folder voxdet-semantickitti-lidar-eval \
--seed 42 \
--log_every_n_steps 100
```

Add `--save_path pred` to save prediction results:
```bash
python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam.py \
--log_folder voxdet-semantickitti-cam-eval \
--seed 42 \
--log_every_n_steps 100 \
--save_path pred
```

For official SemanticKITTI leaderboard submission:
```bash
python main.py \
--eval --ckpt_path ./ckpts/voxdet-semantickitti-cam.ckpt \
--config_path configs/voxdet-semantickitti-cam-submit.py \
--log_folder voxdet-semantickitti-cam-submission \
--seed 42 \
--log_every_n_steps 100 \
--save_path submission \
--test_mapping
```

Note that with naive temporal fusion, VoxDet can achieve 20+ mIoU on the SemanticKITTI test set (see the logs folder).
We provide all reproduced artifacts (models, configs, logs, everything) after the code cleaning on OneDrive. We did not test them on the test set, so the performance might be slightly higher or lower than reported in the paper, but it should be very similar according to the TensorBoard logs.
We provide pretrained models for different configurations (results on the test set).
| Method | Dataset | Modality | IoU | mIoU | Config |
|---|---|---|---|---|---|
| VoxDet | SemanticKITTI | Camera | 47.81 | 18.67 | config |
| VoxDet | SemanticKITTI | LiDAR | 63.0 | 26.0 | config |
| VoxDet | KITTI-360 | Camera | 48.59 | 21.40 | config |
Please refer to docs/visualization.md.
VoxDet provides multiple configuration files for different scenarios:
- configs/voxdet-semantickitti-cam.py: Camera-based SemanticKITTI training
- configs/voxdet-semnatickitti-lidar.py: LiDAR-based SemanticKITTI training
- configs/voxdet-kitt360-cam.py: Camera-based KITTI-360 training
- configs/4gpu-semantickitti-cam.py: 4-GPU optimized SemanticKITTI training
- configs/baseline-dev-semantickitti-cam.py: Improved baseline with engineering tricks
- configs/pretrain.py: First-stage depth pretraining. If you want to redo this step yourself, use organize_ckpt.py to process the checkpoint for model loading (see the sketch after this list); our trained model is on OneDrive and is suggested for direct use.
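As a rough illustration of what such checkpoint processing usually involves (only the script name organize_ckpt.py comes from this repo; the checkpoint keys and the prefix remapping below are assumptions, not its actual logic):

```python
# Hypothetical sketch -- NOT the actual organize_ckpt.py.
# Extract model weights from a Lightning-style checkpoint and strip the
# wrapper prefix so the keys match the model targeted by load_from.
import torch

ckpt = torch.load("pretrain.ckpt", map_location="cpu")  # placeholder filename
state_dict = ckpt.get("state_dict", ckpt)

# Remap "model.xxx" -> "xxx"; the exact prefix is an assumption.
new_state = {k.removeprefix("model."): v for k, v in state_dict.items()}

torch.save({"state_dict": new_state}, "pretrain_organized.ckpt")
```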
VoxDet (blue curve) is significantly more efficient and effective than the previous state-of-the-art method, CGFormer (gray curve).
- Release the arXiv paper
- Release the unified codebase, including both camera-based and LiDAR-based implementations
- Release all models
If you find our work helpful for your research, please consider citing our paper:
```bibtex
@inproceedings{li2025voxdet,
  title={VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection},
  author={Li, Wuyang and Yu, Zhu and Alahi, Alexandre},
  booktitle={NeurIPS},
  year={2025}
}
```

We greatly appreciate the tremendous effort behind the following projects!
- FCOS: Fully Convolutional One-Stage Object Detection
- Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
- SIGMA: Semantic-complete Graph Matching For Domain Adaptive Object Detection
- Revisiting the Sibling Head in Object Detector
- VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction



