Official implementation of the paper 'MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection'.
MonoDETR is the first DETR-based model for monocular 3D detection that requires no additional depth supervision, anchors, or NMS, and it achieves leading performance on the KITTI val and test sets. We make the vanilla transformer in DETR depth-aware and let depth guide the whole detection process. In this way, each object estimates its 3D attributes adaptively from the depth-informative regions of the image, rather than being limited to the features around its center.
Because of randomness in training, monocular detection results can vary by about ±1 AP3D between runs. For reproducibility, we provide four training logs of MonoDETR on the KITTI val set for the car category (a more stable version is still being tuned):
Val, AP3D|R40
Models | Easy | Mod. | Hard | Logs
MonoDETR | 28.84% | 20.61% | 16.38% | log
MonoDETR | 26.66% | 20.14% | 16.88% | log
MonoDETR | 29.53% | 20.13% | 16.57% | log
MonoDETR | 27.11% | 20.08% | 16.18% | log
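To put the run-to-run variance in numbers, the four validation runs above can be summarized as a mean and sample standard deviation per difficulty level. The snippet below is a minimal sketch that uses only the values from the table; the names `RUNS`, `LEVELS`, and `summarize` are illustrative and not part of this repository.

```python
from statistics import mean, stdev

# AP3D|R40 (%) on KITTI val, car category, copied from the four logs above.
RUNS = [
    (28.84, 20.61, 16.38),
    (26.66, 20.14, 16.88),
    (29.53, 20.13, 16.57),
    (27.11, 20.08, 16.18),
]
LEVELS = ("Easy", "Mod.", "Hard")


def summarize(runs):
    """Return {level: (mean, sample standard deviation)} across runs."""
    columns = zip(*runs)  # transpose: one tuple of scores per difficulty level
    return {level: (mean(col), stdev(col)) for level, col in zip(LEVELS, columns)}


if __name__ == "__main__":
    for level, (avg, sd) in summarize(RUNS).items():
        print(f"{level:5s} mean={avg:.2f}%  std={sd:.2f}%")
```

In these four runs the Easy split varies the most, while the Mod. scores stay within roughly half a point of each other.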
MonoDETR on the test set from the official KITTI benchmark for the car category:
Test, AP3D|R40
Models | Easy | Mod. | Hard
MonoDETR | 24.52% | 16.26% | 13.93%
MonoDETR | 25.00% | 16.47% | 13.58%
- Clone this project and create a conda environment:
  git clone https://github.com/ZrrSkywalker/MonoDETR.git
  cd MonoDETR
  conda create -n monodetr python=3.8
  conda activate monodetr
- Install pytorch and torchvision matching your CUDA version:
  conda install pytorch torchvision cudatoolkit
- Install requirements and compile the deformable attention:
  pip install -r requirements.txt
  cd lib/models/monodetr/ops/
  bash make.sh
  cd ../../../..
- Create a directory for saving the training logs:
  mkdir logs
- Download KITTI datasets and prepare the directory structure as:
  │MonoDETR/
  ├──...
  ├──data/KITTIDataset/
  │   ├──ImageSets/
  │   ├──training/
  │   ├──testing/
  ├──...
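Before training, it can help to confirm that the dataset layout matches the structure listed above. The following is a minimal sketch, not a script shipped with this repository: `missing_kitti_dirs` is a hypothetical helper, and it only checks for the three subdirectories named in the tree, not their contents.

```python
from pathlib import Path

# Subdirectories named in the directory layout above.
EXPECTED_SUBDIRS = ("ImageSets", "training", "testing")


def missing_kitti_dirs(root="data/KITTIDataset"):
    """Return the expected subdirectories that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_kitti_dirs()
    if missing:
        print("Missing under data/KITTIDataset:", ", ".join(missing))
    else:
        print("KITTI directory layout looks complete.")
```

Run it from the MonoDETR project root, or pass a different `root` if the dataset lives elsewhere.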
You can also change the data path at "dataset/root_dir" in configs/monodetr.yaml.
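A slash-separated name such as "dataset/root_dir" is read here as a nested key: `root_dir` under `dataset`. The sketch below illustrates that reading on an in-memory dictionary standing in for the parsed YAML. The helpers `get_by_path` and `set_by_path`, the example structure, and the path values are assumptions for illustration, not code or settings taken from this repository.

```python
def get_by_path(config, key_path, sep="/"):
    """Look up a nested value using a path such as 'dataset/root_dir'."""
    node = config
    for key in key_path.split(sep):
        node = node[key]  # raises KeyError if a level is missing
    return node


def set_by_path(config, key_path, value, sep="/"):
    """Set a nested value, assuming all parent levels already exist."""
    *parents, leaf = key_path.split(sep)
    node = config
    for key in parents:
        node = node[key]
    node[leaf] = value


# Stand-in for the parsed YAML; the structure and values are assumptions.
example_config = {"dataset": {"root_dir": "data/KITTIDataset"}}

if __name__ == "__main__":
    set_by_path(example_config, "dataset/root_dir", "/path/to/KITTIDataset")
    print(get_by_path(example_config, "dataset/root_dir"))
```

In practice the same change is made by editing the corresponding entry in configs/monodetr.yaml directly.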
You can modify the model and training settings in configs/monodetr.yaml and specify the GPU in train.sh:
bash train.sh configs/monodetr.yaml > logs/monodetr.log
By default, the best checkpoint is evaluated. You can change this at "tester/checkpoint" in configs/monodetr.yaml:
bash test.sh configs/monodetr.yaml
This repo benefits from the excellent Deformable-DETR and MonoDLE.
@article{zhang2022monodetr,
title={MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection},
author={Zhang, Renrui and Qiu, Han and Wang, Tai and Xu, Xuanzhuo and Guo, Ziyu and Qiao, Yu and Gao, Peng and Li, Hongsheng},
journal={arXiv preprint arXiv:2203.13310},
year={2022}
}
If you have any questions about this project, please feel free to contact [email protected].