This repository contains the source code for the paper "Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer", accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025).
The results in the paper were obtained on Ubuntu 22.04 with PyTorch 2.0.1 and CUDA 11.7. To install PyTorch and the other required Python libraries, run the following commands:
conda create -n cu117 python=3.10
conda activate cu117
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
Install NATTEN
To install NATTEN, please follow the instructions in the NATTEN repository.
Download the NYUv2 and SUNRGBD datasets from Google Drive and put them under the data folder. The train and val splits of each dataset are defined by text files that list the image filenames.
To create a list of the depth images with the highest proportion of invalid pixels in a dataset, use the following command:
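As an illustration of how such a split file might be consumed, the sketch below reads a text file with one image filename per line and returns the names as a list. The function name `read_split` is hypothetical, and the one-filename-per-line layout is an assumption; the repository's actual loader may differ.

```python
from pathlib import Path


def read_split(index_file):
    """Return the image filenames listed in a split file.

    Assumption: each non-empty line holds exactly one filename.
    """
    lines = Path(index_file).read_text(encoding="utf-8").splitlines()
    # Drop blank lines and surrounding whitespace.
    return [line.strip() for line in lines if line.strip()]
```

Calling `read_split` on a train or val index file would then give the list of images belonging to that split.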
python -m utils.ranking_data --img_dir <path-to-depth-images-directory> --file <path-to-test-index-file>
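The ranking idea can be sketched as follows. This is not the code in `utils.ranking_data`; it is a minimal illustration that assumes invalid depth pixels are stored as 0 and that each depth map is already loaded as a 2D sequence of numbers. The names `invalid_fraction` and `rank_by_invalid` are hypothetical.

```python
def invalid_fraction(depth, invalid_value=0):
    """Fraction of pixels in a 2D depth map equal to `invalid_value`.

    Assumption: invalid (missing) depth readings are encoded as 0.
    """
    total = sum(len(row) for row in depth)
    if total == 0:
        return 0.0
    invalid = sum(1 for row in depth for value in row if value == invalid_value)
    return invalid / total


def rank_by_invalid(depth_maps):
    """Sort (name, fraction) pairs from most to least invalid pixels.

    `depth_maps` maps a filename to its 2D depth map.
    """
    scores = {name: invalid_fraction(d) for name, d in depth_maps.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

If the depth maps are NumPy arrays, the same fraction can be computed more compactly as `(depth == 0).mean()`.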
Download the different variants of the UperNet DAT++ backbone from this repository and put them under the pretrained folder.
You can now choose a config file to train the corresponding type of model:
python train.py --config <path-to-config-file>
Example
python train.py --config config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml
Evaluation results for multiple checkpoints can be produced using the following command:
python eval.py --config <path-to-config-file> --fr <start-epoch> --to <end-epoch>
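To make the flags concrete, the sketch below parses `--config`, `--fr`, and `--to` and lists the epochs that would be evaluated. It is an illustration under stated assumptions, not the contents of `eval.py`: the integer types, the inclusive treatment of `--to`, and the helper name `epochs_to_evaluate` are all guesses.

```python
import argparse


def build_parser():
    """Parser mirroring the flags shown above (argument types are assumed)."""
    parser = argparse.ArgumentParser(description="Evaluate a range of checkpoints")
    parser.add_argument("--config", required=True, help="path to the config file")
    parser.add_argument("--fr", type=int, required=True, help="first epoch to evaluate")
    parser.add_argument("--to", type=int, required=True, help="last epoch to evaluate")
    return parser


def epochs_to_evaluate(args):
    """List the epochs in the requested range.

    Assumption: the range is inclusive at both ends.
    """
    if args.fr > args.to:
        raise ValueError("--fr must not exceed --to")
    return list(range(args.fr, args.to + 1))
```

Under these assumptions, `--fr 90 --to 92` would cover the checkpoints of epochs 90, 91, and 92.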
You can visualize the results on each dataset using the following command:
python test.py --config <path-to-config-file> --epoch <epoch> --show
If you reference our work in your research, please cite the following paper:
@misc{bui2024diffusionbasedrgbdsemanticsegmentation,
title={Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer},
author={Minh Bui and Kostas Alexis},
year={2024},
eprint={2409.15117},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.15117},
}
For inquiries, feel free to reach out to the authors:
- Quang Minh Bui
- Kostas Alexis
This research was conducted at the Autonomous Robots Lab, Norwegian University of Science and Technology (NTNU).
For more information, visit our website.
Our implementation is partly based on mmsegmentation, CMX, and DDP. We thank their authors.
This material was supported by the Research Council of Norway under Award NO-338694 and the European Commission under Grant No. 101121321.