DiffusionMMS

This repository contains the source code for the paper Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer, accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025).

Usage

Dependencies

The results in the paper were produced on Ubuntu 22.04 with PyTorch 2.0.1 and CUDA 11.7. To install PyTorch and the other necessary Python libraries, you can use the following commands:

conda create -n cu117 python=3.10
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
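
As a quick sanity check that the CUDA build of PyTorch installed correctly, you can run the line below; it should report the installed version and True if the GPU is visible.

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"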

Install NATTEN

To install NATTEN, please follow the instructions in the NATTEN repository.
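
For reference, a wheel matching the PyTorch 2.0 / CUDA 11.7 setup above can typically be installed with a command like the one below; the exact version tag and wheel index may have changed, so treat this as a sketch and follow the NATTEN repository for the current instructions.

pip install natten==0.14.6+torch200cu117 -f https://shi-labs.com/natten/wheels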

Data preparation

Download the NYUv2 and SUNRGBD datasets from Google Drive and put them under the data folder. Each dataset is split into train and val using a text file that lists the image filenames.
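
For reference, the commands below assume a layout along these lines; the subfolder names are illustrative, so match them to the contents of the downloaded archives.

data/
├── NYUv2/
│   ├── depth/          # depth images
│   ├── ...             # RGB images, labels
│   └── test.txt        # val split index file
└── SUNRGBD/
    └── ...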

To create a list of the depth images with the highest proportion of invalid pixels in a dataset, use the following command:

python -m utils.ranking_data --img_dir <path-to-depth-images-directory> --file <path-to-test-index-file>
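
For example, with the illustrative NYUv2 layout above:

python -m utils.ranking_data --img_dir data/NYUv2/depth --file data/NYUv2/test.txt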

Training

Download the desired variants of the UperNet DAT++ backbone from this repository and put them under the pretrained folder.
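
The config files load these weights from the pretrained folder, so a layout along the lines below is expected; the checkpoint filename here is an illustrative placeholder, so keep the actual names of the downloaded files.

pretrained/
├── dat_pp_small.pth    # illustrative filename
└── ...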

You can then choose a config file to train the corresponding model variant:

python train.py --config <path-to-config-file>

Example

python train.py --config config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml

Evaluation

Evaluation results for a range of checkpoints can be produced with the following command:

python eval.py --config <path-to-config-file> --fr <start-epoch> --to <end-epoch>
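
For example, to sweep the checkpoints from epoch 90 through 100 of the training run above (the epoch range here is illustrative):

python eval.py --config config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml --fr 90 --to 100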

Visualize

You can visualize the results on each dataset with the following command:

python test.py --config <path-to-config-file> --epoch <epoch> --show
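
For example, to view the predictions of the epoch-100 checkpoint from the training run above (the epoch number is illustrative):

python test.py --config config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml --epoch 100 --show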

Citing

If you reference our work in your research, please cite the following paper:

@misc{bui2024diffusionbasedrgbdsemanticsegmentation,
      title={Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer}, 
      author={Minh Bui and Kostas Alexis},
      year={2024},
      eprint={2409.15117},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.15117}, 
}

Contact

For inquiries, feel free to reach out to the authors.

This research was conducted at the Autonomous Robots Lab, Norwegian University of Science and Technology (NTNU).

For more information, visit our website.

Acknowledgements

Our implementation is partly based on mmsegmentation, CMX, and DDP. We thank their authors.

This material was supported by the Research Council of Norway under Award NO-338694 and the European Commission under Grant No. 101121321.
