DiffusionMMS

This repository contains the source code for the paper "Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer", accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025).

Usage

Dependencies

The results in the paper were obtained on Ubuntu 22.04 with PyTorch 2.0.1 and CUDA 11.7. To install PyTorch and the other required Python libraries, run the following commands:

conda create -n cu117 python=3.10
conda activate cu117
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
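After installation, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch and not part of the repository: it uses only the standard library to compare installed package versions against the pins from the install command. The helper name check_versions and the loose prefix-matching rule are assumptions made for illustration.

```python
from importlib import metadata

# Versions pinned by the install command above.
EXPECTED = {"torch": "2.0.1", "torchvision": "0.15.2", "torchaudio": "2.0.2"}


def check_versions(expected):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Loose prefix match so local build tags such as "2.0.1+cu117" pass.
        ok = installed is not None and installed.startswith(wanted)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

Run it inside the activated cu117 environment. A MISMATCH line suggests that the package is missing or was installed at a different version.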

Install NATTEN

To install NATTEN, please follow the instructions in the NATTEN repository.

Data preparation

Download the NYUv2 and SUNRGBD datasets from Google Drive and put them under the data folder. Each dataset is divided into train and val splits, each defined by a text file listing the filenames of its images.
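As an illustration of how such a split file can be consumed, the sketch below reads one into a list of filenames. It assumes one filename per line, which is an assumption about the file layout and not something stated here. The function name read_split and the path in the usage note are hypothetical.

```python
from pathlib import Path


def read_split(index_file):
    """Read a split file, assuming one image filename per line."""
    lines = Path(index_file).read_text().splitlines()
    # Drop blank lines and strip surrounding whitespace.
    return [line.strip() for line in lines if line.strip()]
```

For example, read_split("data/NYUv2/train.txt") would return the training filenames if the file exists at that (hypothetical) location.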

To create a list of the depth images with the largest proportion of invalid pixels in a dataset, use the following command:

python -m utils.ranking_data --img_dir <path-to-depth-images-directory> --file <path-to-test-index-file>
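The idea behind this ranking can be sketched as follows. This is not the code in utils.ranking_data. It is a minimal illustration that assumes depth maps are available as NumPy arrays and that invalid pixels are stored as zeros, a common convention for depth sensors that may not match the repository's definition. Both function names are hypothetical.

```python
import numpy as np


def invalid_ratio(depth, invalid_value=0):
    """Fraction of pixels in a depth map equal to invalid_value."""
    depth = np.asarray(depth)
    return float(np.mean(depth == invalid_value))


def rank_by_invalid_ratio(depth_maps):
    """Sort (name, ratio) pairs from most to least invalid pixels.

    depth_maps maps an image name to its depth array.
    """
    ratios = [(name, invalid_ratio(d)) for name, d in depth_maps.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)
```

The first entries of the returned list are the images with the most missing depth, which are natural candidates for inspecting how a model behaves when depth is unreliable.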

Training

Download the UperNet DAT++ backbone variants from this repository and put them under the pretrained folder.

You can now choose a config file to train the corresponding model:

python train.py --config <path-to-config-file>

Example

python train.py --config config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml

Evaluation

To evaluate multiple checkpoints over a range of epochs, use the following command:

python eval.py --config <path-to-config-file> --fr <start-epoch> --to <end-epoch>
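To script evaluation over several configs, the command above can be assembled from Python. The sketch below only builds the argument list from the flags shown above (--config, --fr, --to) and runs it as a subprocess. The helper names build_eval_command and run_eval are hypothetical, and the epoch values in the usage note are illustrative.

```python
import subprocess
import sys


def build_eval_command(config, start_epoch, end_epoch):
    """Build the argument list for eval.py using the flags shown above."""
    return [
        sys.executable, "eval.py",
        "--config", str(config),
        "--fr", str(start_epoch),
        "--to", str(end_epoch),
    ]


def run_eval(config, start_epoch, end_epoch):
    """Run eval.py from the repository root and raise if it fails."""
    subprocess.run(build_eval_command(config, start_epoch, end_epoch), check=True)
```

For example, run_eval("config/nyuv2/standard/ddp_dual_dat_s_mmcv_epoch_100.yaml", 90, 100) would invoke eval.py with --fr 90 --to 100 for that config.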

Visualize

You can visualize the results on each dataset using the following command:

python test.py --config <path-to-config-file> --epoch <epoch> --show

Citing

If you reference our work in your research, please cite the following paper:

@misc{bui2024diffusionbasedrgbdsemanticsegmentation,
      title={Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer}, 
      author={Minh Bui and Kostas Alexis},
      year={2024},
      eprint={2409.15117},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2409.15117}, 
}

Contact

For inquiries, feel free to reach out to the authors:

Quang Minh Bui

Email | GitHub | LinkedIn

Kostas Alexis

Email | GitHub | LinkedIn | X (formerly Twitter)

This research was conducted at the Autonomous Robots Lab, Norwegian University of Science and Technology (NTNU).

For more information, visit our website.

Acknowledgements

Our implementation is partly based on mmsegmentation, CMX and DDP. We thank their authors.

This material was supported by the Research Council of Norway under Award NO-338694 and the European Commission under Grant No. 101121321.
