
U-MixFormer: UNet-like Transformer with Mix-Attention for Efficient Semantic Segmentation [paper]

Introduction

U-MixFormer architecture.

We propose U-MixFormer, a novel transformer decoder built upon the U-Net structure and designed for efficient semantic segmentation. Our approach differs from previous transformer methods by using lateral connections between encoder and decoder stages as feature queries for the attention modules, rather than relying solely on traditional skip connections. Moreover, we mix hierarchical feature maps from various encoder and decoder stages to form a unified representation for keys and values, giving rise to our Mix-Attention module.
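
To make the idea concrete, here is a minimal PyTorch sketch of mix-attention, assuming all stage features have already been projected to a common channel width. The class name, pooling scheme, and shapes are illustrative only, not the repository's actual implementation:

```python
# Minimal sketch of the mix-attention idea (illustrative, not the repo's module):
# the query comes from one lateral encoder/decoder feature map, while keys and
# values are built by mixing several hierarchical feature maps into one sequence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixAttentionSketch(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_feat, stage_feats):
        # query_feat: (B, C, H, W) lateral feature used as the query
        # stage_feats: list of (B, C, Hi, Wi) maps from encoder/decoder stages
        B, C, H, W = query_feat.shape
        q = query_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        # Pool every stage to a common resolution, then concatenate along the
        # token axis to form the unified key/value representation.
        kv = torch.cat(
            [F.adaptive_avg_pool2d(f, (H // 4, W // 4)).flatten(2).transpose(1, 2)
             for f in stage_feats],
            dim=1)                                 # (B, N_mixed, C)
        out, _ = self.attn(q, kv, kv)              # cross-attention
        return out.transpose(1, 2).reshape(B, C, H, W)
```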

Performance vs. computational efficiency on ADE20K (single-scale inference). U-MixFormer outperforms previous methods in all configurations.

Installation

We use MMSegmentation v1.0.0 as the codebase.

For installation and data preparation, please follow the guidelines of MMSegmentation v1.0.0.

Our experiments were run with CUDA 11.0 and PyTorch 1.13.0.
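
As a quick sanity check after installation (a hedged example that only assumes torch and mmseg are importable), you can confirm the environment matches these versions:

```python
# Environment sanity check (illustrative; only assumes torch and mmseg
# are installed in the current environment).
import torch
import mmseg

print("torch:", torch.__version__)   # expect 1.13.0
print("cuda:", torch.version.cuda, "available:", torch.cuda.is_available())
print("mmseg:", mmseg.__version__)   # expect 1.0.0
```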

Training

```bash
# Single-gpu training
python tools/train.py configs/umixformer/umixformer_mit-b0_8xb2-160k_ade20k-512x512.py

# Multi-gpu training
./tools/dist_train.sh configs/umixformer/umixformer_mit-b0_8xb2-160k_ade20k-512x512.py <GPU_NUM>
```

Evaluation

All our models were trained using 2 A100 GPUs.

Example: evaluate U-MixFormer-B0 on ADE20K:

```bash
# Single-gpu testing
python tools/test.py configs/umixformer/umixformer_mit-b0_8xb2-160k_ade20k-512x512.py /path/to/checkpoint_file

# Multi-gpu testing
./tools/dist_test.sh configs/umixformer/umixformer_mit-b0_8xb2-160k_ade20k-512x512.py /path/to/checkpoint_file <GPU_NUM>
```

Qualitative Test (i.e. visualization)


```bash
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out-file ${OUTPUT_IMAGE_NAME}] [--device ${DEVICE_NAME}] [--palette ${PALETTE}]
```

Example: visualize U-MixFormer-B0 on Cityscapes:

```bash
python demo/image_demo.py demo/demo.png configs/umixformer/umixformer_mit-b0_8xb1-160k_cityscapes-1024x1024.py \
/path/to/checkpoint_file --out-file demo/output.png --device cuda:0 --palette cityscapes
```
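
The same visualization can also be scripted through MMSegmentation's Python API (a sketch assuming the v1.x inference API; the config and checkpoint paths are the same placeholders as in the shell command):

```python
# Scripted variant of the demo command above, using MMSegmentation v1.x APIs
# (paths are placeholders, as in the shell example).
from mmseg.apis import init_model, inference_model, show_result_pyplot

config = 'configs/umixformer/umixformer_mit-b0_8xb1-160k_cityscapes-1024x1024.py'
checkpoint = '/path/to/checkpoint_file'

model = init_model(config, checkpoint, device='cuda:0')  # build model + load weights
result = inference_model(model, 'demo/demo.png')          # run segmentation
show_result_pyplot(model, 'demo/demo.png', result,
                   show=False, out_file='demo/output.png')  # save the color overlay
```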

ONNX Model Conversion

Please install MMDeploy in a separate folder first, then run the command below from the mmsegmentation folder.

```bash
python /path/to/MMDEPLOY_PATH/tools/deploy.py ${DEPLOY_CONFIG_FILE} ${MODEL_CONFIG} ${CHECKPOINT_FILE} ${IMAGE_FILE} \
[--work-dir ${SAVE_FOLDER_NAME}] [--device ${DEVICE_NAME}] [--dump-info]
```

Example: deploy U-MixFormer-B0 trained on ADE20K as an ONNX model:

```bash
python /path/to/MMDEPLOY_PATH/tools/deploy.py ../mmdeploy/configs/mmseg/segmentation_onnxruntime_static-512x512.py \
configs/umixformer/umixformer_mit-b0_8xb2-160k_ade20k-512x512.py /path/to/checkpoint_file \
demo/demo.png \
--work-dir mmdeploy_model/umixformer_mit_b0_ade_512x512 \
--device cuda \
--dump-info
```
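
To verify the exported model, you can load it with ONNX Runtime (a sketch; it assumes mmdeploy's default output file name `end2end.onnx` and the static 512x512 input of this config):

```python
# Smoke-test the exported ONNX model with ONNX Runtime (illustrative; assumes
# mmdeploy's default output name `end2end.onnx` in the chosen work dir).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    'mmdeploy_model/umixformer_mit_b0_ade_512x512/end2end.onnx',
    providers=['CPUExecutionProvider'])

inp = sess.get_inputs()[0]
dummy = np.random.randn(1, 3, 512, 512).astype(np.float32)  # a normalized image goes here
outputs = sess.run(None, {inp.name: dummy})
print(inp.name, '->', [o.shape for o in outputs])  # typically a (1, 1, 512, 512) label map
```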

Table

Performance comparison with state-of-the-art light-weight and middle-weight methods on ADE20K and Cityscapes.

Citation

If you find U-MixFormer useful or relevant to your research, please cite our paper:

```bibtex
@inproceedings{seulki2025umixformer,
  title={U-MixFormer: UNet-like Transformer with Mix-Attention for Efficient Semantic Segmentation},
  author={Seul-Ki Yeom and Julian von Klitzing},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={XXX--XXX},
  year={2025}
}
```

License

This project is released under the Apache 2.0 license.