PaddlePaddle reimplementation of Microsoft's repository for the Swin Transformer model, which was released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
Swin Transformer (the name Swin stands for Shifted window) serves as a general-purpose backbone for computer vision. It is a hierarchical Transformer whose representation is computed with shifted windows. The shifted windowing scheme improves efficiency by limiting self-attention computation to non-overlapping local windows while still allowing cross-window connections.
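The windowing idea described above can be illustrated with a minimal NumPy sketch. It is not taken from this repository: the helper names `window_partition` and `shifted_window_partition` are hypothetical, the feature map is a toy 8x8 grid, and the attention mask that the paper applies to wrapped-around regions is omitted. It only shows how a cyclic shift makes the new windows straddle the boundaries of the previous ones.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)


def shifted_window_partition(x, window_size, shift):
    """Cyclically shift the map up and left by `shift`, then partition it."""
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


# Toy 8x8 map whose single channel stores each position's flat index.
feature_map = np.arange(64).reshape(8, 8, 1)

# Regular layout: four 4x4 windows; attention would stay inside each one.
regular = window_partition(feature_map, window_size=4)

# Shifted layout (shift = window_size // 2): windows now cross old borders.
shifted = shifted_window_partition(feature_map, window_size=4, shift=2)

# Which regular windows contribute positions to the first shifted window?
rows, cols = np.divmod(shifted[0, :, :, 0], 8)
source_windows = set(((rows // 4) * 2 + (cols // 4)).ravel().tolist())
print(regular.shape, shifted.shape, sorted(source_windows))
```

In this toy case the first shifted window draws positions from all four regular windows, which is the cross-window connection that alternating regular and shifted layers provides.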
Some newer features require a more recent version of PaddlePaddle. For installation instructions, refer to installation.md.
# Train on a single node with 8 GPUs
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch \
--nnodes=$PADDLE_NNODES \
--master=$PADDLE_MASTER \
--devices=$CUDA_VISIBLE_DEVICES \
plsc-train \
-c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml
# [Optional] Download checkpoint
mkdir -p pretrained/
wget -O ./pretrained/swin_base_patch4_window7_224_fp16o2.pdparams https://plsc.bj.bcebos.com/models/swin/v2.5/swin_base_patch4_window7_224_fp16o2.pdparams
# Evaluate the checkpoint on a single node with 8 GPUs
export PADDLE_NNODES=1
export PADDLE_MASTER="127.0.0.1:12538"
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python -m paddle.distributed.launch \
--nnodes=$PADDLE_NNODES \
--master=$PADDLE_MASTER \
--devices=$CUDA_VISIBLE_DEVICES \
plsc-eval \
-c ./configs/swin_base_patch4_window7_224_in1k_1n8c_dp_fp16o2.yaml \
-o Global.pretrained_model=pretrained/swin_base_patch4_window7_224_fp16o2 \
-o Global.finetune=False
We provide more directly runnable configurations; see Swin Configurations.
Model | DType | Pretrain | Resolution | Configs | GPUs | Img/sec | Top1 Acc | Official | Checkpoint | Log |
---|---|---|---|---|---|---|---|---|---|---|
Swin-B | FP16 O1 | ImageNet2012 | 224x224 | config | A100*N1C8 | 2155 | 0.83362 | 0.835 | download | log |
Swin-B | FP16 O2 | ImageNet2012 | 224x224 | config | A100*N1C8 | 3006 | 0.83223 | 0.835 | download | log |
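To make the trade-off in the table easier to read, the short sketch below derives the relative throughput of the two precision modes and each run's gap to the official Top-1 accuracy. The numbers are copied from the table above; the dictionary layout and variable names are only illustrative.

```python
# Values copied from the results table (Swin-B, ImageNet2012, 224x224, A100*N1C8).
results = {
    "FP16 O1": {"img_per_sec": 2155, "top1": 0.83362},
    "FP16 O2": {"img_per_sec": 3006, "top1": 0.83223},
}
official_top1 = 0.835

# Throughput of O2 relative to O1.
speedup = results["FP16 O2"]["img_per_sec"] / results["FP16 O1"]["img_per_sec"]

# Gap to the official accuracy, in percentage points.
gaps = {
    name: (official_top1 - row["top1"]) * 100
    for name, row in results.items()
}

print(f"O2 vs O1 throughput: {speedup:.2f}x")
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points below the official Top-1")
```

On these figures, O2 is roughly 1.39x as fast as O1, while both runs stay within about 0.3 percentage points of the official accuracy.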
@inproceedings{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2021}
}