Although swin transformer is a trending approach in computer vision, its focus on global attention can sometimes miss small local details, which has an important role in medical imaging. To overcome this problem, I proposed a residual-based hybrid swin transformer that leverages the power of the convolution network with the transformer's existing self-attention mechanism. When applied on diabetic retinopathy diagnosis task, the proposed model achieved 78.0% accuracy on the Aptos 2019 datasets with just 3662 training samples, resulting 8.3% improvement compared to the accuracy on the original swin transformer model.
Fig. Proposed SwinRes Model Architecture
The first phase of the base model consists of two parallel components: a tiny variant of swin transformer with 96 channels in the hidden layer of the first stage, and Resent-50 with initial 7x7 convolution layer followed by four blocks of 3x3 convolutions. The output from the transformer is passed through a linear layer, which produce a feature vector of size 150. Similarly, the Resnet output is passed through a linear layer, reducing it to the same size. These feature vectors are then concatenated to form a single output of size 300 which is forwarded to the respective batch normalization and dropout layer. In the end, there is a linear layer, which takes 300-dimensional input from the previous layer and transforms it into an output of size C, where C represents the number of target classes.
The proposed model was evaluated on the Aptos 2019 datasets, which consist of retinal images categorized into five classes: Normal, Mild, Moderate, Severe and Proliferative.
Fig. Different Phase of DR Eyes
Training Specification
Image size (Resized) = 256 x 256
Learning rate = 0.001
Batch size = 40
Epoch = 30
Optimization = Adam
When compared to the state-of-the-art Swin Transformer model, the proposed model demonstrated 8.3% higher accuracy than the existing model.
Table: Comparision of the different models on aptos 2019 datasets
Fig. Training-Validation Accuracy of SwinRes Model
- Clone this repository
git clone https://github.com/puskal-khadka/Enhanced-Diabetic-Retinopathy-Detection-with-Novel-SwinRes.git
- (Optional) Create a virtual environment
py -m venv venv
## Activate
venv/Scripts/activate
- Install the required libraries
pip install -r requirements.txt
- Training
py swinres.py