
Comparison of Vision Transformer (ViT) and MLP-Mixer

This comparison is conducted on the CIFAR10 dataset, which contains $60,000$ images of size $32 \times 32$ across $10$ classes, with $6,000$ images per class.

We split the $50,000$ training images into $45,000$ for training and $5,000$ for validation, before testing on the $10,000$ images in the test set.

  • Data transformations applied to the training + validation set: Horizontal Flip, Resized Crop, and Normalization
  • Data transformation applied to the test set: Normalization (a pipeline sketch follows this list).
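As a rough illustration, a minimal version of this data pipeline might look like the sketch below. The normalization statistics, crop scale range, and split seed are assumptions, not values taken from the repository.

```python
# A minimal sketch of the data pipeline described above, assuming standard
# torchvision transforms; exact parameters in the repository may differ.
import torch
import torchvision
import torchvision.transforms as T

CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)  # assumed per-channel statistics
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),  # scale range is an assumption
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=train_transform)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True,
                                        transform=test_transform)

# 45,000 / 5,000 train/validation split of the 50,000 training images.
# (For simplicity the validation subset shares the training transforms here.)
train_subset, val_subset = torch.utils.data.random_split(
    train_set, [45_000, 5_000], generator=torch.Generator().manual_seed(0))
```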

Vision Transformer (ViT)

The attention block in ViT is modelled with an embedding size of 256, a hidden dimension of 512, and 8 heads in the multi-head attention block, before the patch embeddings are passed on to the Transformer encoder.
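A minimal sketch of such an attention block with these hyperparameters (embedding size 256, hidden dimension 512, 8 heads) is shown below. The pre-norm layout, GELU activation, and class name are assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One pre-norm Transformer encoder block, as used in ViT."""
    def __init__(self, embed_dim=256, hidden_dim=512, num_heads=8, dropout=0.2):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):  # x: (batch, num_patches + 1, embed_dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        x = x + self.mlp(self.norm2(x))                    # feed-forward MLP
        return x
```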

MLP-Mixer

One MLP-Mixer block contains two MLPs. The MLP head has a hidden dimension of 512, and the token-mixing and channel-mixing MLP dimensions are 256 and 1024, respectively.
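A minimal sketch of one such Mixer block is given below, using token-mixing dimension 256 and channel-mixing dimension 1024 as stated above. Treating the 512 hidden dimension as the per-token embedding width is an assumption about how the repository wires these numbers together.

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP applied along the last dimension."""
    def __init__(self, dim, hidden_dim, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One MLP-Mixer block: token mixing followed by channel mixing."""
    def __init__(self, num_tokens, embed_dim=512, token_dim=256,
                 channel_dim=1024, dropout=0.2):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.token_mlp = MlpBlock(num_tokens, token_dim, dropout)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.channel_mlp = MlpBlock(embed_dim, channel_dim, dropout)

    def forward(self, x):  # x: (batch, num_tokens, embed_dim)
        # Token mixing: transpose so the MLP runs across patches
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: the MLP runs across the embedding dimension
        x = x + self.channel_mlp(self.norm2(x))
        return x
```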


We then conducted our experiment with three settings held uniform across both the ViT and MLP-Mixer models:

  • learning_rate = 2e-4, num_epochs = 30, and dropout = 0.2 (a training-loop sketch follows this list)
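For concreteness, a minimal training loop under these shared settings might look like the sketch below. The Adam optimizer and cross-entropy loss are assumptions, since only the learning rate, epoch count, and dropout are fixed above.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, device="cuda",
          learning_rate=2e-4, num_epochs=30):
    """Train a model with the shared experiment settings (sketch)."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # ... evaluate on val_loader after each epoch ...
```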

Our best results for both models are given below:

| Model     | Training Parameters | Training Time | Learning Rate | Epochs | Training Acc. (%) | Validation Acc. (%) | Test Acc. (%) |
|-----------|--------------------:|---------------|---------------|-------:|------------------:|--------------------:|--------------:|
| ViT       | 3,195,146           | 6:00 hr       | 2e-4          | 30     | 61.85             | 59.36               | 58.71         |
| MLP-Mixer | 1,116,490           | 1:48 hr       | 2e-4          | 30     | 70.80             | 68.64               | 68.32         |

With roughly one-third of the parameters and less than one-third of the training time, MLP-Mixer is the clear winner over ViT on training, validation, and test accuracy.

Note:

We ran our models for only $30$ epochs, so neither model had fully converged: the loss and accuracy curves had not yet flattened for either. We expect, however, that results after longer training would remain consistent with our conclusion.

  • ViT model's loss and accuracy plot
  • MLP-Mixer's loss and accuracy plot
