
Comparison of Vision Transformer (ViT) and MLP-Mixer

This comparison is conducted on the CIFAR10 dataset, which contains $60,000$ images of size $32 \times 32$ across $10$ classes, with $6,000$ images per class.

We split the $50,000$ training images into $45,000$ for training and $5,000$ for validation, before testing on the $10,000$ images in the test set.

  • Data transformations applied to the training + validation set: Horizontal Flip, Resized Crop, and Normalization
  • Data transformation applied to the test set: Normalization (a pipeline sketch follows this list).
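As a rough illustration, a minimal version of this data pipeline might look like the sketch below. The normalization statistics, crop scale range, and split seed are assumptions, not values taken from the repository.

```python
# A minimal sketch of the data pipeline described above, assuming standard
# torchvision transforms; exact parameters in the repository may differ.
import torch
import torchvision
import torchvision.transforms as T

CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)  # assumed per-channel statistics
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),  # scale range is an assumption
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=train_transform)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True,
                                        transform=test_transform)

# 45,000 / 5,000 train/validation split of the 50,000 training images.
# (For simplicity the validation subset shares the training transforms here.)
train_subset, val_subset = torch.utils.data.random_split(
    train_set, [45_000, 5_000], generator=torch.Generator().manual_seed(0))
```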

Vision Transformer (ViT)

The attention block in ViT is modelled with an embedding size of 256, a hidden dimension of 512, and 8 heads in the multi-head attention block, before the patch embeddings are passed on to the Transformer encoder.
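A minimal sketch of such an attention block with these hyperparameters (embedding size 256, hidden dimension 512, 8 heads) is shown below. The pre-norm layout, GELU activation, and class name are assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """One pre-norm Transformer encoder block, as used in ViT."""
    def __init__(self, embed_dim=256, hidden_dim=512, num_heads=8, dropout=0.2):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):  # x: (batch, num_patches + 1, embed_dim)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
        x = x + self.mlp(self.norm2(x))                    # feed-forward MLP
        return x
```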

MLP-Mixer

One MLP-Mixer block contains two MLPs. The MLP head has a hidden dimension of 512, and the token-mixing and channel-mixing MLP dimensions are 256 and 1024, respectively.
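A minimal sketch of one such Mixer block is given below, using token-mixing dimension 256 and channel-mixing dimension 1024 as stated above. Treating the 512 hidden dimension as the per-token embedding width is an assumption about how the repository wires these numbers together.

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two-layer MLP applied along the last dimension."""
    def __init__(self, dim, hidden_dim, dropout=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)

class MixerBlock(nn.Module):
    """One MLP-Mixer block: token mixing followed by channel mixing."""
    def __init__(self, num_tokens, embed_dim=512, token_dim=256,
                 channel_dim=1024, dropout=0.2):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.token_mlp = MlpBlock(num_tokens, token_dim, dropout)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.channel_mlp = MlpBlock(embed_dim, channel_dim, dropout)

    def forward(self, x):  # x: (batch, num_tokens, embed_dim)
        # Token mixing: transpose so the MLP runs across patches
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Channel mixing: the MLP runs across the embedding dimension
        x = x + self.channel_mlp(self.norm2(x))
        return x
```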


We then conducted our experiment with three settings held uniform across both the ViT and MLP-Mixer models:

  • learning_rate = 2e-4, num_epochs = 30, and dropout = 0.2 (a training-loop sketch follows this list)
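For concreteness, a minimal training loop under these shared settings might look like the sketch below. The Adam optimizer and cross-entropy loss are assumptions, since only the learning rate, epoch count, and dropout are fixed above.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, device="cuda",
          learning_rate=2e-4, num_epochs=30):
    """Train a model with the shared experiment settings (sketch)."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # ... evaluate on val_loader after each epoch ...
```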

Our best results for both models are given below:

| Model     | Training Parameters | Training Time | Learning Rate | Epochs | Training Acc. (%) | Validation Acc. (%) | Test Acc. (%) |
|-----------|--------------------:|---------------|---------------|-------:|------------------:|--------------------:|--------------:|
| ViT       | 3,195,146           | 6:00 hr       | 2e-4          | 30     | 61.85             | 59.36               | 58.71         |
| MLP-Mixer | 1,116,490           | 1:48 hr       | 2e-4          | 30     | 70.80             | 68.64               | 68.32         |

With roughly one-third of the parameters and less than one-third of the training time, MLP-Mixer is the clear winner over ViT on training, validation, and test accuracy.

Note:

We ran our models for only $30$ epochs, so neither model had fully converged: the loss and accuracy curves had not yet flattened for either. We expect, however, that results after longer training would remain consistent with our conclusion.

  • ViT model's loss and accuracy plot
  • MLP-Mixer's loss and accuracy plot
