This repo contains an implementation of an encoder-only transformer model for part-of-speech (POS) tagging. We have implemented it from scratch in both MATLAB and PyTorch (the PyTorch version will be added soon). The most important part of the code is the from-scratch implementation of transformer backpropagation.
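To give a feel for what "backpropagation from scratch" involves, here is a minimal sketch of the backward pass through a single scaled dot-product attention head, checked against finite differences. It is written in NumPy for brevity and is not taken from this repo's code: the function names (`attention_forward`, `attention_backward`, `numerical_grad`), the single-head, unmasked setup, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax, shifted by the row max for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_forward(Q, K, V):
    # Single head, no mask: O = softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V, A

def attention_backward(dO, Q, K, V, A):
    # dO is the gradient of the loss with respect to the output O.
    d = Q.shape[-1]
    dV = A.T @ dO
    dA = dO @ V.T
    # Softmax backward, applied to each row of the attention matrix.
    dS = A * (dA - (dA * A).sum(axis=-1, keepdims=True))
    dQ = dS @ K / np.sqrt(d)
    dK = dS.T @ Q / np.sqrt(d)
    return dQ, dK, dV

def numerical_grad(f, X, eps=1e-6):
    # Central finite differences of the scalar f() with respect to X (in place).
    g = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        old = X[idx]
        X[idx] = old + eps
        fp = f()
        X[idx] = old - eps
        fm = f()
        X[idx] = old
        g[idx] = (fp - fm) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n, d = 4, 3  # toy sequence length and head dimension (hypothetical)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
G = rng.standard_normal((n, d))  # arbitrary upstream gradient

def loss():
    # Scalar loss whose gradient with respect to O is exactly G.
    return float((attention_forward(Q, K, V)[0] * G).sum())

_, A = attention_forward(Q, K, V)
dQ, dK, dV = attention_backward(G, Q, K, V, A)
max_err = max(
    np.abs(dQ - numerical_grad(loss, Q)).max(),
    np.abs(dK - numerical_grad(loss, K)).max(),
    np.abs(dV - numerical_grad(loss, V)).max(),
)
print("max abs difference vs. finite differences:", max_err)
```

A gradient check of this kind is a common way to validate hand-written backward passes before trusting them in training; the same pattern extends to layer normalization and the feed-forward sublayers.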
To run POS tagging on the CoNLL-2003 dataset, first download the data:
We use word2vec word embeddings, which you can download from here:
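For reference, the English CoNLL-2003 files store one token per line with four whitespace-separated columns (word, POS tag, chunk tag, NER tag), a blank line between sentences, and `-DOCSTART-` lines between documents. The sketch below shows one way to pull out the word and POS columns. The function name `read_conll_pos` and the inline sample are illustrative assumptions, not part of this repo.

```python
def read_conll_pos(lines):
    """Return a list of (tokens, pos_tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        # Blank lines and -DOCSTART- markers both end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        cols = line.split()
        tokens.append(cols[0])  # column 1: word
        tags.append(cols[1])    # column 2: POS tag
    if tokens:  # flush the last sentence if the input lacks a trailing blank line
        sentences.append((tokens, tags))
    return sentences

# Small illustrative sample in the CoNLL-2003 column layout.
sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
""".splitlines()

sentences = read_conll_pos(sample)
for words, pos in sentences:
    print(list(zip(words, pos)))
```

To read a real file, the same function can be given an open file handle, for example `read_conll_pos(open(path, encoding="utf-8"))`, where `path` points to wherever you saved the downloaded data.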
Number of Parameters : 202351
Training Accuracy : 93.59%
Testing Accuracy : 89.62%