The goal of this repository is to dive into natural language processing in the form of sentiment analysis. For this purpose, we use the ‘Twitter Sentiment Analysis’ dataset developed by Sherif Hussein, which is available on Mendeley Data and labels about 160k tweets as positive, negative or neutral. The following models were implemented, trained and compared:
- Neural Bag of Words
- Recurrent Neural Network
- Convolutional Neural Network
The analysis is performed with the popular and customisable machine learning library ‘PyTorch’ in a Jupyter notebook. Tokenization is handled by the tokenizer of BERT, the state-of-the-art transformer developed by Google AI.
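
To give a feel for the simplest of the three models listed above, here is a minimal sketch of a neural bag of words classifier in PyTorch. It is an illustration under assumptions, not the code from the notebook: the class name `NBoW`, the embedding size, the vocabulary size of 30522, the padding index 0 and the example token ids are all hypothetical placeholders. The three output classes correspond to the positive, negative and neutral labels of the dataset.

```python
import torch
import torch.nn as nn


class NBoW(nn.Module):
    """Neural bag of words: embed each token, average the embeddings, classify."""

    def __init__(self, vocab_size, embedding_dim, num_classes, pad_index):
        super().__init__()
        self.pad_index = pad_index
        # padding_idx keeps the padding embedding fixed at zero during training
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_index)
        self.fc = nn.Linear(embedding_dim, num_classes)

    def forward(self, ids):
        # ids: [batch, seq_len] integer token ids
        embedded = self.embedding(ids)  # [batch, seq_len, embedding_dim]
        # Mask out padding positions so they do not affect the average
        mask = (ids != self.pad_index).unsqueeze(-1).float()
        summed = (embedded * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1.0)  # avoid division by zero
        pooled = summed / counts  # [batch, embedding_dim]
        return self.fc(pooled)  # [batch, num_classes] unnormalised scores


# Assumed values for illustration only
model = NBoW(vocab_size=30522, embedding_dim=64, num_classes=3, pad_index=0)

# Two made-up tweets as token ids; the second is padded with index 0
batch = torch.tensor([[101, 2023, 2003, 102], [101, 2307, 102, 0]])
logits = model(batch)
print(logits.shape)
```

Because the embeddings are averaged, word order is ignored entirely, which is what distinguishes this model from the recurrent and convolutional networks in the list.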
(c) Mia Müßig