This repository contains the code and models used for Task 4 (Political Sentiment Analysis) of the NAACL DravidianLangTech 2025 shared tasks. The objective of this task is to classify political sentiment in Tamil code-mixed tweets.
The dataset consists of code-mixed Dravidian text labeled for political sentiment across 7 classes. Various machine learning and deep learning models were explored to find the best classification performance. The datasets are available in the `Data/` folder, and the fine-tuning notebooks are in the `Transformer Models/` folder.
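For orientation, a minimal sketch for loading the splits with pandas (the column names are not documented here, so the snippet only inspects them):

```python
import pandas as pd

# Load the preprocessed splits (paths follow the repository layout below).
train = pd.read_csv("Data/cleaned_PS_train.csv")
dev = pd.read_csv("Data/cleaned_PS_dev.csv")

print(train.shape, dev.shape)
print(train.columns.tolist())            # inspect the actual column names
print(train.iloc[:, -1].value_counts())  # class distribution over the 7 labels
                                         # (assumes the label is the last column)
```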
The best-performing single model in this project is LaBSE + SVM, which achieved the highest F1 score among all individually tested approaches.
📦 Project Root
┣ 📂 Data/
┃ ┣ 📜 cleaned_PS_train.csv # Preprocessed training dataset
┃ ┣ 📜 cleaned_PS_dev.csv # Preprocessed validation dataset
┃ ┣ 📜 cleaned_PS_test.csv # Preprocessed test dataset
┃ ┣ 📜 PS_train.csv # Original training dataset
┃ ┣ 📜 PS_dev.csv # Original validation dataset
┃ ┣ 📜 PS_test_without_labels.csv # Original test dataset with no labels
┃ ┗ 📜 submission.csv # Final submission file
┣ 📂 Transformer Models/
┃ ┣ 📜 bert_base_cased.ipynb # Fine-tuned BERT-base-cased for classification
┃ ┣ 📜 indic_bert.ipynb # IndicBERT fine-tuning
┃ ┣ 📜 indic_bert_nohashtag.ipynb # IndicBERT without hashtags
┃ ┣ 📜 muril_nohashtag.ipynb # MuRIL without hashtags
┃ ┗ 📜 tamil_sbert_nohashtag.ipynb # Tamil SBERT without hashtags
┣ 📜 .gitignore
┣ 📜 fasttext.ipynb # FastText-based classification
┣ 📜 preprocess.ipynb # Data preprocessing pipeline
┣ 📜 requirements.txt # Python requirements
┣ 📜 svm.ipynb # SVM model using transformer embeddings
┗ 📜 ensemble.ipynb # Final trials using a combination of embeddings
- LaBSE + IndicBERT + TF-IDF + Attention (best-performing method overall; see the sketch after this list)
- LaBSE + SVM (best single model)
- BERT-base-cased
- IndicBERT
- MuRIL
- FastText
- SBERT-based models
- SVM-based models
- XLM-RoBERTa
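The attention-based combination is implemented in `ensemble.ipynb`. A simplified sketch of the idea, using plain feature concatenation in place of the learned attention weighting and assuming `text`/`label` column names:

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train = pd.read_csv("Data/cleaned_PS_train.csv")
texts, labels = train["text"].tolist(), train["label"]  # assumed column names

# Dense sentence embeddings from LaBSE; IndicBERT embeddings would be
# extracted and stacked in the same way.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
emb = encoder.encode(texts, show_progress_bar=True)

# Sparse word-level TF-IDF features (parameters are illustrative).
tfidf = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
tfidf_feats = tfidf.fit_transform(texts)

# Simplification: concatenate the feature views; the actual notebook
# learns an attention weighting over them instead.
features = hstack([csr_matrix(emb), tfidf_feats])
clf = LinearSVC().fit(features, labels)
```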
All SVM-based models are implemented in `svm.ipynb`, where embeddings are first extracted and an SVM classifier is then trained on them. The other notebooks correspond to fine-tuned versions of their respective models for classification.
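As a concrete instance of that pattern, a minimal sketch of the LaBSE + SVM pipeline, again assuming `text`/`label` column names and with illustrative hyperparameters:

```python
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics import f1_score
from sklearn.svm import SVC

train = pd.read_csv("Data/cleaned_PS_train.csv")
dev = pd.read_csv("Data/cleaned_PS_dev.csv")

# Step 1: extract frozen multilingual sentence embeddings with LaBSE.
encoder = SentenceTransformer("sentence-transformers/LaBSE")
X_train = encoder.encode(train["text"].tolist())  # assumed column name
X_dev = encoder.encode(dev["text"].tolist())

# Step 2: train an SVM classifier on the embeddings.
clf = SVC(kernel="rbf", C=1.0)  # hyperparameters are illustrative
clf.fit(X_train, train["label"])

print("macro F1:", f1_score(dev["label"], clf.predict(X_dev), average="macro"))
```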
1. Install the dependencies: `pip install -r requirements.txt`
2. Open `preprocess.ipynb` and execute the preprocessing pipeline.
3. Execute the respective notebooks (`.ipynb`) to train the different models.
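The exact cleaning steps live in `preprocess.ipynb`; as a purely hypothetical illustration of the kind of operations such a pipeline applies to code-mixed tweets (the `*_nohashtag` notebooks additionally drop hashtags):

```python
import re

def clean_tweet(text: str, drop_hashtags: bool = False) -> str:
    """Hypothetical cleaning steps; see preprocess.ipynb for the real pipeline."""
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)          # strip @mentions
    if drop_hashtags:
        text = re.sub(r"#\w+", " ", text)      # drop hashtags entirely
    else:
        text = text.replace("#", " ")          # keep the hashtag word itself
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(clean_tweet("Vote for change! #TNPolitics @user https://t.co/xyz", drop_hashtags=True))
```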
- LaBSE + SVM performed the best among all single models.
- The other SVM-based models are located in `svm.ipynb`.
- The transformer models were fine-tuned for classification based on their respective architectures.
- Due to the scarcity of training data, the SVM-based methods outperformed the fine-tuned transformers.
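Each fine-tuning notebook follows the standard Hugging Face sequence-classification recipe; a minimal sketch, assuming `text`/`label` columns and using IndicBERT as the example checkpoint (training arguments are illustrative):

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "ai4bharat/indic-bert"  # swap in any checkpoint listed above
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=7)

train = pd.read_csv("Data/cleaned_PS_train.csv")
label2id = {l: i for i, l in enumerate(sorted(train["label"].unique()))}

def tokenize(batch):
    # Tokenize the tweets and map string labels to integer ids.
    enc = tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    enc["labels"] = [label2id[l] for l in batch["label"]]
    return enc

ds = Dataset.from_pandas(train).map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=ds).train()
```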
Some boilerplate code, such as model training scripts and data preprocessing utilities, was generated using AI-assisted tools to streamline development. However, all model training, hyperparameter tuning, and evaluation were implemented and reviewed manually to ensure correctness.
For any queries, feel free to reach out at [email protected].