Emotion Classification Using Naive Bayes

This repository contains a Python project that builds an emotion classification model from text data using a Multinomial Naive Bayes classifier. The workflow covers loading and preprocessing the data, extracting TF-IDF features, training and evaluating the model, and predicting the emotion of new text.

Overview

The project performs emotion classification by:

Reading a dataset containing text and corresponding emotion labels.
Replacing numerical labels with emotion names.
Preprocessing the text to remove stopwords and perform lemmatization.
Converting text into numerical features using TF-IDF.
Splitting the data into training and testing sets.
Training a Naive Bayes classifier.
Evaluating the model's performance and making a prediction on new input.

Data Preparation

Data Loading:
The dataset is loaded from a CSV file (emotions.csv) using pandas.
Label Mapping:
The code replaces numerical labels (0-5) with corresponding emotion names:
- 0 → "sadness"
- 1 → "joy"
- 2 → "love"
- 3 → "anger"
- 4 → "fear"
- 5 → "surprise"
Subsetting Data:
For efficiency, only the first 5000 rows are used for further analysis.
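
The loading, label mapping, and subsetting steps might look roughly like the sketch below. It is not the project's actual code: the column names text and label are assumptions, an in-memory string stands in for emotions.csv so the snippet runs on its own, and Series.map is used where the project may use a different replacement call.

```python
import io

import pandas as pd

# Stand-in for emotions.csv; the column names "text" and "label" are assumed.
csv_data = io.StringIO(
    "text,label\n"
    "i feel so alone today,0\n"
    "what a wonderful morning,1\n"
    "i adore my little sister,2\n"
    "this delay makes me furious,3\n"
    "i am scared of the dark,4\n"
    "i did not expect that at all,5\n"
)
df = pd.read_csv(csv_data)  # in the project: pd.read_csv("emotions.csv")

# Map the numeric labels 0-5 to emotion names.
label_names = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}
df["label"] = df["label"].map(label_names)

# Keep only the first 5000 rows (a no-op on this tiny sample).
df = df.head(5000)
print(df)
```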

Exploratory Data Analysis

Dataset Shape & Missing Values:
The code prints the shape of the dataset and checks for any missing values.
Label Distribution:
It prints the frequency distribution of the emotion labels to understand class balance.
Data Information:
General information and summary statistics about the dataset are displayed.
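
As an illustration, these checks could be written as follows. The small DataFrame and its column names (text, label) are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the loaded dataset.
df = pd.DataFrame(
    {
        "text": [
            "i feel so alone today",
            "what a wonderful morning",
            "best day of my life",
            "i am scared of the dark",
        ],
        "label": ["sadness", "joy", "joy", "fear"],
    }
)

# Shape as (rows, columns) and missing values per column.
print(df.shape)
missing = df.isnull().sum()
print(missing)

# Frequency of each emotion label, to check class balance.
label_counts = df["label"].value_counts()
print(label_counts)

# General information and summary statistics.
df.info()
print(df.describe(include="all"))
```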

Text Preprocessing

SpaCy Initialization:
The SpaCy English model (en_core_web_sm) is loaded to perform NLP tasks.
Preprocessing Function:
A function preprocess_text is defined to:
- Convert text to lowercase.
- Tokenize the text.
- Remove stopwords.
- Keep only alphabetic tokens.
- Apply lemmatization.
Applying Preprocessing:
The function is applied to the text column in the dataset, and a new column cleaned_text is created.

Feature Extraction

TF-IDF Vectorization:
- The TfidfVectorizer is initialized with a maximum of 1000 features.
- The cleaned text data is transformed into a TF-IDF feature matrix.
Train-Test Split:
The feature matrix (X) and target labels (y) are split into training and testing sets using an 80/20 ratio.
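
The vectorization and split could be sketched like this. The toy DataFrame is a hypothetical stand-in for the preprocessed dataset, and random_state=42 is an assumption added only to make the split reproducible.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the dataset after preprocessing.
df = pd.DataFrame(
    {
        "cleaned_text": [
            "feel sad lonely", "cry miss friend", "sad day lose",
            "lonely night cry", "miss home sad",
            "happy sunny day", "smile laugh friend", "enjoy great party",
            "happy smile joy", "laugh fun game",
        ],
        "label": ["sadness"] * 5 + ["joy"] * 5,
    }
)

# Keep at most the 1000 most frequent terms as features.
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(df["cleaned_text"])  # sparse TF-IDF matrix
y = df["label"]

# 80/20 train/test split; random_state is an assumption for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```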

Model Training and Evaluation

Training:
A Multinomial Naive Bayes classifier is instantiated and trained on the training data.
Prediction and Evaluation:
- The classifier predicts emotions on the test set.
- The model's accuracy is calculated.
- A classification report is generated, showing precision, recall, and F1-score for each emotion.
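
A self-contained sketch of training and evaluation follows. The two-class toy corpus is hypothetical and far smaller than the real data, so the printed scores say nothing about the project's results; random_state=42 and zero_division=0 are assumptions added for reproducibility and to avoid warnings on such a tiny test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical, already-cleaned toy corpus with two emotions.
texts = [
    "feel sad lonely", "cry miss friend", "sad day lose", "lonely night cry",
    "miss home sad", "feel empty lonely", "lose hope cry", "sad heart miss",
    "lonely sad tear", "cry empty night",
    "happy sunny day", "smile laugh friend", "enjoy great party",
    "happy smile joy", "laugh fun game", "great joy win", "enjoy sunny walk",
    "fun party laugh", "happy win smile", "joy great fun",
]
labels = ["sadness"] * 10 + ["joy"] * 10

X = TfidfVectorizer(max_features=1000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42
)

# Train the Multinomial Naive Bayes classifier on the training split.
clf = MultinomialNB()
clf.fit(X_train, y_train)

# Predict on the held-out split and evaluate.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)

print(f"Accuracy: {accuracy:.2f}")
print(report)
```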

Example Prediction

An example is provided where a new text ("He seemed mesmerized") is:

Preprocessed using the same preprocess_text function.
Transformed using the trained TF-IDF vectorizer.
Classified using the trained Naive Bayes classifier, outputting the predicted emotion.
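
The three steps above can be sketched as follows. To keep the snippet runnable on its own, preprocess_text here is a simplified stand-in that only lowercases (the project's version uses SpaCy), and the training corpus is a hypothetical toy set, so the predicted label is illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def preprocess_text(text: str) -> str:
    # Simplified stand-in for the project's SpaCy-based function.
    return text.lower()


# Hypothetical toy training data.
train_texts = [
    "I was mesmerized and amazed",
    "What a shocking twist",
    "I feel sad and lonely",
    "I cried all night",
    "Such a happy sunny day",
    "We laughed and smiled",
]
train_labels = ["surprise", "surprise", "sadness", "sadness", "joy", "joy"]

vectorizer = TfidfVectorizer(max_features=1000)
X_train = vectorizer.fit_transform([preprocess_text(t) for t in train_texts])
clf = MultinomialNB().fit(X_train, train_labels)

# Apply the same preprocessing and the already-fitted vectorizer
# (transform, not fit_transform) before classifying the new text.
new_text = "He seemed mesmerized"
cleaned = preprocess_text(new_text)
features = vectorizer.transform([cleaned])
predicted_emotion = clf.predict(features)[0]
print(predicted_emotion)
```

Calling transform rather than fit_transform keeps the new text in the same feature space the classifier was trained on.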

Dependencies

Python Libraries:
- numpy
- pandas
- scikit-learn
- spacy
SpaCy Model:
- en_core_web_sm (Make sure to download it via python -m spacy download en_core_web_sm)

Usage

Clone the Repository
Clone this repository to your local machine.

Install Dependencies
Install the required libraries using pip:

pip install numpy pandas scikit-learn spacy
python -m spacy download en_core_web_sm
