avianage/emotion-detection-classifier


Emotion Classification Using Naive Bayes

This repository contains a Python project that builds an emotion classification model from text data using a Multinomial Naive Bayes classifier. The workflow includes loading and preprocessing the data, feature extraction with TF-IDF, model training, evaluation, and making predictions on new text.

Overview

The project performs emotion classification by:

  • Reading a dataset containing text and corresponding emotion labels.
  • Replacing numerical labels with emotion names.
  • Preprocessing the text to remove stopwords and perform lemmatization.
  • Converting text into numerical features using TF-IDF.
  • Splitting the data into training and testing sets.
  • Training a Naive Bayes classifier.
  • Evaluating the model's performance and making a prediction on new input.

Data Preparation

  1. Data Loading:
    The dataset is loaded from a CSV file (emotions.csv) using pandas.

  2. Label Mapping:
    The code replaces numerical labels (0-5) with corresponding emotion names:

    • 0 → "sadness"
    • 1 → "joy"
    • 2 → "love"
    • 3 → "anger"
    • 4 → "fear"
    • 5 → "surprise"
  3. Subsetting Data:
    For efficiency, only the first 5000 rows are used for further analysis.
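The three preparation steps could look roughly like the sketch below. It assumes the label column in emotions.csv is called "label". That header name is a guess, so adjust it to match the file. A small in-memory DataFrame stands in for the CSV so the snippet runs on its own. `prepare_data` is a hypothetical helper name.

```python
import pandas as pd

# Numeric label -> emotion name, as listed above.
LABEL_MAP = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}


def prepare_data(df: pd.DataFrame, n_rows: int = 5000) -> pd.DataFrame:
    """Hypothetical helper: map numeric labels to names, then keep the first n_rows."""
    df = df.copy()
    # Assumed column name "label"; unmapped values would become NaN.
    df["label"] = df["label"].map(LABEL_MAP)
    # Keep only the first n_rows rows for efficiency.
    return df.head(n_rows).reset_index(drop=True)


# With the real file this would be (not run here):
#     df = prepare_data(pd.read_csv("emotions.csv"))

# Tiny stand-in for emotions.csv so the sketch is runnable.
demo = pd.DataFrame({
    "text": ["i feel sad", "what a joyful day", "this scares me"],
    "label": [0, 1, 4],
})
prepared = prepare_data(demo, n_rows=2)
print(prepared)
```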

Exploratory Data Analysis

  • Dataset Shape & Missing Values:
    The code prints the shape of the dataset and checks for any missing values.

  • Label Distribution:
    It prints the frequency distribution of the emotion labels to understand class balance.

  • Data Information:
    General information and summary statistics about the dataset are displayed.
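These checks correspond to standard pandas calls. The sketch below is one plausible way to group them. It assumes a "label" column holding the emotion names, and `explore` is a hypothetical helper name. The label distribution is returned so you can inspect class balance programmatically.

```python
import pandas as pd


def explore(df: pd.DataFrame, label_col: str = "label") -> pd.Series:
    """Hypothetical helper: print basic diagnostics and return label counts."""
    print("Shape:", df.shape)
    print("Missing values per column:")
    print(df.isnull().sum())
    counts = df[label_col].value_counts()
    print("Label distribution:")
    print(counts)
    df.info()  # column dtypes and non-null counts
    print(df.describe(include="all"))  # summary statistics for every column
    return counts


# Small stand-in frame; one text is missing on purpose.
demo = pd.DataFrame({
    "text": ["i feel sad", "so happy today", "feeling low", None],
    "label": ["sadness", "joy", "sadness", "joy"],
})
label_counts = explore(demo)
```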

Text Preprocessing

  • spaCy Initialization:
    The spaCy English model (en_core_web_sm) is loaded to perform NLP tasks.

  • Preprocessing Function:
    A function preprocess_text is defined to:

    • Convert text to lowercase.
    • Tokenize the text.
    • Remove stopwords.
    • Keep only alphabetic tokens.
    • Apply lemmatization.
  • Applying Preprocessing:
    The function is applied to the text column in the dataset, and a new column cleaned_text is created.

Feature Extraction

  • TF-IDF Vectorization:
    The TfidfVectorizer is initialized with max_features=1000, which keeps at most the 1000 most frequent terms, and the cleaned text is transformed into a TF-IDF feature matrix.
  • Train-Test Split:
    The feature matrix (X) and target labels (y) are split into training and testing sets using an 80/20 ratio.
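The vectorization and split could be sketched as follows. The short lists below are hypothetical stand-ins for the cleaned_text and label columns. `random_state=42` is an assumption added for reproducibility and is not taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for df["cleaned_text"] and df["label"].
cleaned_text = [
    "feel sad lonely", "happy joyful day", "love partner dearly",
    "angry unfair", "scared dark night", "shocked amazing news",
    "cry miss home", "smile sun shine", "adore sweet kiss",
    "furious wait", "nervous exam", "surprised gift",
]
labels = ["sadness", "joy", "love", "anger", "fear", "surprise"] * 2

# Vocabulary capped at the 1000 most frequent terms across the corpus.
vectorizer = TfidfVectorizer(max_features=1000)
X = vectorizer.fit_transform(cleaned_text)  # sparse matrix, shape (n_docs, n_terms)
y = labels

# 80/20 split; random_state is an assumption for reproducible results.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X.shape, X_train.shape, X_test.shape)
```

Because the vectorizer is fitted before the split, the IDF weights also reflect the test documents. Fitting on the training texts only and then calling `transform` on the test texts is the stricter alternative.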

Model Training and Evaluation

  • Training:
    A Multinomial Naive Bayes classifier is instantiated and trained on the training data.

  • Prediction and Evaluation:
    The classifier predicts emotions on the test set.

    • The model's accuracy is calculated.
    • A classification report is generated, showing precision, recall, and F1-score for each emotion.
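A self-contained sketch of the training and evaluation step using scikit-learn's `MultinomialNB`, `accuracy_score`, and `classification_report`. The toy corpus is hypothetical and the numbers it yields say nothing about the real model. `zero_division=0` only suppresses warnings for classes that end up missing from a tiny test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: three cleaned texts per emotion.
texts = [
    "feel sad lonely", "cry miss home", "heartbroken gloomy",
    "happy joyful day", "smile sun shine", "cheerful laugh",
    "love partner dearly", "adore sweet kiss", "cherish tender",
    "angry unfair", "furious wait", "annoyed rude",
    "scared dark night", "nervous exam", "afraid alone",
    "shocked amazing news", "surprised gift", "astonished sudden",
]
emotions = ["sadness", "joy", "love", "anger", "fear", "surprise"]
labels = [e for e in emotions for _ in range(3)]

X = TfidfVectorizer(max_features=1000).fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42  # random_state is an assumption
)

clf = MultinomialNB()  # default additive (Laplace) smoothing, alpha=1.0
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
# Per-class precision, recall and F1-score.
print(classification_report(y_test, y_pred, zero_division=0))
```

MultinomialNB is designed for count features, but it accepts fractional TF-IDF values and often works well with them in practice.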

Example Prediction

An example is provided where a new text ("He seemed mesmerized") is:

  • Preprocessed using the same preprocess_text function.
  • Transformed using the trained TF-IDF vectorizer.
  • Classified using the trained Naive Bayes classifier, outputting the predicted emotion.
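The prediction flow can be wrapped in a small function. In the sketch below, `predict_emotion` and `simple_preprocess` are hypothetical names. `simple_preprocess` is a crude stand-in for preprocess_text so the snippet runs without spaCy. The tiny training set is also made up, so the printed label is illustrative only. The key point is that the fitted vectorizer is reused with `transform`, never refitted, for new text.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def predict_emotion(text, preprocess, vectorizer, clf):
    """Hypothetical helper: apply the same preprocessing, vectorizer and model."""
    cleaned = preprocess(text)
    features = vectorizer.transform([cleaned])  # transform only; do not refit
    return clf.predict(features)[0]


def simple_preprocess(text):
    """Hypothetical stand-in for preprocess_text: lowercase, keep alphabetic runs."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


# Made-up training data just to obtain a fitted vectorizer and classifier.
train_texts = ["feel sad lonely", "happy joyful day",
               "scared dark night", "amazed mesmerized stunned"]
train_labels = ["sadness", "joy", "fear", "surprise"]
vectorizer = TfidfVectorizer(max_features=1000)
clf = MultinomialNB().fit(vectorizer.fit_transform(train_texts), train_labels)

# In the project, preprocess_text would be passed instead of simple_preprocess.
predicted = predict_emotion("He seemed mesmerized", simple_preprocess, vectorizer, clf)
print(predicted)
```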

Dependencies

  • Python Libraries:

    • numpy
    • pandas
    • scikit-learn
    • spacy
  • spaCy Model:

    • en_core_web_sm (download it with python -m spacy download en_core_web_sm)

Usage

  1. Clone the Repository
    Clone this repository to your local machine.

  2. Install Dependencies
    Install the required libraries using pip:

    pip install numpy pandas scikit-learn spacy
    python -m spacy download en_core_web_sm
