Skip to content

yashgarg7302/Email-Spam-Classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spam Email Classifier using Naïve Bayes in C++

1️⃣ Introduction (Problem Statement)

  • Spam emails are a significant problem in email communication.
  • Manually filtering them is inefficient, so automated classification using machine learning is necessary.
  • This project implements a Naïve Bayes Classifier in C++ to classify emails as spam or ham (legitimate emails).

2️⃣ Methodology (How It Works)

✅ Dataset Used

  • The project uses 4202 emails for training and 401 emails for testing.

✅ Preprocessing Steps

  • Load the dataset from training_set.data and test_set.data.
  • Tokenize email text and extract important words.
  • Compute word probabilities using Bayes' Theorem.
  • Classify emails based on probability scores.

✅ Algorithm Used

Implemented Naïve Bayes Classifier:

Image

  • P(Spam|Email): Probability that the email is spam
  • P(Email|Spam): Probability of the words occurring in a spam email
  • P(Spam): Prior probability of spam emails in the dataset
  • P(Email): Prior probability of any email

✅ Implementation in C++

  • The project uses C++ STL (Standard Template Library) for data structures.
  • Email.h and Email.cpp handle email processing.
  • spam filter.cpp is the main driver code.

3️⃣ Results (Model Performance)

The classifier was tested on 401 emails:

  • Ham (Legitimate emails) correctly predicted: 150
  • Spam correctly predicted: 164
  • Spam wrongly classified as Ham (False Negative): 37
  • Ham wrongly classified as Spam (False Positive): 50
  • Final Accuracy: 78.3%

Image

  • Precision: 76.5258 %
  • Recall: 81.5 %
  • F1 Score: 78.9346 %

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages