Spambase-dataset

Spambase dataset analysis and prediction

Our objective is to analyze and visualize the Spambase dataset and to build a model that predicts whether an email is spam or not. Our principal goal is to minimize false positives: the priority is to have as few legitimate emails as possible flagged as spam.
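To make the target metric concrete, here is a minimal sketch of how false positives can be counted. It assumes the Spambase label convention (1 = spam, 0 = not spam); the function name and the toy labels are hypothetical and are not taken from the notebooks.

```python
def count_false_positives(y_true, y_pred):
    """Count legitimate emails (label 0) that were predicted as spam (label 1)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)


# Toy labels for illustration only (not real Spambase data).
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

# Only the second email is legitimate but flagged as spam.
print(count_false_positives(y_true, y_pred))
```

A missed spam (true label 1, predicted 0) is a false negative and is deliberately not counted here, since the project treats it as the less costly error.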

During this project we tried multiple models and tweaked their parameters to find the best one.
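One possible way to run this kind of parameter search is sketched below with scikit-learn's `GridSearchCV`. This is an assumption for illustration, not necessarily the procedure used in the notebooks: the data is synthetic (57 features, like Spambase), the parameter grid is hypothetical, and `scoring="precision"` is chosen because precision drops whenever false positives increase.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in with 57 features; NOT the real Spambase data.
X, y = make_classification(
    n_samples=400, n_features=57, n_informative=10, random_state=0
)

# Hypothetical grid; the values actually explored in the project are not documented here.
param_grid = {"n_estimators": [20, 50], "max_depth": [None, 10]}

# Precision = TP / (TP + FP), so maximizing it penalizes false positives.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="precision",
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Other model families could be compared in the same way by swapping the estimator and its grid while keeping the scoring metric fixed.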

We obtained the best results with Random Forest: 40 false positives out of 4600 emails (1812 of them spam).
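As a rough reading of these figures, the false positive rate can be worked out from the counts above. This sketch assumes that all three numbers refer to the same evaluation set, which the README does not state explicitly.

```python
total_emails = 4600
spam_emails = 1812
false_positives = 40

# Emails that are not spam: 4600 - 1812 = 2788.
legitimate_emails = total_emails - spam_emails

# Share of legitimate emails wrongly flagged as spam.
false_positive_rate = false_positives / legitimate_emails

print(f"{false_positive_rate:.2%}")  # about 1.43%
```

Under that assumption, roughly 1.4% of legitimate emails are flagged as spam.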

This project allowed us to predict whether an email is spam with high precision. It also helped me better understand and improve my machine learning models, as well as my Python and Git skills.

Thank you for your interest.

