Spambase dataset analysis and prediction
Our objective is to analyze and visualize the dataset and to build a model able to predict whether a mail is spam or not. Our principal goal is to minimize false positives (legitimate mails wrongly predicted as spam), since the priority is to have as few mails as possible classified as spam when they contain real information.
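One common way to act on a "minimize false positives" goal is to tune the decision threshold applied to a classifier's spam score. The sketch below is a minimal, hypothetical illustration and not necessarily what this project did: it assumes each mail has a spam score in [0, 1] (for example a predicted probability from any classifier) and that label 1 means spam, as in Spambase. The helper names `confusion_counts` and `lowest_threshold_within_fp_budget` are made up for this example.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tn, fp, fn, tp) when mails scoring >= threshold are flagged as spam (label 1)."""
    tn = fp = fn = tp = 0
    for label, score in zip(y_true, scores):
        predicted_spam = score >= threshold
        if predicted_spam and label == 1:
            tp += 1  # spam correctly caught
        elif predicted_spam:
            fp += 1  # legitimate mail wrongly flagged: the error we want to limit
        elif label == 1:
            fn += 1  # spam that slipped through
        else:
            tn += 1  # legitimate mail correctly kept
    return tn, fp, fn, tp


def lowest_threshold_within_fp_budget(y_true, scores, max_fp):
    """Smallest threshold whose false-positive count does not exceed max_fp.

    Raising the threshold can only lower the false-positive count, so the first
    admissible candidate in ascending order is the lowest one, i.e. the one that
    still catches the most spam while respecting the budget.
    """
    candidates = sorted(set(scores)) + [float("inf")]  # inf flags nothing, so 0 FP
    for threshold in candidates:
        _, fp, _, _ = confusion_counts(y_true, scores, threshold)
        if fp <= max_fp:
            return threshold
    return float("inf")  # only reached if max_fp is negative
```

For instance, with labels `[0, 0, 1, 1, 0, 1]` and scores `[0.1, 0.4, 0.35, 0.8, 0.7, 0.9]`, a budget of zero false positives gives a threshold of 0.8, which still catches two of the three spams. The threshold should be chosen on validation data, not on the mails used for the final evaluation.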
During this project we tried multiple models and tweaked their parameters to find the best one.
We obtained the best results with a Random Forest, which produced 40 false positives out of 4600 mails (1812 of them being spam).
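Put as a rate, and assuming the 40 false positives were counted over all 4600 mails (so the non-spam mails number 4600 − 1812 = 2788), the false-positive rate works out as follows:

```latex
\mathrm{FPR} = \frac{FP}{FP + TN} = \frac{40}{4600 - 1812} = \frac{40}{2788} \approx 1.43\%
```

In other words, roughly 1 legitimate mail in 70 would be flagged as spam, or about 0.87% of all mails (40 / 4600).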
This project enabled us to predict whether a mail is spam or not with high precision, and helped me better understand and improve my machine learning models as well as my Python and Git skills.
Thank you for your interest,