Comparative Analysis of ML Models (KNN, SVM, Linear Regression) on Mushroom Dataset
Mushroom Dataset Exploration: Leveraging KNN, SVM, and Linear Models is a machine learning project that explores and analyzes the Mushroom Dataset from the UCI Machine Learning Repository. The dataset describes mushroom samples through categorical physical characteristics such as shape, color, surface texture, and odor. The goal is to classify each mushroom as either edible or poisonous.
This project implements multiple machine learning approaches for classification, focusing on the following models:
- K-Nearest Neighbors (KNN): Classifies each sample by majority vote among its k nearest neighbors in feature space.
- Support Vector Machine (SVM): Separates classes using a hyperplane with maximum margin.
- Linear Regression: Serves as a baseline model for performance comparison. Because it predicts a continuous value, its output has to be thresholded to give a class label.
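As a rough illustration of how the three models can be set up with scikit-learn, here is a minimal sketch. The toy data, the 0/1 label encoding, `n_neighbors=3`, the linear kernel, and the 0.5 threshold for Linear Regression are all assumptions made for this example and are not taken from the project notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy, well-separated points standing in for encoded mushroom features (hypothetical).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])  # assumed encoding: 0 = edible, 1 = poisonous

# Fit the three models on the same data.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
svm = SVC(kernel="linear").fit(X, y)
linreg = LinearRegression().fit(X, y)

X_new = np.array([[0.5, 0.5], [3.5, 3.5]])
knn_pred = knn.predict(X_new)
svm_pred = svm.predict(X_new)
# Linear regression returns a real number, so threshold it to obtain a label.
linreg_pred = (linreg.predict(X_new) >= 0.5).astype(int)

print(knn_pred, svm_pred, linreg_pred)
```

The same `fit` and `predict` pattern applies to all three estimators, which makes it straightforward to loop over them when comparing results.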
Data Preprocessing:
- Handling categorical variables.
- Addressing missing values.
- Normalizing data for better model performance.
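The three preprocessing steps listed above could look roughly like this with pandas and scikit-learn. The rows are made up for illustration, and the choices of mode imputation, one-hot encoding, and `StandardScaler` are assumptions; the notebook may handle these steps differently. The UCI file does mark missing values with `?`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny hand-made sample in the style of the dataset (hypothetical rows).
df = pd.DataFrame({
    "class": ["e", "p", "e", "p"],
    "odor": ["n", "f", "a", "f"],
    "stalk-root": ["b", "?", "e", "b"],
})

# 1. Missing values: turn "?" into NaN, then fill with each column's most frequent value.
df = df.replace("?", np.nan)
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])

# 2. Categorical variables: map the target to 0/1 and one-hot encode the features.
y = (df["class"] == "p").astype(int)
X = pd.get_dummies(df.drop(columns="class")).astype(float)

# 3. Normalization: rescale every column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

print(X.columns.tolist())
print(X_scaled.shape)
```

Scaling matters most for distance-based and margin-based models such as KNN and SVM, since features with larger ranges would otherwise dominate.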
Data Exploration:
- Visualizations to understand data distributions.
- Identifying patterns and correlations between features.
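Since every feature is categorical, one simple way to look for patterns is to cross-tabulate a feature against the class label. The sketch below uses a few invented rows and the column names `class` and `odor` as assumptions; it only shows the idea, not the project's actual analysis.

```python
import pandas as pd

# Hypothetical rows; the real dataset has many more samples and features.
df = pd.DataFrame({
    "class": ["e", "e", "e", "p", "p", "p"],
    "odor":  ["n", "n", "a", "f", "f", "n"],
})

# Distribution of the target: how many edible vs. poisonous samples.
class_counts = df["class"].value_counts()

# Share of each class within every odor value (each row sums to 1).
odor_vs_class = pd.crosstab(df["odor"], df["class"], normalize="index")

print(class_counts)
print(odor_vs_class)
```

A table like `odor_vs_class` can be passed to a plotting call such as `odor_vs_class.plot(kind="bar", stacked=True)` or recreated with Seaborn's `countplot` to visualize the same relationship.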
Model Benchmarking:
- Comparing the accuracy, precision, recall, and F1-score of each model.
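For reference, the four metrics can be computed directly from the counts of true and false positives and negatives. The helper `binary_metrics` and the counts passed to it are hypothetical and serve only to show the formulas; treating "poisonous" as the positive class is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for one model, with "poisonous" as the positive class.
metrics = binary_metrics(tp=8, fp=2, fn=1, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Computing the same dictionary for each model gives a compact table for side-by-side comparison.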
Model Evaluation:
- Analyzing performance using a confusion matrix and a classification report.
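A minimal sketch of this evaluation step with scikit-learn follows. The label vectors are invented, and the encoding (0 = edible, 1 = poisonous) is an assumption made for the example.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predictions (0 = edible, 1 = poisonous).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Per-class precision, recall and F1 as a formatted text table.
report = classification_report(
    y_true, y_pred, labels=[0, 1], target_names=["edible", "poisonous"]
)

print(cm)
print(report)
```

With this encoding, `fn` counts poisonous mushrooms predicted as edible, which is usually the error worth watching most closely for this task.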
Technologies used:
- Python: The programming language used for data analysis and machine learning.
- Pandas: For data manipulation and preprocessing.
- Scikit-learn: For implementing machine learning models (KNN, SVM, Linear Regression).
- Matplotlib and Seaborn: For data visualization.
- NumPy: For numerical computations.
Clone this repository:
    git clone <repository_url>
    cd <repository_folder>
Run the Jupyter Notebook: Open the notebook file in your preferred environment (e.g., Google Colab or a local Jupyter installation).
Work through the notebook:
- Preprocess and clean the data.
- Visualize the features and explore patterns.
- Train and evaluate KNN, SVM, and Linear Regression models.
The notebook compares the models using the following metrics:
- Accuracy
- Precision
- Recall
- F1-score
The confusion matrix and classification report for each model show which class is misclassified and how often, which helps identify each model's strengths and weaknesses.
This project was developed during my association with Cyber Academy at the Cyber-Physical Systems Lab. It demonstrates how mushroom characteristics can be used for classification and compares the strengths and weaknesses of commonly used machine learning algorithms.
The Mushroom Dataset serves here as a compact case study in applying these techniques to a real-world classification problem.