SONAR Rock vs Mine Prediction Machine Learning Project

Overview

This project applies several machine learning classifiers to SONAR return data to predict whether an object is a rock or a mine, and compares the models to determine which one performs best.

Dataset

The SONAR Rock vs Mine Prediction dataset consists of features derived from SONAR return data. It includes 208 instances, each with 60 numerical features representing the energy within frequency bands. The target labels are 'M' (mines) and 'R' (rocks).

Dataset Overview

Feature	Description
60 numerical values	Features representing energy levels over frequency bands
'M' / 'R'	Class labels for mines and rocks
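As a quick check of the layout described above, the sketch below loads a CSV with 60 numeric columns followed by one label column and confirms its shape. Two things are assumptions rather than facts stated in this README: that the file has no header row (`header=None`) and that the label sits in the last column. The helper name `load_sonar` is hypothetical, and the two-row inline sample is synthetic; it only stands in for `sonar data.csv`.

```python
import io

import numpy as np
import pandas as pd


def load_sonar(source):
    """Read SONAR data: 60 numeric columns followed by one label column.

    Assumes there is no header row, so pandas names the columns 0..60.
    """
    return pd.read_csv(source, header=None)


# Synthetic stand-in for the real file: two rows of 60 made-up values plus a label.
rng = np.random.default_rng(0)
rows = []
for label in ("R", "M"):
    values = ",".join(f"{v:.4f}" for v in rng.random(60))
    rows.append(f"{values},{label}")
sample_csv = io.StringIO("\n".join(rows))

sonar_df = load_sonar(sample_csv)
print(sonar_df.shape)               # (rows, 61): 60 features + 1 label
print(sonar_df[60].value_counts())  # class counts in the label column
```

If the assumptions hold, calling `load_sonar("sonar data.csv")` on the real file should report 208 rows and 61 columns.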

Model Description

This project employs several classification models, focusing on:

Logistic Regression: A simple yet effective model for binary classification.
Support Vector Classifier (SVC): Known for its effectiveness in high-dimensional spaces.
Decision Tree Classifier: A non-parametric supervised learning method.
Random Forest Classifier: An ensemble method that combines multiple decision trees.

Methodology

The overall methodology of the project can be summarized in the following steps:

Data Loading: Load the dataset into a pandas DataFrame.
Data Visualization and Summary: Explore the dataset using descriptive statistics and visualizations to understand patterns.
Data Preprocessing: Separate features and labels, and split the dataset into training and testing sets.
Model Training: Train various classification models using the training data.
Model Evaluation: Evaluate each model's performance using accuracy metrics and confusion matrices.
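The preprocessing step above can be sketched as follows. The split ratio, stratification, and random seed are assumptions, since this README does not state them. A 10% test split is used here because it yields 21 test samples, which matches the 21 predictions counted in each confusion matrix reported in this README. The data is synthetic and only mimics the dataset's shape.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the README's shape: 208 rows, 60 features, 1 label column.
# The 104/104 class counts are arbitrary, not the real class balance.
rng = np.random.default_rng(42)
features = pd.DataFrame(rng.random((208, 60)))
labels = pd.Series(["M"] * 104 + ["R"] * 104, name=60)
sonar_df = pd.concat([features, labels], axis=1)

X = sonar_df.drop(columns=60)  # the 60 feature columns
Y = sonar_df[60]               # the 'M' / 'R' labels

# Assumed settings: 10% held out, class proportions preserved, fixed seed.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.1, stratify=Y, random_state=1
)
print(X.shape, X_train.shape, X_test.shape)
```

Stratifying keeps the proportion of mines and rocks similar in both splits, which matters when the test set is this small.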

Model Training and Evaluation

Training

Each of the classifiers is trained on the training data. For example, here's how the Logistic Regression model is trained:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, Y_train)

Evaluation

The accuracy results for the tested models are as follows:

Model	Accuracy
Logistic Regression	76.19%
Support Vector Classifier	80.95%
Decision Tree Classifier	71.43%
Random Forest Classifier	76.19%

Model Comparison

Each model was trained on the same training split and compared by its accuracy on the same test split.

Model Comparison Code

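import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Candidate classifiers, all with scikit-learn's default hyperparameters.
models = {
    "Logistic Regression": LogisticRegression(),
    "Support Vector Classifier": SVC(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
    "Random Forest Classifier": RandomForestClassifier()
}

accuracy_scores = {}

# Fit each model on the same training split and score it on the same test split.
for name, model in models.items():
    classifier = model.fit(X_train, Y_train)
    pred = classifier.predict(X_test)
    accuracy_scores[name] = accuracy_score(Y_test, pred)

# Collect the scores into a table for side-by-side comparison.
classification_model_df = pd.DataFrame(list(accuracy_scores.items()), columns=["Model", "Accuracy"])
print(classification_model_df)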

Performance Results

Model	Accuracy
Logistic Regression	76.19%
Support Vector Classifier	80.95%
Decision Tree Classifier	71.43%
Random Forest Classifier	76.19%

Confusion Matrix

The confusion matrices for the models are as follows:

Logistic Regression

Confusion Matrix:
[[9 2]
 [3 7]]

Support Vector Classifier

Confusion Matrix:
[[10  1]
 [ 3  7]]
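This README does not show how the matrices were produced. A minimal sketch, assuming scikit-learn's `confusion_matrix` was used, illustrates how the cells are laid out. The labels below are hand-made for illustration and are not the project's test set.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels, not the project's data.
y_true = ["M", "M", "M", "R", "R"]
y_pred = ["M", "M", "R", "R", "M"]

# Passing `labels` fixes the row/column order explicitly.
cm = confusion_matrix(y_true, y_pred, labels=["M", "R"])
print(cm)

# Rows are actual classes, columns are predicted classes, both in `labels` order:
#   cm[0, 0] = actual M predicted M    cm[0, 1] = actual M predicted R
#   cm[1, 0] = actual R predicted M    cm[1, 1] = actual R predicted R
```

Under this layout, each row sums to the number of actual samples of that class, so which off-diagonal cell counts as false positives depends on the argument order and label order used when the matrix was built.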

Inference from Confusion Matrices

Logistic Regression

The confusion matrix for the Logistic Regression model is:

[[9 2]
 [3 7]]

Assuming the matrix follows scikit-learn's layout (rows are actual classes, columns are predicted classes, in the label order 'M', 'R'), the first row holds the 11 actual mines and the second row the 10 actual rocks of the 21-sample test set. The Support Vector Classifier matrix has the same row sums, as a shared test set requires.

True Positives (TP): 9 (Mines correctly predicted as mines)
True Negatives (TN): 7 (Rocks correctly predicted as rocks)
False Positives (FP): 3 (Rocks incorrectly predicted as mines)
False Negatives (FN): 2 (Mines incorrectly predicted as rocks)

Interpretations from the Confusion Matrix:

Accuracy: The model classifies 16 of the 21 test samples correctly (9 + 7), an accuracy of approximately 76.19%.
Precision: The precision for the positive class (mines) is calculated as:

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{9}{9 + 3} = 0.75$$

Among all instances predicted as mines, 75% were actually mines.
Recall: The recall for the positive class (mines) is calculated as:

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{9}{9 + 2} \approx 0.818$$

The model correctly identifies 81.8% of actual mines.
F1 Score: The F1 score is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times 0.75 \times 0.818}{0.75 + 0.818} \approx 0.783$$

The F1 score is approximately 0.783 (or 78.3%).
Error Analysis: The model misclassified 3 rocks as mines (false positives) and 2 mines as rocks (false negatives). With only 21 test samples, this difference is too small to indicate a systematic bias toward either class.

Support Vector Classifier

The confusion matrix for the Support Vector Classifier is:

[[10  1]
 [ 3  7]]

Assuming the matrix follows scikit-learn's layout (rows are actual classes, columns are predicted classes, in the label order 'M', 'R'), the first row holds the 11 actual mines and the second row the 10 actual rocks of the 21-sample test set:

True Positives (TP): 10 (Mines correctly predicted as mines)
True Negatives (TN): 7 (Rocks correctly predicted as rocks)
False Positives (FP): 3 (Rocks incorrectly predicted as mines)
False Negatives (FN): 1 (Mine incorrectly predicted as a rock)

Interpretations from the Confusion Matrix:

Accuracy: The SVC classifies 17 of the 21 test samples correctly (10 + 7), an accuracy of approximately 80.95%, one more correct prediction than the Logistic Regression model.
Precision: The precision for the positive class (mines) is calculated as:

$$\text{Precision} = \frac{TP}{TP + FP} = \frac{10}{10 + 3} \approx 0.769$$

Among all instances predicted as mines, 76.9% were actually mines.
Recall: The recall for the positive class (mines) is:

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{10}{10 + 1} \approx 0.909$$

The model correctly identifies 90.9% of actual mines.
F1 Score: The F1 score is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \times 0.769 \times 0.909}{0.769 + 0.909} \approx 0.833$$

The F1 score is approximately 0.833 (or 83.3%).
Error Analysis: The SVC misses only one mine (false negative), which accounts for its high recall. It misclassifies 3 rocks as mines, so its advantage on this test set comes entirely from detecting one additional mine.
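The metrics in this section can be re-derived directly from the two matrices. The sketch below assumes scikit-learn's layout (rows are actual classes, columns are predicted classes, with mines first); the helper name `metrics_from_matrix` is hypothetical. Accuracy and F1 do not depend on which off-diagonal cell is treated as the false positives, so those two figures hold under either reading of the matrix, while precision and recall swap.

```python
def metrics_from_matrix(cm):
    """Compute metrics for the first class from a 2x2 confusion matrix.

    Assumes scikit-learn's layout: rows are actual classes, columns are
    predicted classes, and the first row/column is the positive class (mines).
    """
    (tp, fn), (fp, tn) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Matrices copied from this README.
matrices = {
    "Logistic Regression": [[9, 2], [3, 7]],
    "Support Vector Classifier": [[10, 1], [3, 7]],
}

results = {name: metrics_from_matrix(cm) for name, cm in matrices.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

When the true and predicted labels are available, `sklearn.metrics.classification_report` provides the same per-class figures without manual bookkeeping.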

Code Organization

The code is organized into the following logical sections:

Data Importing and Preprocessing
Model Training and Evaluation
Model Comparison
Visualization and Prediction

Prerequisites

To run the code successfully, ensure you have the following libraries installed:

pandas
numpy
scikit-learn
matplotlib
seaborn

You can install them using pip:

pip install pandas numpy scikit-learn matplotlib seaborn

Running the Code

To execute the project:

Make sure you have all the required libraries installed.
Open the Jupyter Notebook containing the code snippets provided in this README.
Run the cells sequentially to train the models and evaluate their performances.

Streamlit Deployment

The project has been deployed as a web application using Streamlit. You can access the interactive application through the following link:

SONAR Rock vs Mine Prediction Streamlit App

This app allows users to input SONAR data and receive real-time predictions regarding whether the object is a rock or a mine.

How to Use the Streamlit App:

Input Form: Enter the values for SONAR returns in the provided input fields.

Prediction: Click on the "Predict" button to receive the prediction on whether the input corresponds to a rock or a mine.

Results Display: The application will display the results clearly on the screen.
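The contents of `app.py` are not shown in this README, so the following is only a hypothetical sketch of the input handling such an app might perform before calling a trained model's `predict` method. The function names, the comma-separated input format, and the output messages are all assumptions, and no Streamlit calls are included.

```python
def parse_sonar_input(text, n_features=60):
    """Turn a comma-separated string into a list of `n_features` floats.

    Raises ValueError if a value is not numeric or the count is wrong.
    """
    values = [float(part) for part in text.split(",") if part.strip()]
    if len(values) != n_features:
        raise ValueError(f"expected {n_features} values, got {len(values)}")
    return values


def label_to_text(label):
    """Map the dataset's class labels ('M' / 'R') to a readable message."""
    return {"M": "The object is a mine", "R": "The object is a rock"}[label]


# Made-up input: 60 identical readings, only to exercise the parser.
row = parse_sonar_input(",".join(["0.02"] * 60))
print(len(row), label_to_text("M"))
```

Validating the number of values up front gives the user a clear error message instead of a shape error raised inside the model.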

jigyasaG18/SONAR-Rock-vs-Mine-Prediction-ML-Project
