
z5450851HimaMallina/Insurance_PredictionML


Insurance_PredictionML

Machine Learning - Medical Insurance Claims Prediction

Refer to the attached ML_Models.pptx in this repository for the full business report.

Overview

This project focuses on leveraging machine learning techniques to enhance business decision-making. By analyzing real-world datasets, the goal is to develop predictive models that provide actionable insights for business operations. The primary objective is to improve model performance, business interpretability, and generalizability to future data.

Business Problem

Organizations increasingly rely on data-driven decision-making to optimize operations and improve customer experience. This project explores the following business case:

  • Medical Insurance Claims Prediction: Estimating claim costs based on patient demographics and medical history.

Data Sources

  1. Insurance Company Dataset: Medical cost prediction dataset sourced from Kaggle.

Approach

Implemented and compared multiple machine learning models to optimize prediction accuracy and business interpretability. The following steps were undertaken:

1. Data Preprocessing

  • Handled missing values and performed exploratory data analysis (EDA).
  • Feature engineering and selection to improve model relevance.
  • Scaled numerical features and encoded categorical variables.
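The preprocessing steps above can be sketched with scikit-learn's `ColumnTransformer`. This is a minimal illustration, not the repository's code. The column names (`age`, `sex`, `bmi`, `children`, `smoker`, `region`) assume the commonly used Kaggle medical cost layout, the imputation strategies are assumptions, and the four-row frame is made-up toy data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column layout (hypothetical; check against the actual dataset)
NUMERIC = ["age", "bmi", "children"]
CATEGORICAL = ["sex", "smoker", "region"]


def build_preprocessor():
    """Impute + scale numeric columns, impute + one-hot encode categoricals."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])


# Toy frame (not real data) with one missing bmi value
df = pd.DataFrame({
    "age": [19, 33, 45, 60],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, np.nan, 30.1, 25.8],
    "children": [0, 1, 2, 0],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

print(df.isna().sum())  # quick EDA check for missing values per column
X = build_preprocessor().fit_transform(df)
print(X.shape)  # 3 scaled numeric + 2 sex + 2 smoker + 4 region dummies
```

Wrapping the steps in a single transformer means the same imputation and scaling learned on training data can be reapplied unchanged to new claims.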

2. Model Development

Experimented with different machine learning models, ensuring a balance between accuracy and business applicability.

Regression Models Used:

  • Linear Regression
  • Ridge Regression
  • Lasso Regression
  • Decision Tree Regressor
  • Random Forest Regressor
  • Gradient Boosting Regressor
  • Stacking Regressor
  • Neural Network (MLP Regressor)
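A minimal sketch of how the listed regressors might be set up side by side. All hyperparameter values are illustrative assumptions. The neural network is shown here with scikit-learn's `MLPRegressor` (the project lists TensorFlow/Keras for its own implementation). The stacking base learners and the `Ridge` meta-learner are hypothetical choices, and `make_regression` stands in for the real insurance data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


def build_models(random_state=42):
    """Return a name -> estimator dict covering the model families above."""
    models = {
        "Linear Regression": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),   # L2 penalty
        "Lasso": Lasso(alpha=0.1),   # L1 penalty
        "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=random_state),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "Gradient Boosting": GradientBoostingRegressor(random_state=random_state),
        "MLP": MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                            random_state=random_state),
    }
    # StackingRegressor clones its base estimators, so reusing these is safe
    models["Stacking"] = StackingRegressor(
        estimators=[("rf", models["Random Forest"]), ("gb", models["Gradient Boosting"])],
        final_estimator=Ridge(),
    )
    return models


# Synthetic stand-in data; features are already roughly standardized
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
models = build_models()
for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: train R² = {model.score(X, y):.3f}")
```

Training-set R² is shown only to confirm each model fits; comparisons between models should use held-out data.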

Model Evaluation Metrics Used:

  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)
  • R² Score
  • Median Absolute Error

Results & Insights: Random Forest and Gradient Boosting models performed best, with high R² scores and low error metrics.
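The four metrics map directly onto functions in `sklearn.metrics`. The helper name `evaluate` and the toy targets below are illustrative only, chosen so the results can be checked by hand.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)


def evaluate(y_true, y_pred):
    """Compute the four regression metrics used in this project."""
    return {
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
        "MedAE": median_absolute_error(y_true, y_pred),
    }


# Toy values: errors are +10, -10, +30, 0
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 330.0, 400.0])

metrics = evaluate(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# prints MSE: 275.000, MAE: 12.500, R2: 0.978, MedAE: 10.000 (one per line)
```

MSE penalizes the single large error (30) heavily, while the median absolute error ignores it, which is why reporting both helps when claim costs include a few very expensive cases.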

3. Model Optimization

To improve model performance, applied:

  • Hyperparameter tuning using GridSearchCV.

  • Feature selection using SelectKBest and recursive feature elimination (RFE).

  • Regularization techniques (L1/L2) for regression models.

  • Cross-validation to ensure robustness and generalizability.
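The optimization steps above could be combined roughly as follows. This is a sketch under stated assumptions rather than the project's code: the grid values, the `k` choices, the `Lasso` estimator inside `RFE`, and the synthetic `make_regression` data are all illustrative. Placing `SelectKBest` inside the `Pipeline` means feature selection is refit within each cross-validation fold, which avoids leaking information from the validation folds.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: 10 features, only 4 of them informative
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=15.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# 1) SelectKBest + GridSearchCV: tune k and forest settings jointly
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),
    ("model", RandomForestRegressor(random_state=0)),
])
param_grid = {
    "select__k": [4, 6, 10],
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 6],
}
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="neg_mean_absolute_error")
search.fit(X, y)
print("Best params:", search.best_params_)
print(f"Best CV MAE: {-search.best_score_:.2f}")

# 2) RFE around an L1-regularized (Lasso) model, scored with cross-validation
rfe = RFE(estimator=Lasso(alpha=1.0), n_features_to_select=4)
scores = cross_val_score(rfe, X, y, cv=cv, scoring="r2")
print(f"RFE + Lasso CV R²: {scores.mean():.3f} ± {scores.std():.3f}")
```

GridSearchCV maximizes its score, so error metrics are passed in negated form (`neg_mean_absolute_error`) and flipped back for reporting.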


4. Business Insights, Impact & Visuals

  • Key observations and findings are provided at every step; refer to the code file directly for further detail.
  • Look into the slide deck for the visuals, and download the raw file directly for easy access.

5. Technologies Used

  • Python: Data processing and model development.

  • Scikit-learn: Machine learning models and hyperparameter tuning.

  • TensorFlow/Keras: Neural network implementation.

  • Pandas & NumPy: Data manipulation.

  • Matplotlib & Seaborn: Data visualization, among other libraries.


More information is available in the code file and the attached PPT.

Contact

For inquiries or collaborations, feel free to connect with me on LinkedIn (www.linkedin.com/in/himarohinimallina) or check out more of my work on GitHub (https://github.com/z5450851HimaMallina).

Thank you

About

ML-driven analysis of medical insurance claims to predict costs, optimize operations, and enhance decisions
