Machine Learning - Medical Insurance Claims Prediction
Refer to the attached ML_Models.pptx in this repository for the full business report.
This project focuses on leveraging machine learning techniques to enhance business decision-making. By analyzing real-world datasets, the goal is to develop predictive models that provide actionable insights for business operations. The primary objective is to improve model performance, business interpretability, and generalizability to future data.
Organizations increasingly rely on data-driven decision-making to optimize operations and improve customer experience. This project explores the following business case:
- Medical Insurance Claims Prediction: Estimating claim costs based on patient demographics and medical history.
- Insurance Company Dataset: Medical cost prediction dataset sourced from Kaggle.
Implemented and compared multiple machine learning models to optimize prediction accuracy and business interpretability. The following steps were undertaken:
- Handled missing values and performed exploratory data analysis (EDA).
- Feature engineering and selection to improve model relevance.
- Scaled numerical features and encoded categorical variables.
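The preprocessing steps above can be sketched with a scikit-learn `ColumnTransformer`. This is a minimal illustration rather than the project's actual code: the column names (`age`, `bmi`, `smoker`, `region`), the toy rows, and the imputation strategies are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows for illustration only; column names and values are assumed,
# not taken from the project's dataset or code.
toy = pd.DataFrame({
    "age": [19, 33, np.nan, 52],
    "bmi": [27.9, 22.7, 30.1, np.nan],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "northwest", "southeast", "northwest"],
})

numeric_cols = ["age", "bmi"]
categorical_cols = ["smoker", "region"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill missing values with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

features = preprocessor.fit_transform(toy)
print(features.shape)  # 2 scaled numeric columns + 5 one-hot columns
```

Keeping imputation, scaling, and encoding inside one transformer means the same steps are fitted on training data only and then reapplied to new data, which helps avoid leakage during cross-validation.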
Experimented with different machine learning models, ensuring a balance between accuracy and business applicability.
Regression Models Used:
- Linear Regression
- Ridge Regression
- Lasso Regression
- Decision Tree Regressor
- Random Forest Regressor
- Gradient Boosting Regressor
- Stacking Regressor
- Neural Network (MLP Regressor)
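One way to compare several of these regressors side by side is cross-validated scoring. The sketch below uses synthetic data from `make_regression`, and the hyperparameters are arbitrary placeholders, so the resulting ranking says nothing about the project's actual results.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); replace with the prepared insurance features.
X, y = make_regression(n_samples=150, n_features=6, noise=15.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
}

# Stacking combines base learners through a final estimator.
models["stacking"] = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
)

# Mean R^2 across 5 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
```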
Model Evaluation Metrics Used:
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- R² Score
- Median Absolute Error

Results & Insights: Random Forest and Gradient Boosting models performed the best, with high R² scores and low error metrics.
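The four evaluation metrics listed above are all available in `sklearn.metrics`. The example below is a small sketch on made-up numbers, not the project's predictions.

```python
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)

# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

mse = mean_squared_error(y_true, y_pred)        # average squared error
mae = mean_absolute_error(y_true, y_pred)       # average absolute error
medae = median_absolute_error(y_true, y_pred)   # median absolute error, robust to outliers
r2 = r2_score(y_true, y_pred)                   # share of variance explained

print(f"MSE={mse:.3f}  MAE={mae:.3f}  MedAE={medae:.3f}  R2={r2:.3f}")
# MSE=0.375  MAE=0.500  MedAE=0.500  R2=0.925
```

MSE penalizes large misses more heavily than MAE, so reporting both helps show whether a model's error is driven by a few expensive claims.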
To improve model performance, applied:
- Hyperparameter tuning using `GridSearchCV`.
- Feature selection using `SelectKBest` and recursive feature elimination (RFE).
- Regularization techniques (L1/L2) for regression models.
- Cross-validation to ensure robustness and generalizability.
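These techniques can be combined in a single search. The sketch below tunes `SelectKBest` and an L2-regularized `Ridge` model together with `GridSearchCV` under 5-fold cross-validation. The synthetic data, the grid values, and the choice of Ridge as the estimator are assumptions for illustration, not the project's actual configuration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption); replace with the prepared insurance features.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=0
)

# Feature selection sits inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),
    ("model", Ridge()),  # L2 regularization; strength is set by alpha
])

# Illustrative grid: number of features to keep and regularization strength.
param_grid = {
    "select__k": [2, 4, 6],
    "model__alpha": [0.1, 1.0, 10.0],
}

search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="neg_mean_squared_error"
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Swapping `Ridge` for `Lasso` would give the L1 variant, and `sklearn.feature_selection.RFE` could replace `SelectKBest` as the selection step.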
- Key observations and findings are documented at every step; refer to the code file directly for details.
- See the slide deck for the full report; download the raw file directly for easy access.
- Python: Data processing and model development.
- Scikit-learn: Machine learning models and hyperparameter tuning.
- TensorFlow/Keras: Neural network implementation.
- Pandas & NumPy: Data manipulation.
- Matplotlib & Seaborn: Data visualization and many more.
More information is available in the code file and the attached PPT.
For inquiries or collaborations, feel free to connect with me on LinkedIn (www.linkedin.com/in/himarohinimallina) or check out more of my work on GitHub (https://github.com/z5450851HimaMallina).
Thank you