
🚗 A dynamic pricing and insurance risk modeling system using Python, XGBoost, SHAP, and DVC. Predicts claim severity and probability, enabling risk-adjusted premium strategies with full reproducibility and CI/CD.


ayanasamuel8/End-to-End-Insurance-Risk-Analytics-and-Predictive-Modeling


🚗 Insurance Risk Modeling & Dynamic Pricing System

This project develops a robust, explainable, and risk-aware pricing model for auto insurance policies. It incorporates statistical analysis, machine learning, and reproducible data practices to predict insurance claim severity and optimize premium pricing.

✅ Completed as part of the Week 3 Challenge at 10Academy.


🧭 Project Goals

  • Understand and explore insurance data to uncover actionable insights.
  • Establish a reproducible data pipeline using Git, GitHub, and DVC.
  • Statistically validate hypotheses related to insurance risk.
  • Build predictive models to estimate:
    • 💰 Claim Severity: how much we might pay.
    • 📈 Claim Probability: how likely a customer is to claim.
  • Construct a dynamic pricing formula that incorporates business margins.
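A minimal sketch of a premium formula with business margins, assuming the common risk-based form premium = (P(claim) × E[severity | claim] + expense loading) × (1 + profit margin). The function name, its parameters, and the way the margin is applied are assumptions for illustration; the project's own formula and loadings may differ.

```python
def risk_based_premium(claim_probability: float,
                       expected_severity: float,
                       expense_loading: float,
                       profit_margin: float) -> float:
    """Hypothetical risk-based premium for a single policy.

    claim_probability: P(claim), e.g. from a classification model.
    expected_severity: expected claim amount given a claim, e.g. from a regression model.
    expense_loading:   fixed per-policy cost (admin, acquisition); assumed additive.
    profit_margin:     fractional margin applied on top of cost, e.g. 0.10 for 10%.
    """
    if not 0.0 <= claim_probability <= 1.0:
        raise ValueError("claim_probability must lie in [0, 1]")
    if expected_severity < 0 or expense_loading < 0 or profit_margin < 0:
        raise ValueError("severity, loading and margin must be non-negative")
    # Pure premium: the expected loss for this policy.
    expected_loss = claim_probability * expected_severity
    # Add costs, then apply the margin to the total.
    return (expected_loss + expense_loading) * (1.0 + profit_margin)


# Example: 5% claim probability, 20,000 expected severity,
# 150 expense loading and a 10% margin.
premium = risk_based_premium(0.05, 20_000, 150, 0.10)
print(round(premium, 2))  # 1265.0
```

Applying the margin multiplicatively keeps it proportional to the policy's cost; a flat per-policy margin is an equally valid design choice.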

🔧 Technologies & Tools

| Area | Tools Used |
|---|---|
| Programming | Python, Jupyter |
| Data Handling | Pandas, NumPy, DVC |
| Visualization | Matplotlib, Seaborn, Plotly |
| Modeling | Scikit-learn, XGBoost, SHAP, LIME |
| Version Control | Git, GitHub, GitHub Actions |
| CI/CD | GitHub Actions |
| Environment | venv + requirements.txt |

📂 Repository Structure

.
├── data/                # Raw and processed data (tracked via DVC)
├── models/              # Saved models
├── notebooks/           # Jupyter notebooks for EDA, testing, modeling
├── src/                 # Core source code
│   ├── preprocessing/   # Cleaning, transformation, encoding
│   ├── task_3/          # Hypothesis testing modules
│   └── task_4/          # Modeling pipeline and interpretation
├── tests/               # Unit tests
├── .dvc/                # DVC metadata
├── .github/workflows/   # GitHub Actions CI pipeline
├── dvc.yaml             # DVC pipeline definition
├── requirements.txt     # Python dependencies
└── README.md            # Project overview (this file)

📊 Task Breakdown

🔍 Task 1: EDA & Git Setup

  • Configured Git and GitHub and created the task-1 branch
  • Performed EDA on claims, premiums, and customer demographics
  • Visualized insights across provinces, genders, and vehicle types
  • Identified key drivers of loss ratio and risk
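Loss ratio is conventionally total claims divided by total premium. A minimal pandas sketch of comparing it across segments such as province, gender, or vehicle type, assuming columns named `TotalClaims` and `TotalPremium` (hypothetical names; adjust to the actual schema). The toy data is illustrative only.

```python
import pandas as pd


def loss_ratio_by(df: pd.DataFrame, group_col: str,
                  claims_col: str = "TotalClaims",
                  premium_col: str = "TotalPremium") -> pd.Series:
    """Loss ratio per segment: sum of claims / sum of premium, highest first.

    Aggregating before dividing weights each segment by its premium volume
    and avoids dividing by the zero premiums some individual rows may have.
    A segment whose total premium is zero still yields inf or NaN.
    """
    totals = df.groupby(group_col)[[claims_col, premium_col]].sum()
    ratio = totals[claims_col] / totals[premium_col]
    return ratio.sort_values(ascending=False)


# Toy policies for illustration; not project data.
policies = pd.DataFrame({
    "Province": ["A", "A", "B", "B"],
    "TotalPremium": [1000.0, 1000.0, 1500.0, 500.0],
    "TotalClaims": [1500.0, 0.0, 200.0, 0.0],
})
print(loss_ratio_by(policies, "Province"))  # A -> 0.75, B -> 0.10
```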

💾 Task 2: Data Version Control (DVC)

  • Installed DVC and initialized version control
  • Added data files to DVC tracking
  • Set up local remote storage and pushed the data
  • Ensured reproducibility and auditability of datasets
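DVC records an MD5 content hash of each tracked file in its `.dvc` metafile, which is what makes a dataset snapshot auditable. The sketch below is a toy illustration of that idea using only the standard library; it is not DVC's implementation and does not mimic its cache layout. The file name is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path: Path, recorded_md5: str) -> bool:
    """True if the file still matches the hash recorded when it was tracked."""
    return file_md5(path) == recorded_md5


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "insurance_data.csv"   # hypothetical file
    data.write_bytes(b"abc")
    recorded = file_md5(data)                 # hash stored at "tracking" time
    unchanged = verify_snapshot(data, recorded)
    data.write_bytes(b"abd")                  # any edit changes the hash
    changed = verify_snapshot(data, recorded)
```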

📊 Task 3: Hypothesis Testing

  • Formulated and tested statistical hypotheses:
    • 📍 Risk varies across provinces and zip codes
    • 👥 Gender differences in claim frequency and severity
    • 💸 Profitability margins vary by region
  • Used t-tests, z-tests, and chi-squared tests where applicable
  • Provided a business interpretation for each result
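A minimal SciPy sketch of two of these tests on toy numbers (not project results): a chi-squared test of independence for claim frequency across two provinces, and Welch's t-test for claim severity between two groups. The counts, the simulated amounts, and the 0.05 significance level are assumptions for illustration.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Map a p-value to a decision about the null hypothesis H0."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Claim frequency: H0 = claim occurrence is independent of province.
# Rows: provinces; columns: [policies with a claim, policies without]. Toy counts.
contingency = np.array([
    [40, 960],   # province A
    [15, 985],   # province B
])
chi2, p_freq, dof, expected = stats.chi2_contingency(contingency)
print(f"frequency: chi2={chi2:.2f}, dof={dof}, p={p_freq:.4f} -> {decide(p_freq)}")

# Claim severity: H0 = mean claim amount is equal for two groups.
# Simulated amounts; Welch's variant does not assume equal variances.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10_000, scale=2_000, size=200)
group_b = rng.normal(loc=12_000, scale=2_000, size=200)
t_stat, p_sev = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"severity: t={t_stat:.2f}, p={p_sev:.2e} -> {decide(p_sev)}")
```

Real claim amounts are usually right-skewed, so a log transform or a rank-based test such as `scipy.stats.mannwhitneyu` may suit actual severity data better.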

🧠 Task 4: Predictive Modeling

  • Built claim-severity regression models: Linear Regression, Random Forest, and XGBoost
  • Evaluated them using RMSE and R²
  • Used SHAP and LIME for feature importance
  • Modeled claim probability (classification) for pricing
  • Final pricing formula:
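The two modeling tasks in this step can be sketched with scikit-learn on synthetic data (not the project dataset): a classifier for claim probability trained on all policies, and severity regressors trained only on policies with a claim, compared by RMSE and R². The features and data-generating process are hypothetical, and the classifier is scored with ROC AUC here rather than the accuracy the project reports. `xgboost.XGBRegressor` follows the scikit-learn estimator interface, so it could be passed to the same `evaluate` helper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in data: 3 numeric features per policy (hypothetical).
n = 2_000
X = rng.normal(size=(n, 3))
# Claim occurrence depends on feature 2 plus noise (~24% of policies claim).
has_claim = (X[:, 2] + rng.normal(size=n)) > 1.0
# Right-skewed claim amounts driven by feature 0 (used only where a claim occurred).
severity = np.exp(8.0 + 0.5 * X[:, 0] + 0.2 * rng.normal(size=n))

# Part 1: claim probability, trained on all policies.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, has_claim, test_size=0.25, random_state=0, stratify=has_claim)
clf = LogisticRegression().fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Part 2: claim severity, trained only on policies that actually claimed.
Xs_tr, Xs_te, ys_tr, ys_te = train_test_split(
    X[has_claim], severity[has_claim], test_size=0.25, random_state=0)


def evaluate(model) -> tuple[float, float]:
    """Fit on the severity training split; return (RMSE, R²) on the test split."""
    pred = model.fit(Xs_tr, ys_tr).predict(Xs_te)
    rmse = float(np.sqrt(mean_squared_error(ys_te, pred)))
    return rmse, float(r2_score(ys_te, pred))


results = {
    "linear": evaluate(LinearRegression()),
    "random_forest": evaluate(RandomForestRegressor(n_estimators=200, random_state=0)),
}
print(f"claim-probability AUC: {auc:.3f}")
for name, (rmse, r2) in results.items():
    print(f"{name:>13}: RMSE={rmse:,.0f}  R²={r2:.3f}")
```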

📈 Key Insights & Recommendations

| Insight | Impact on Pricing Strategy |
|---|---|
| ≤4-cylinder vehicles → ↑ severity risk | Apply a loading to small-engine vehicles |
| Non-VAT-registered → ↑ risk | Raise the base rate for non-VAT-registered customers |
| Converted/modified vehicles → ↑ risk | Apply a higher risk surcharge |
| Alarm/immobilizer → ↓ risk | Offer a discount for security features |
| New vehicles → ↓ risk | Offer a discount for newer vehicles |
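One common way to turn insights like these into prices is to apply multiplicative rating factors (relativities) to a base premium. The sketch below assumes that scheme. The flag names and factor values are hypothetical placeholders that only follow the directions in the insights table; they are not fitted results from this project.

```python
from math import prod

# Hypothetical relativities: >1.0 is a loading, <1.0 a discount.
RELATIVITIES = {
    "small_engine": 1.10,              # <=4 cylinders: higher severity risk
    "not_vat_registered": 1.05,        # higher risk: raise base rate
    "converted_or_modified": 1.15,     # risk surcharge
    "alarm_or_immobilizer": 0.95,      # security-feature discount
    "new_vehicle": 0.93,               # newer-vehicle discount
}


def adjusted_premium(base_premium: float, flags: dict[str, bool]) -> float:
    """Multiply the base premium by the relativity of every flag set to True.

    Unknown flag names raise KeyError so typos are not silently ignored.
    """
    unknown = set(flags) - set(RELATIVITIES)
    if unknown:
        raise KeyError(f"unknown rating flags: {sorted(unknown)}")
    factor = prod(RELATIVITIES[name] for name, on in flags.items() if on)
    return base_premium * factor


quote = adjusted_premium(1_000.0, {"small_engine": True,
                                   "alarm_or_immobilizer": True,
                                   "new_vehicle": False})
print(round(quote, 2))  # 1045.0
```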

📦 Setup Instructions

  1. Clone the repository:
    git clone https://github.com/ayanasamuel8/End-to-End-Insurance-Risk-Analytics-and-Predictive-Modeling.git
    cd End-to-End-Insurance-Risk-Analytics-and-Predictive-Modeling
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Run the notebooks:
    jupyter notebook
  4. Run the tests:
    pytest

🧪 Data Versioning with DVC

    dvc init
    dvc add data/raw/insurance_data.csv
    dvc remote add -d localstorage /path/to/your/storage
    dvc push

To retrieve the versioned data and reproduce the pipeline defined in dvc.yaml:

    dvc pull
    dvc repro

✅ CI/CD

GitHub Actions is configured for:

  • Code linting
  • Unit tests
  • Model validation (optional step)

The workflow is defined in .github/workflows/deploy.yml.

📌 Results Summary

🧮 Best Severity Model: XGBoost
  • RMSE improvement: +Δ% vs. baseline
  • Top features: Engine Size, Vehicle Age, Province, Conversion Status

🧠 Classification Accuracy: ~X%

Together, the severity and probability models enable dynamic, fair, and risk-adjusted premium pricing.

👥 Contributors

👤 Ayana Samuel
  • Role: Full Data Science Workflow
  • Skills: EDA, DVC, Statistical Testing, Machine Learning, GitOps
  • GitHub: https://github.com/ayanasamuel8/End-to-End-Insurance-Risk-Analytics-and-Predictive-Modeling.git

📜 License

This project is licensed for academic and demonstration use. Contact the author for commercial usage rights.
