This project focuses on an end-to-end insurance risk analytics and predictive modeling solution for AlphaCare Insurance Solutions (ACIS) in South Africa. As a marketing analytics engineer, the primary objective is to transform traditional pricing models into a dynamic, risk-based system by leveraging data-driven insights.
The project encompasses three key phases:
- Exploratory Data Analysis (EDA): A thorough investigation into ACIS's historical car insurance claim data to understand data quality, distributions, temporal trends, and segment-wise profitability. This phase lays the foundation for all subsequent analytical steps.
- A/B Hypothesis Testing: Statistical validation of key business assumptions about risk drivers and profit margins across different segments (e.g., provinces, zip codes, demographics). This phase provides data-backed evidence for refining segmentation strategies.
- Predictive Modeling for Risk and Premium Optimization: Development and evaluation of machine learning models to:
- Predict Claim Severity: Forecast the financial cost of a claim if it occurs.
- Predict Claim Probability: Determine the likelihood of a policy resulting in a claim.
- Direct Premium Prediction: Optimize the calculation of appropriate premiums.
By integrating advanced analytics with business objectives, this project aims to identify low-risk customer segments, optimize marketing strategies, reduce premiums where appropriate, attract new clients, and ultimately enhance ACIS's profitability and competitive advantage.
- Python 3.8+ (recommended)
- Git
-
Clone the repository:
git clone https://github.com/michaWorku/Insurance-Risk-Analytics-and-Predictive-Modeling.git cd Insurance-Risk-Analytics-and-Predictive-ModelingIf you created the project in the current directory:
# Already in the project root -
Create and activate a virtual environment:
python3 -m venv venv source venv/bin/activate # On Windows: .\venv\Scripts\activate -
Install dependencies:
pip install -r requirements.txt
This project is primarily driven through Jupyter notebooks located in the notebooks/ directory, which execute the modular Python scripts in src/. Follow the notebooks in sequential order for a comprehensive understanding and reproduction of the analysis:
-
Run EDA Notebook:
- Navigate to the
notebooks/directory. - Open
EDA.ipynbto perform exploratory data analysis, data quality checks, and initial summarization. - Execute all cells in the notebook. This will also preprocess and save the data to
data/processed/.
# From project root: # jupyter notebook notebooks/EDA.ipynb - Navigate to the
-
Run Hypothesis Testing Notebook:
- Ensure
EDA.ipynbhas been run first to generate the processed data. - Open
Hypothesis_Testing.ipynbto conduct A/B hypothesis tests on key risk drivers and profit margins. - Execute all cells.
# From project root: # jupyter notebook notebooks/Hypothesis_Testing.ipynb - Ensure
-
Run Predictive Modeling Notebook:
- Ensure
EDA.ipynbhas been run first. - Open
Predictive_Modeling.ipynbto build, train, evaluate, and interpret machine learning models for claim severity, claim probability, and premium prediction. - Execute all cells.
# From project root: # jupyter notebook notebooks/Predictive_Modeling.ipynb - Ensure
For deeper insights, explore the src/ directory which contains the modularized Python code for data loading, preprocessing, analysis strategies, and model implementations.
.
├── .vscode/ # VSCode specific settings
├── .github/ # GitHub specific configurations (e.g., Workflows)
│ └── workflows/
│ └── unittests.yml # CI/CD workflow for tests and linting
├── .gitignore # Specifies intentionally untracked files to ignore
├── requirements.txt # Python dependencies
├── pyproject.toml # Modern Python packaging configuration (PEP 517/621)
├── README.md # Project overview, installation, usage
├── Makefile # Common development tasks (setup, test, lint, clean)
├── .env # Environment variables (e.g., API keys - kept out of Git)
├── src/ # Core source code for the project
│ ├── __init__.py
│ ├── core/ # Core logic/components
│ ├── models/ # Data models, ORM definitions, ML models
│ ├── utils/ # Utility functions, helper classes
│ └── services/ # Business logic, service layer
├── tests/ # Test suite (unit, integration)
│ ├── unit/ # Unit tests for individual components
│ └── integration/ # Integration tests for combined components
├── notebooks/ # Jupyter notebooks for experimentation, EDA, prototyping
├── scripts/ # Standalone utility scripts (e.g., data processing, deployment)
├── docs/ # Project documentation (e.g., Sphinx docs)
├── data/ # Data storage (raw, processed)
│ ├── raw/ # Original, immutable raw data
│ └── processed/ # Cleaned, transformed, or feature-engineered data
├── config/ # Configuration files
└── examples/ # Example usage of the project components
Guidelines for contributing to the project.
This project is licensed under the MIT License. (Create a LICENSE file if you want to use MIT)