A step-by-step notebook that engineers a proxy “bad-client” label, performs feature-rich EDA, and builds a leakage-free XGBoost credit scorecard with cross-validation and threshold tuning on roughly 438,000 loan-applicant records.
- Raw CSV → missing-value audit
- Days-to-years conversion + weighted heuristic risk score (see the sketch after this list)
- EDA (KDEs, lollipops, correlation, Chi-Square)
- One-hot encoding → 50 numeric features
- Baseline Logistic Regression (AUC 0.98)
- Leakage demo + fix → XGBoost (AUC 0.9997)
- Precision–Recall tuning for business cut-offs
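To make the feature-engineering and labelling bullets concrete, here is a minimal sketch of the day-count conversion, the weighted heuristic risk score, and the proxy label. The column names (`DAYS_BIRTH`, `DAYS_EMPLOYED`, `AMT_INCOME_TOTAL`, `CNT_CHILDREN`, `OCCUPATION_TYPE`), the file name, and the five rule weights are illustrative assumptions rather than the notebook's exact definitions; only the 0.20 cut-off is taken from the pipeline table below.

```python
# Sketch of the Feature Eng. and Proxy Label stages (column names, file name
# and rule weights are assumed for illustration; only the 0.20 cut-off is
# taken from the notebook).
import pandas as pd

df = pd.read_csv("application_record.csv")   # raw CSV, ~438k applicant rows

# Day counts are stored as negative offsets from the application date;
# convert them to positive years.
df["AGE_YEARS"] = -df["DAYS_BIRTH"] / 365.25
df["YEARS_EMPLOYED"] = -df["DAYS_EMPLOYED"].clip(upper=0) / 365.25

# Weighted heuristic risk score: each rule adds its weight when it fires.
rules = [
    (df["AMT_INCOME_TOTAL"] < df["AMT_INCOME_TOTAL"].quantile(0.25), 0.30),  # low income
    (df["YEARS_EMPLOYED"] < 1,                                       0.25),  # short tenure
    (df["CNT_CHILDREN"] >= 3,                                        0.20),  # many dependents
    (df["AGE_YEARS"] < 25,                                           0.15),  # young applicant
    (df["OCCUPATION_TYPE"].isna(),                                   0.10),  # occupation missing
]
df["risk_score"] = sum(cond.astype(float) * weight for cond, weight in rules)

# Proxy label: flag applicants whose heuristic score exceeds 0.20.
df["BAD_CLIENT"] = (df["risk_score"] > 0.20).astype(int)
print(df["BAD_CLIENT"].value_counts(normalize=True))
```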
Goal: classify applicants as high-risk or low-risk when no historical default flag exists, using only profile data (income, dependents, occupation, etc.).
Stage | Detail |
---|---|
Feature Eng. | Convert day counts to years; build 5-rule risk score |
Proxy Label | BAD_CLIENT = 1 if risk_score > 0.20 |
EDA | KDE + lollipop plots to visualise class separation |
Stats | Pearson heat-map, Chi-Square on categoricals |
Pre-proc | Drop ID, impute, binary map, one-hot (50 features); see the sketch after this table |
Baseline | Class-weighted Logistic Regression |
Tree Model | XGBoost + scale_pos_weight, 5-fold CV |
Threshold | PR curve to pick 0.30 vs 0.70 cut-offs |
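A rough sketch of the Pre-proc and Baseline stages, continuing from the DataFrame built in the sketch above. The imputation and mapping choices are assumptions; dropping `risk_score` (and the raw day columns) before training reflects the leakage fix mentioned in the summary.

```python
# Sketch of the Pre-proc and Baseline stages (continues from `df` above;
# imputation and mapping choices are assumptions).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Drop the identifier and everything used to construct the label;
# keeping risk_score would leak the answer straight into the features.
X = df.drop(columns=["ID", "BAD_CLIENT", "risk_score", "DAYS_BIRTH", "DAYS_EMPLOYED"])
y = df["BAD_CLIENT"]

# Impute: median for numeric columns, a sentinel category for the rest.
num_cols = X.select_dtypes(include="number").columns
X[num_cols] = X[num_cols].fillna(X[num_cols].median())
X = X.fillna("Unknown")

# Binary map Y/N flags, then one-hot encode the remaining categoricals (~50 features).
X = X.replace({"Y": 1, "N": 0}).infer_objects()
X = pd.get_dummies(X, drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Class-weighted logistic regression as the baseline scorecard.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
baseline.fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1]))
print(classification_report(y_test, baseline.predict(X_test)))
```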
Metric | Logistic Reg. | XGBoost (Refined) |
---|---|---|
ROC AUC | 0.9772 | 0.9997 |
Precision (Bad) | 0.1085 | 0.9738 |
Recall (Bad) | 0.9456 | 1.0000 |
F1 (Bad) | 0.1947 | 0.9867 |
Why almost perfect? The proxy label is a deterministic function of the input features, so a tree model can rediscover the five labelling rules almost exactly. Swap in real repayment outcomes and retrain to obtain realistic scores.
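The refined tree model can be sketched as follows, continuing from the train/test split above. The hyperparameters are placeholders rather than the notebook's tuned values; `scale_pos_weight` and stratified 5-fold cross-validation follow the pipeline table.

```python
# Sketch of the Tree Model stage: XGBoost with scale_pos_weight to offset
# class imbalance, scored with stratified 5-fold CV (hyperparameters are
# illustrative placeholders).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from xgboost import XGBClassifier

# Ratio of good to bad clients up-weights the minority "bad" class.
spw = (y_train == 0).sum() / (y_train == 1).sum()

xgb = XGBClassifier(
    n_estimators=300,
    max_depth=6,
    learning_rate=0.1,
    scale_pos_weight=spw,
    eval_metric="auc",
    n_jobs=-1,
    random_state=42,
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
auc = cross_val_score(xgb, X_train, y_train, cv=cv, scoring="roc_auc")
print("5-fold ROC AUC:", np.round(auc, 4), "mean:", round(auc.mean(), 4))

# Refit on the full training split for threshold tuning below.
xgb.fit(X_train, y_train)
```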
Threshold | Precision | Recall | When to Use |
---|---|---|---|
0.30 | 0.94 | 1.00 | Risk-averse – catch every defaulter, accept more manual reviews |
0.70 | 0.99 | 0.997 | Cost-focused – minimise false positives, tolerate a ~0.3% miss rate |
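A minimal sketch of the threshold-tuning step, continuing from the fitted model above: trace the precision-recall curve on held-out data, then score applicants against a business-chosen cut-off instead of the default 0.5. The 0.30 and 0.70 values come from the table; everything else is illustrative.

```python
# Sketch of the Threshold stage: inspect the precision-recall trade-off and
# evaluate the two candidate business cut-offs.
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

proba = xgb.predict_proba(X_test)[:, 1]

# Full curve (can be plotted to visualise the trade-off).
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# Metrics at the two candidate business cut-offs.
for cutoff in (0.30, 0.70):
    preds = (proba >= cutoff).astype(int)
    print(f"cut-off {cutoff:.2f}: "
          f"precision={precision_score(y_test, preds):.3f}, "
          f"recall={recall_score(y_test, preds):.3f}")
```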
PS: The dataset is available here: https://drive.google.com/file/d/1WeIIugqArR0fdVt7Wpcyj_3KmD4h8Jzb/view?usp=sharing
Shawn Waringu
Data Scientist & Analyst