This case study investigates payment defaults and seeks to answer the following business question:
Payment defaults are detrimental to the business and are a significant cost factor. Are there any key trends in the data which can help avoid default-prone customers in the future?
Repository contents:
- Clients.csv, Payments.csv, and other CSV exports: raw data used for analysis.
- Jupyter notebooks used in the final analysis:
  - ExploratoryAnalysis.ipynb: data exploration, summary statistics, visualizations, and initial insights.
  - LogBinomial.ipynb: log-binomial modelling alternatives and related diagnostics.
  - Notes.ipynb: analysis notes and observations.
- Aaron Galligan Case Study PowerPoint.pptx: a slide deck of findings aimed at stakeholders for the business in question.
- CaseStudy/: copies of the main notebooks for archival purposes.
Identify trends and predictors of payment default using the provided client and payments data. The goal is to surface actionable signals that help reduce future defaults by flagging higher-risk customers or informing changes to underwriting, pricing, or collection processes.
Key data files used in the analysis:
- Clients.csv: client-level attributes (demographics, entity type, etc.).
- Payments.csv: transaction and payment histories, including default flags or indicators.
Note: exported CSVs under Exported CSV's/ include useful aggregates such as percentage of defaults by entity type.
- Data cleaning and merging: handled missing values, normalized columns, and joined client and payment records to build an analysis dataset.
- Exploratory data analysis (EDA): computed default rates across categories (e.g., entity type), visualized distributions, and examined temporal trends.
- Modeling: trained logistic and log-binomial models to estimate relationships between predictors and default probability. Evaluated model performance with AUC/ROC, confusion matrices, and calibration checks.
- Diagnostics and interpretation: inspected coefficients, marginal effects, and partial dependence plots to identify strong predictors.
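The cleaning-and-merging step above can be sketched with pandas roughly as follows. The column names (`client_id`, `entity_type`, `amount`, `defaulted`) are illustrative assumptions, not the actual schemas of Clients.csv and Payments.csv:

```python
import pandas as pd

# Hypothetical stand-ins for Clients.csv and Payments.csv; real columns may differ.
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "entity_type": ["LLC", None, "Sole Prop"],
})
payments = pd.DataFrame({
    "client_id": [1, 1, 2, 3],
    "amount": [100.0, None, 250.0, 75.0],
    "defaulted": [0, 0, 1, 0],
})

# Handle missing values and normalize columns before joining.
clients["entity_type"] = clients["entity_type"].fillna("Unknown")
payments["amount"] = payments["amount"].fillna(payments["amount"].median())

# Aggregate payments to the client level, then merge onto client attributes
# to build one analysis row per client.
payment_agg = payments.groupby("client_id").agg(
    total_paid=("amount", "sum"),
    ever_defaulted=("defaulted", "max"),
).reset_index()
analysis = clients.merge(payment_agg, on="client_id", how="left")
```

A left join keeps every client row even if a client has no payment records, which makes missing-history clients visible rather than silently dropping them.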
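The modeling-and-evaluation step might look like this minimal sketch. The data is synthetic and the two features (`late_payments`, `account_age`) are assumptions for illustration, not results from the notebooks:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered analysis dataset.
rng = np.random.default_rng(0)
n = 1000
late_payments = rng.poisson(2, n)
account_age = rng.uniform(0, 10, n)
# Default probability rises with late payments, falls with account age
# (illustrative coefficients only).
logits = 0.6 * late_payments - 0.3 * account_age - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))
X = np.column_stack([late_payments, account_age])

# Hold out a test split, fit a logistic baseline, and evaluate with
# AUC/ROC and a confusion matrix, as described above.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
cm = confusion_matrix(y_te, model.predict(X_te))
print(f"AUC: {auc:.3f}")
print(cm)
```

Inspecting `model.coef_` afterwards gives the coefficient-level interpretation mentioned in the diagnostics step.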
The notebooks contain the full results and figures; high-level takeaways include:
- Certain entity types and client segments show higher default rates (see Exported CSV's/percentage of defaults by entity type.csv).
- Behavioral/payment-history signals (late payments, missed payments, or erratic payment patterns) are strong predictors of future default.
- Client-lifecycle features (newer accounts vs. established ones), aggregated balances, and prior delinquencies are also associated with elevated default risk.
- Logistic and log-binomial models provide similar directional results; model performance will vary with feature engineering and sampling choices.
- Use a risk-scoring model (e.g., logistic regression) in pre-screening to flag high-risk applicants. Periodically retrain with fresh data.
- Enrich models with payment-behavior features: time since last payment, frequency of late payments, and changes in payment amounts.
- Consider differentiated terms or higher deposits for higher-risk segments, and targeted collections strategies.
- Monitor metrics (default rate, model AUC, population stability) and run A/B tests before rolling out changes.
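The payment-behavior features recommended above could be derived from the payment histories along these lines. The payments table and its columns (`payment_date`, `days_late`, `amount`) are hypothetical examples of the kind of fields involved:

```python
import pandas as pd

# Hypothetical payments table; real column names may differ.
payments = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "payment_date": pd.to_datetime(
        ["2024-01-05", "2024-02-07", "2024-03-20", "2024-01-02", "2024-02-01"]),
    "days_late": [0, 5, 12, 0, 0],
    "amount": [100, 100, 80, 250, 250],
})

as_of = pd.Timestamp("2024-04-01")

# One row of behavioral features per client: recency, late-payment
# frequency, and variability in payment amounts.
features = payments.groupby("client_id").agg(
    days_since_last_payment=("payment_date", lambda d: (as_of - d.max()).days),
    late_payment_rate=("days_late", lambda d: (d > 0).mean()),
    amount_volatility=("amount", "std"),
).reset_index()
```

Computing everything relative to a fixed `as_of` date keeps the features reproducible and avoids leaking information from after the scoring date.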
Prerequisites:
- Python 3.8+ (recommended)
- Jupyter or JupyterLab
- Common data science libraries: pandas, numpy, scikit-learn, statsmodels, matplotlib, seaborn
- Feature engineering: create more temporal- and sequence-based features from payment histories.
- Advanced models: try gradient boosting or ensemble models and compare against logistic baselines.
- Deployment: wrap the scoring model in a simple service or batch job for periodic scoring.
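A periodic batch-scoring job like the one suggested above could be sketched as follows, assuming a fitted scikit-learn model and a feature frame per scoring run. The feature names (`late_rate`, `tenure`) and the threshold are illustrative assumptions:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def score_batch(model, feature_frame, threshold=0.5):
    """Score a batch of clients and flag those above the risk threshold.

    The threshold is a business choice; 0.5 here is just a placeholder.
    """
    probs = model.predict_proba(feature_frame)[:, 1]
    return pd.DataFrame({
        "default_prob": probs,
        "high_risk": probs >= threshold,
    }, index=feature_frame.index)

# Toy fit purely to make the sketch runnable; in practice the model
# would be loaded from a retraining pipeline.
train = pd.DataFrame({"late_rate": [0.0, 0.1, 0.8, 0.9], "tenure": [5, 4, 1, 0]})
model = LogisticRegression().fit(train, [0, 0, 1, 1])

new_clients = pd.DataFrame({"late_rate": [0.05, 0.85], "tenure": [6, 0]})
scores = score_batch(model, new_clients)
```

Running this on a schedule (cron, Airflow, etc.) and logging the score distribution each run gives the population-stability monitoring mentioned earlier for free.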