Interpretable Machine Learning for Fraud Detection in Large-Scale Financial Transactions

Abstract

This project presents a research-oriented and interpretable machine learning framework for detecting fraudulent financial transactions in large-scale banking data. Using a dataset of approximately 18 million transactions, the system emphasizes behavioral feature engineering, probabilistic risk estimation, and post-hoc analytical evaluation. An interactive experimental dashboard is provided to support controlled transaction testing, qualitative error analysis, and interpretability-driven inspection. This project is designed as an academic research artifact, suitable for MSc applications and future peer-reviewed publication.

1. Research Motivation

Fraud detection in financial systems presents multiple real-world and research challenges:

Extreme class imbalance: Fraud cases represent less than 0.2% of the data.
High cost of false negatives: Every missed fraud case translates into direct financial loss.
Regulatory demand for interpretability: Modern systems require explainable decision-making.

This work focuses on:

Modeling transactional behavior rather than customer identity.
Combining statistical learning with domain-driven financial indicators.
Producing transparent risk outputs instead of opaque binary decisions.

2. Dataset Description

Source: Financial Transactions Dataset (AIML – Synthetic & Anonymized).
Scale: ~18,000,000 transactions.
Fraud Ratio: ~0.17%.
Transaction Types: TRANSFER, CASH_OUT, PAYMENT, DEBIT, CASH_IN.
Ethical Note: Contains no Personally Identifiable Information (PII).

3. System Overview & Interactive Evaluation

3.1 Transaction Input Interface

The system provides an interactive interface that allows manual transaction simulation, enabling controlled experimentation with transaction attributes.

3.2 Prediction Output & Risk Interpretation

For each evaluated transaction, the system outputs a Fraud Probability, Aggregated Risk Score, and Behavioral Alerts.
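As an illustration of how the three outputs could fit together, here is a minimal sketch. The `RiskReport` class, the `build_report` helper, the 0-100 score scale, and the flag weighting are all hypothetical choices made for this example; they are not taken from `fraud_app.py`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RiskReport:
    fraud_probability: float                 # model output in [0, 1]
    risk_score: float                        # assumed 0-100 aggregated scale
    alerts: List[str] = field(default_factory=list)


def build_report(probability: float, flags: Dict[str, bool],
                 flag_weight: float = 10.0) -> RiskReport:
    """Combine a model probability with rule-based flags (hypothetical scheme)."""
    # Keep only the names of the flags that were actually raised.
    alerts = [name for name, raised in flags.items() if raised]
    # Scale the probability to 0-100, add a fixed bump per alert, cap at 100.
    score = min(100.0, 100.0 * probability + flag_weight * len(alerts))
    return RiskReport(probability, score, alerts)


report = build_report(0.42, {"drains_account": True, "large_amount": False})
print(report)
```

Keeping the probability, the score, and the alerts as separate fields lets a reviewer see which part of the output came from the model and which from domain rules.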

4. Exploratory Data & Model Behavior Analysis

4.1 Fraud Probability Distribution

The distribution highlights how the model allocates probability mass under extreme class imbalance.
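One way to inspect this distribution numerically is to bin the predicted probabilities separately for each class. The sketch below uses synthetic stand-in scores drawn from beta distributions, not the project's data or results; `probability_histogram` is a hypothetical helper name.

```python
import numpy as np


def probability_histogram(proba, y_true, n_bins=10):
    """Share of each class falling into equal-width probability bins."""
    proba = np.asarray(proba, dtype=float)
    y_true = np.asarray(y_true)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    shares = {}
    for label in (0, 1):
        counts, _ = np.histogram(proba[y_true == label], bins=edges)
        # Normalize per class so the rare class is not swamped by the majority.
        shares[label] = counts / max(counts.sum(), 1)
    return edges, shares


# Synthetic stand-in data with roughly the fraud ratio quoted above (~0.17%).
rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.0017).astype(int)
proba = np.where(y_true == 1,
                 rng.beta(5, 2, y_true.size),    # assumed shape for fraud scores
                 rng.beta(1, 30, y_true.size))   # assumed shape for legitimate scores

edges, shares = probability_histogram(proba, y_true)
for label, share in shares.items():
    print(label, np.round(share, 3))
```

Normalizing within each class matters here: with raw counts, the fraud histogram would be invisible next to the legitimate one.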

4.2 Transaction Amount vs Fraud Label

A log-scaled comparison demonstrates separation trends between legitimate and fraudulent transactions.
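A comparison of this kind could be computed roughly as follows. The column names `amount` and `is_fraud` and the six toy rows are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration; not drawn from the project's dataset.
df = pd.DataFrame({
    "amount": [120.0, 9_500.0, 45.5, 1_200_000.0, 310.0, 850_000.0],
    "is_fraud": [0, 0, 0, 1, 0, 1],
})

# log1p compresses the heavy right tail and is defined for zero amounts.
df["log_amount"] = np.log1p(df["amount"])

# Per-class summary of the log-scaled amounts.
summary = df.groupby("is_fraud")["log_amount"].describe()[["count", "min", "50%", "max"]]
print(summary)
```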

4.3 Transaction Type Analysis

Fraud incidence is compared across transaction types; it is notably higher for TRANSFER and CASH_OUT operations.
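A per-type breakdown like this can be sketched with a pandas group-by. The column names `type` and `is_fraud` and the toy rows are assumptions, not the dataset's actual schema or values.

```python
import pandas as pd

# Toy rows for illustration; the labels here are made up.
df = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER",
             "DEBIT", "CASH_IN", "CASH_OUT", "PAYMENT"],
    "is_fraud": [1, 0, 0, 0, 0, 0, 1, 0],
})

# Count, number of frauds, and fraud rate per transaction type.
rates = (
    df.groupby("type")["is_fraud"]
      .agg(transactions="count", frauds="sum", fraud_rate="mean")
      .sort_values("fraud_rate", ascending=False)
)
print(rates)
```

Reporting the raw counts next to the rate helps avoid over-reading a high rate that rests on only a handful of transactions.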

4.4 Feature Correlation Analysis

A correlation analysis highlights relationships between raw transaction attributes and engineered behavioral features.
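A minimal sketch of such a correlation check is shown below. The columns `balance_before` and `amount_to_balance` are hypothetical names, and the toy values were chosen only to make the example readable.

```python
import pandas as pd

# Toy rows for illustration; not drawn from the project's dataset.
df = pd.DataFrame({
    "amount": [100.0, 5_000.0, 250.0, 90_000.0, 40.0, 120_000.0],
    "balance_before": [1_000.0, 5_000.0, 3_000.0, 90_000.0, 800.0, 120_000.0],
    "is_fraud": [0, 1, 0, 1, 0, 1],
})

# Engineered feature: share of the available balance moved by the transaction.
df["amount_to_balance"] = df["amount"] / df["balance_before"]

# Pearson correlation matrix; with a binary label this equals the
# point-biserial correlation between each feature and the label.
corr = df.corr()
print(corr["is_fraud"].drop("is_fraud").sort_values(ascending=False))
```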

5. Model Behavior & Error Analysis

5.1 Amount vs Predicted Fraud Probability

This visualization reveals the concentration of predicted risk in specific transaction regimes.
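The same relationship can be summarized in tabular form by grouping scored transactions by order of magnitude of the amount. The column names and the scores below are made-up stand-ins, not model outputs from this project.

```python
import numpy as np
import pandas as pd

# Toy scored transactions; probabilities are illustrative only.
scored = pd.DataFrame({
    "amount": [50.0, 700.0, 3_000.0, 45_000.0, 600_000.0, 2_500_000.0],
    "fraud_probability": [0.01, 0.02, 0.05, 0.20, 0.75, 0.90],
})

# Bucket amounts by power of ten (1 -> tens, 2 -> hundreds, ...).
scored["decade"] = np.floor(np.log10(scored["amount"])).astype(int)

# Mean predicted probability and number of transactions per bucket.
by_decade = scored.groupby("decade")["fraud_probability"].agg(["count", "mean"])
print(by_decade)
```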

5.2 Top Suspicious Transactions

A ranked view of high-risk transactions supports manual audit and qualitative error analysis.
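Producing such a ranked view is straightforward once transactions carry a score. In the sketch below, `top_suspicious` and the column names are hypothetical, and the rows are toy data.

```python
import pandas as pd


def top_suspicious(scored: pd.DataFrame, k: int = 10,
                   score_col: str = "fraud_probability") -> pd.DataFrame:
    """Return the k highest-scoring transactions, highest first."""
    return scored.nlargest(k, score_col).reset_index(drop=True)


# Toy scored transactions for illustration.
scored = pd.DataFrame({
    "transaction_id": [101, 102, 103, 104, 105],
    "type": ["PAYMENT", "TRANSFER", "DEBIT", "CASH_OUT", "CASH_IN"],
    "amount": [45.0, 90_000.0, 300.0, 120_000.0, 60.0],
    "fraud_probability": [0.03, 0.91, 0.12, 0.78, 0.01],
})

top = top_suspicious(scored, k=2)
print(top)
```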

6. Methodology Summary

Model: Logistic regression (interpretable baseline).
Preprocessing: Standardization and one-hot encoding.
Feature Engineering:
- Balance differentials.
- Amount-to-balance ratios.
- Behavioral risk flags.

7. Research Value

This project demonstrates applied machine learning in a realistic financial setting, focusing on:

Handling highly imbalanced datasets.
Behavioral feature engineering grounded in domain logic.
Reproducible and inspectable experimental analysis.

Suitable as: MSc research portfolio material or a foundation for peer-reviewed research.

8. Limitations & Future Work

The current logistic regression baseline leaves several directions open:

Temporal modeling: Using sequence-based approaches (LSTM / Transformers).
Explainable ensembles: Integrating SHAP or LIME with gradient boosting.
Real-time deployment: Implementing streaming inference pipelines.

Author

Mariam Zakaria
MSc Applicant, Machine Learning & Data Science
Research Interests: Fraud Detection, Interpretable Machine Learning, Applied AI Systems.
