This project presents a research-oriented and interpretable machine learning framework for detecting fraudulent financial transactions in large-scale banking data. Using a dataset of approximately 18 million transactions, the system emphasizes behavioral feature engineering, probabilistic risk estimation, and post-hoc analytical evaluation. An interactive experimental dashboard is provided to support controlled transaction testing, qualitative error analysis, and interpretability-driven inspection. This project is designed as an academic research artifact, suitable for MSc applications and future peer-reviewed publication.
Fraud detection in financial systems presents multiple real-world and research challenges:
- Extreme class imbalance: Fraud cases represent less than 0.2% of the data.
- High cost of false negatives: Every missed fraud case translates directly into financial loss.
- Regulatory demand for interpretability: Modern systems require explainable decision-making.
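The sketch below uses the approximate figures reported in this README (~18 million transactions, ~0.17% fraud) to show why plain accuracy is misleading under this imbalance. A hypothetical classifier that labels every transaction as legitimate scores above 99.8% accuracy but catches no fraud. The function name is illustrative only.

```python
def trivial_baseline_metrics(n_total: int, fraud_ratio: float) -> dict:
    """Metrics for a hypothetical classifier that predicts 'legitimate' for everything."""
    n_fraud = round(n_total * fraud_ratio)
    n_legit = n_total - n_fraud
    return {
        "n_fraud": n_fraud,
        "accuracy": n_legit / n_total,  # every legitimate case counts as correct
        "recall": 0.0,                  # no fraud case is ever flagged
        "false_negatives": n_fraud,     # all fraud slips through as direct loss
    }


metrics = trivial_baseline_metrics(18_000_000, 0.0017)
print(metrics)
```

Because accuracy is dominated by the majority class, recall, precision, and precision-recall curves are more informative metrics for this setting.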
This work focuses on:
- Modeling transactional behavior rather than customer identity.
- Combining statistical learning with domain-driven financial indicators.
- Producing transparent risk outputs instead of opaque binary decisions.
- Source: Financial Transactions Dataset (AIML – Synthetic & Anonymized).
- Scale: ~18,000,000 transactions.
- Fraud Ratio: ~0.17%.
- Transaction Types: TRANSFER, CASH_OUT, PAYMENT, DEBIT, CASH_IN.
- Ethical Note: Contains no personally identifiable information (PII).
The system provides an interactive interface that allows manual transaction simulation, enabling controlled experimentation with transaction attributes.
For each evaluated transaction, the system outputs a Fraud Probability, an Aggregated Risk Score, and a set of Behavioral Alerts.
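The README does not specify how the three outputs are combined, so the following is only a minimal sketch under stated assumptions. It computes a logistic-regression probability from an already-preprocessed feature vector with hypothetical weights, raises hypothetical rule-based alerts over PaySim-style balance fields (`oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest` are assumed column names), and blends both into a 0 to 100 score using an invented `alert_weight`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    fraud_probability: float
    risk_score: float                          # 0-100, hypothetical aggregation
    alerts: list = field(default_factory=list)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def behavioral_alerts(tx: dict) -> list:
    """Hypothetical rule-based alerts over assumed PaySim-style balance fields."""
    alerts = []
    if tx["amount"] > 0 and tx["amount"] >= tx["oldbalanceOrg"] > 0:
        alerts.append("amount drains origin balance")
    if tx["type"] in ("TRANSFER", "CASH_OUT") and tx["newbalanceOrig"] == 0:
        alerts.append("origin emptied by TRANSFER/CASH_OUT")
    if tx["amount"] > 0 and tx["oldbalanceDest"] == tx["newbalanceDest"]:
        alerts.append("destination balance unchanged")
    return alerts


def evaluate(features, weights, bias, tx, alert_weight=0.1):
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)  # logistic-regression probability
    alerts = behavioral_alerts(tx)
    # Hypothetical aggregation: each alert adds alert_weight, capped at 1.0.
    score = 100.0 * min(1.0, p + alert_weight * len(alerts))
    return Evaluation(p, score, alerts)


tx = {"type": "TRANSFER", "amount": 5000.0, "oldbalanceOrg": 5000.0,
      "newbalanceOrig": 0.0, "oldbalanceDest": 0.0, "newbalanceDest": 0.0}
result = evaluate([0.5, -0.25], [1.0, 2.0], 0.0, tx)
print(result)
```

Keeping the probability and the alerts as separate fields preserves the interpretability goal: a reviewer can see whether a high score came from the model, from the rules, or from both.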
The distribution of predicted fraud probabilities shows how the model allocates probability mass under extreme class imbalance.
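A distribution like this can be tabulated with equal-width bins over [0, 1]. The sketch below assumes predicted probabilities are already available; the toy values are illustrative, not results from this project. Under heavy imbalance most mass typically lands in the first bin, which is why a log-scaled or zoomed view is often needed.

```python
def probability_histogram(probs, n_bins=10):
    """Count predicted probabilities in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        counts[idx] += 1
    return counts


toy_probs = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12, 0.95]
print(probability_histogram(toy_probs))  # [5, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```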
A log-scaled comparison demonstrates separation trends between legitimate and fraudulent transactions.
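The README does not state which quantity is log-scaled. The sketch below assumes it is the transaction amount and compares per-class medians of `log10(1 + amount)`, so zero amounts remain defined. The data is a toy example, not drawn from the dataset.

```python
import math
from statistics import median


def log_amount_medians(amounts, labels):
    """Median of log10(1 + amount) for legitimate (0) and fraudulent (1) transactions."""
    legit = [math.log10(1.0 + a) for a, y in zip(amounts, labels) if y == 0]
    fraud = [math.log10(1.0 + a) for a, y in zip(amounts, labels) if y == 1]
    return median(legit), median(fraud)


amounts = [9.0, 99.0, 999.0, 99_999.0, 999_999.0]
labels = [0, 0, 0, 1, 1]
legit_med, fraud_med = log_amount_medians(amounts, labels)
print(legit_med, fraud_med)
```

On a log scale, amounts spanning several orders of magnitude can be compared on one axis; a multiplicative gap between classes shows up as a constant additive offset.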
Fraud incidence varies across operation types and is notably higher for TRANSFER and CASH_OUT.
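Per-type incidence is the fraction of labeled fraud within each transaction type. A minimal stdlib sketch, assuming records arrive as `(type, is_fraud)` pairs; the toy records are illustrative only.

```python
from collections import Counter


def fraud_rate_by_type(records):
    """Fraction of fraudulent transactions within each transaction type."""
    totals, frauds = Counter(), Counter()
    for tx_type, is_fraud in records:
        totals[tx_type] += 1
        frauds[tx_type] += int(is_fraud)
    return {t: frauds[t] / totals[t] for t in totals}


toy = [("TRANSFER", 1), ("TRANSFER", 0),
       ("CASH_OUT", 1), ("CASH_OUT", 0), ("CASH_OUT", 0), ("CASH_OUT", 0),
       ("PAYMENT", 0), ("PAYMENT", 0)]
print(fraud_rate_by_type(toy))
```

Reporting rates rather than raw counts matters here because transaction types differ greatly in volume.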
This view highlights relationships between transaction attributes and the engineered behavioral features.
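The README does not name the association measure behind this view. A common choice is the Pearson correlation; the sketch below computes it with the standard library for one assumed pair, transaction amount versus an origin-balance differential (old minus new balance). The toy values are illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


amounts = [100.0, 250.0, 400.0, 800.0]
orig_balance_diff = [100.0, 250.0, 400.0, 800.0]  # consistent bookkeeping: diff == amount
print(pearson(amounts, orig_balance_diff))
```

A correlation near 1 between amount and balance differential is what consistent bookkeeping would produce; departures from it can be a useful signal to inspect.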
This visualization reveals the concentration of predicted risk in specific transaction regimes.
A ranked view of high-risk transactions supports manual audit and qualitative error analysis.
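A ranked audit view amounts to selecting the k highest-scoring transactions. A minimal sketch, assuming each scored transaction is a dict with hypothetical `id` and `fraud_probability` keys; `heapq.nlargest` avoids sorting the full set, which helps at the scale of millions of rows.

```python
import heapq


def top_k_risky(transactions, k=10):
    """Return the k transactions with the highest predicted fraud probability."""
    return heapq.nlargest(k, transactions, key=lambda tx: tx["fraud_probability"])


scored = [
    {"id": "tx-001", "fraud_probability": 0.02},
    {"id": "tx-002", "fraud_probability": 0.91},
    {"id": "tx-003", "fraud_probability": 0.40},
    {"id": "tx-004", "fraud_probability": 0.77},
]
audit_queue = top_k_risky(scored, k=3)
print([tx["id"] for tx in audit_queue])  # ['tx-002', 'tx-004', 'tx-003']
```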
- Model: Logistic Regression (interpretable baseline).
- Preprocessing: Standardization and one-hot encoding.
- Feature Engineering:
  - Balance differentials.
  - Amount-to-balance ratios.
  - Behavioral risk flags.
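The feature families and preprocessing steps listed above can be sketched in plain Python as follows. The exact formulas are not given in this README, so every feature definition here is a hypothetical instance of its family, and the PaySim-style column names (`oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`) are assumptions. Scaling statistics are fitted on training rows only, and the transaction type is one-hot encoded over the five types in the dataset.

```python
from statistics import mean, pstdev

TX_TYPES = ["TRANSFER", "CASH_OUT", "PAYMENT", "DEBIT", "CASH_IN"]
NUMERIC = ["amount", "orig_balance_diff", "dest_balance_diff",
           "amount_to_orig_balance", "orig_emptied_flag"]


def engineer(tx):
    """Hypothetical instances of the three feature families (assumed column names)."""
    return {
        "amount": tx["amount"],
        # Balance differentials
        "orig_balance_diff": tx["oldbalanceOrg"] - tx["newbalanceOrig"],
        "dest_balance_diff": tx["newbalanceDest"] - tx["oldbalanceDest"],
        # Amount-to-balance ratio (+1.0 avoids division by zero)
        "amount_to_orig_balance": tx["amount"] / (tx["oldbalanceOrg"] + 1.0),
        # Behavioral risk flag: a funded origin account left at zero
        "orig_emptied_flag": float(tx["oldbalanceOrg"] > 0 and tx["newbalanceOrig"] == 0),
    }


def fit_scaler(train_txs):
    """Per-column mean and population std, estimated on training data only."""
    rows = [engineer(tx) for tx in train_txs]
    stats = {}
    for col in NUMERIC:
        values = [r[col] for r in rows]
        sd = pstdev(values)
        stats[col] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    return stats


def to_vector(tx, stats):
    """Standardized numeric features followed by a one-hot transaction type."""
    feats = engineer(tx)
    scaled = [(feats[c] - stats[c][0]) / stats[c][1] for c in NUMERIC]
    one_hot = [1.0 if tx["type"] == t else 0.0 for t in TX_TYPES]
    return scaled + one_hot


train = [
    {"type": "TRANSFER", "amount": 1000.0, "oldbalanceOrg": 1000.0,
     "newbalanceOrig": 0.0, "oldbalanceDest": 0.0, "newbalanceDest": 0.0},
    {"type": "PAYMENT", "amount": 100.0, "oldbalanceOrg": 500.0,
     "newbalanceOrig": 400.0, "oldbalanceDest": 0.0, "newbalanceDest": 100.0},
    {"type": "CASH_IN", "amount": 200.0, "oldbalanceOrg": 300.0,
     "newbalanceOrig": 500.0, "oldbalanceDest": 200.0, "newbalanceDest": 0.0},
]
stats = fit_scaler(train)
vectors = [to_vector(tx, stats) for tx in train]
```

Fitting the scaler on training data alone, then reusing its statistics for validation and test rows, keeps evaluation free of leakage; standardized inputs also make logistic-regression coefficients comparable across features, which supports the interpretability goal.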
This project demonstrates applied machine learning in a realistic financial setting, focusing on:
- Handling highly imbalanced datasets.
- Behavioral feature engineering grounded in domain logic.
- Reproducible and inspectable experimental analysis.
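This README does not state how imbalance is handled during training. One common option, shown here only as an illustration, is inverse-frequency class weighting using the same formula as scikit-learn's `class_weight="balanced"` mode: `n_samples / (n_classes * count_c)`. With a fraud share near 0.17%, each fraud example receives roughly 290 times the weight of a legitimate one.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


labels = [0] * 9983 + [1] * 17  # toy sample with a 0.17% positive rate
weights = balanced_class_weights(labels)
print(weights)
```

With these weights each class contributes the same total weight (`n / k`) to the loss, so the minority class is no longer swamped.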
Suitable as: MSc research portfolio material or a foundation for peer-reviewed research.
- Temporal modeling: Using sequence-based approaches (LSTM / Transformers).
- Explainable Ensembles: Integrating SHAP or LIME with Gradient Boosting.
- Real-time deployment: Implementing streaming inference pipelines.
Mariam Zakaria
MSc Applicant, Machine Learning & Data Science
Research Interests: Fraud Detection, Interpretable Machine Learning, Applied AI Systems.