[Accepted at ICLR2025] FAR-FD: Feature Augmented Retrieval for Fraud Detection

Official implementation of "FAR-FD: Feature Augmented Retrieval for Fraud Detection"

Abstract: Fraud detection plays a crucial role in the financial industry, preventing significant financial losses. Traditional rule-based systems and manual audits often struggle with the evolving nature of fraud schemes and the vast volume of transactions. While traditional machine learning and deep learning methods have made headway, significant room for improvement remains on problems such as class imbalance, high feature cardinality, and adversarial dynamics. To address these limitations, we propose FAR-FD, the first work to integrate a subset of important features in Retrieval Augmented Classification (RAC), and the second work to use RAC for fraud detection. Our model uses a pre-trained SAINT encoder, a self-supervised learning method, and comprises retrieval, integration, and predictor modules that are jointly trained to dynamically leverage similar instances for each input sample. This approach not only enables the model to use the context of similar fraud patterns but also uniquely positions it for real-time fraud detection: it maintains an external database that can be continuously updated as sophisticated fraud patterns emerge, without requiring model retraining. We validate the effectiveness of FAR-FD through extensive experiments on a large-scale real-world dataset and achieve state-of-the-art performance in detecting fraudulent activities. Our code is available at https://github.com/annimukherjee/FAR-FD.

TL;DR: Using a few important features as context in a retrieval-augmented classification system to predict fraud.
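
To make the idea concrete, here is a minimal toy sketch of retrieval-augmented classification. It is not the FAR-FD implementation: the function names (`retrieve_top_k`, `augment_with_context`), the Euclidean distance, and the plain concatenation are all assumptions made for illustration, whereas the abstract describes retrieval, integration, and predictor modules that are jointly trained. The sketch only shows the general pattern: look up the nearest stored instances for an input, then pass a few of their important features (and their known labels) to the predictor as extra context.

```python
import math


def retrieve_top_k(query, candidates, k):
    """Return indices of the k candidates closest to the query (Euclidean distance)."""
    order = sorted(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))
    return order[:k]


def augment_with_context(query, candidates, key_features, labels, k):
    """Append the key features and labels of the k nearest neighbours to the query."""
    augmented = list(query)
    for i in retrieve_top_k(query, candidates, k):
        augmented.extend(key_features[i])  # only the selected important features
        augmented.append(labels[i])        # the neighbour's known fraud label
    return augmented


# Toy external database (hypothetical values): 2-d embeddings,
# two "important" features per instance, and a fraud label.
candidates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
key_features = [[10, 11], [20, 21], [30, 31], [40, 41]]
labels = [0, 1, 0, 1]

# The augmented vector is what a downstream predictor would consume.
augmented = augment_with_context([0.9, 0.1], candidates, key_features, labels, k=2)
print(augmented)
```

Because the neighbours come from an external database, new labelled instances can be appended to `candidates`, `key_features`, and `labels` without touching the predictor, which mirrors the "update without retraining" property described in the abstract.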

System Diagram:

Experimental Results

Steps to clone and run

Clone the repository.
Download the IEEE CIS Dataset from Kaggle (https://www.kaggle.com/competitions/ieee-fraud-detection/data) and create a new directory datasets/ieee-fraud-detection-datasets/ with train and test subdirectories. Move test_identity.csv and test_transaction.csv into datasets/ieee-fraud-detection-datasets/test, and move train_identity.csv and train_transaction.csv into datasets/ieee-fraud-detection-datasets/train.
Run 0_combine_ieee_cis.ipynb in 0_pre-process-data with the Kaggle dataset. This generates two CSV files in the datasets/ieee-processed directory: ieee-train-merged.csv and ieee-train-merged_imputed_cleaned.csv. We use ieee-train-merged_imputed_cleaned.csv for the rest of the experiments.
Next, encode the dataset. We use the SAINT encoder published in this paper (official code), but run our experiments with the unofficial SAINT implementation. Using the generated ieee-train-merged_imputed_cleaned.csv, run the 0_ieee-preprocess-dataset-split-saint.ipynb notebook.
Then run dataset_split.ipynb to obtain the train, validation and test splits.
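
If you prefer to script the directory setup from the download step instead of moving files by hand, the following is a small sketch using only the Python standard library. The directory and file names come from the steps above; the helper name `arrange_kaggle_files` and its arguments are hypothetical and are not part of this repository.

```python
import shutil
from pathlib import Path

# Directory and file names taken from the steps above.
DATASET_ROOT = Path("datasets/ieee-fraud-detection-datasets")
SPLIT_FILES = {
    "train": ["train_identity.csv", "train_transaction.csv"],
    "test": ["test_identity.csv", "test_transaction.csv"],
}


def arrange_kaggle_files(download_dir, repo_root="."):
    """Move the four Kaggle CSVs into the train/ and test/ folders; return the new paths."""
    download_dir = Path(download_dir)

    # Check everything up front so a missing file does not leave a half-moved dataset.
    missing = [
        name
        for names in SPLIT_FILES.values()
        for name in names
        if not (download_dir / name).exists()
    ]
    if missing:
        raise FileNotFoundError(f"missing from {download_dir}: {', '.join(missing)}")

    moved = []
    for split, names in SPLIT_FILES.items():
        target = Path(repo_root) / DATASET_ROOT / split
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.move(str(download_dir / name), str(target / name))
            moved.append(target / name)
    return moved
```

Call it from the repository root with the folder that holds the unzipped Kaggle files, for example `arrange_kaggle_files("path/to/unzipped/download")`, and then continue with the notebooks as described.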

📂 Access Preprocessed Datasets

We provide the preprocessed datasets used in our experiments. You can download them from the following link:

🔗 Preprocessed Datasets (Google Drive)

📂 Access Original Dataset

https://www.kaggle.com/competitions/ieee-fraud-detection/data
