This project is inspired by the JP Morgan Cybersecurity case study on Forage. The dataset, 'transactions.csv', can be found on Kaggle.
You are a cybersecurity analyst at one of the largest financial companies in the world. Your job is to analyze a large dataset of fraud in financial payment services. The dataset has five types of transactions:
- CASH-IN is any deposit.
- CASH-OUT is any withdrawal.
- DEBIT is a specific type of withdrawal in which the money is sent to the user’s bank account.
- PAYMENT is the purchase of goods or services.
- TRANSFER involves moving money from one user’s account to another user’s account.
- Read the dataset (transactions.csv) as a Pandas dataframe. Note that the first row of the CSV contains the column names.
- Return the column names as a list from the dataframe.
- Return the first k rows from the dataframe.
- Return a random sample of k rows from the dataframe.
- Return a scatter plot of the origin account balance delta vs. the destination account balance delta for Cash Out transactions (Source Delta and Destination Delta).
- Return the transactions that are flagged as fraud and how many of them are real frauds.
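
The first four tasks above (reading the file, listing columns, first k rows, random sample of k rows) can be sketched roughly as follows. This is a minimal sketch, not the original solution: the function names and the `seed` parameter are hypothetical, and the default path simply assumes `transactions.csv` sits in the working directory.

```python
import pandas as pd


def load_transactions(path="transactions.csv"):
    """Read the CSV into a DataFrame; the first row is used as the header."""
    return pd.read_csv(path, header=0)


def column_names(df):
    """Return the column names as a plain Python list."""
    return df.columns.tolist()


def first_k_rows(df, k):
    """Return the first k rows of the dataframe."""
    return df.head(k)


def random_k_rows(df, k, seed=None):
    """Return k rows sampled at random, without replacement.

    `seed` is a hypothetical convenience parameter for reproducible samples.
    """
    return df.sample(n=k, random_state=seed)
```

Note that `DataFrame.sample` raises a `ValueError` when `k` is larger than the number of rows, so a caller may want to cap `k` at `len(df)` first.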
- Transaction types bar chart:
- Transaction types frequencies:
- Delta Source:
- Fraud Detection:
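
The outputs listed above could be produced along the following lines. This is a hedged sketch under several assumptions that the text does not confirm: the column names (`type`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`, `isFraud`, `isFlaggedFraud`) and the `CASH_OUT` label are assumed to follow the usual layout of the Kaggle file, each delta is assumed to mean "new balance minus old balance", and all function names are hypothetical.

```python
import pandas as pd

# Assumed column names; adjust them if your CSV differs.
TYPE, FRAUD, FLAGGED = "type", "isFraud", "isFlaggedFraud"


def type_frequencies(df):
    """Count how often each transaction type occurs."""
    return df[TYPE].value_counts()


def cash_out_deltas(df, cash_out_label="CASH_OUT"):
    """Balance deltas (assumed: new minus old) for cash-out transactions."""
    cash_out = df[df[TYPE] == cash_out_label]
    return pd.DataFrame({
        "source_delta": cash_out["newbalanceOrig"] - cash_out["oldbalanceOrg"],
        "destination_delta": cash_out["newbalanceDest"] - cash_out["oldbalanceDest"],
    })


def flagged_fraud_summary(df):
    """Return (number of flagged transactions, how many of those are real fraud)."""
    flagged = df[df[FLAGGED] == 1]
    return len(flagged), int((flagged[FRAUD] == 1).sum())


def plot_results(df):
    """Draw the type bar chart and the cash-out delta scatter plot."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))
    type_frequencies(df).plot(kind="bar", ax=ax_bar, title="Transaction types")
    cash_out_deltas(df).plot(
        kind="scatter", x="source_delta", y="destination_delta",
        ax=ax_scatter, title="Cash Out: source vs. destination delta",
    )
    fig.tight_layout()
    return fig
```

Keeping the computations separate from `plot_results` means the frequencies, deltas, and fraud counts can be inspected or tested without a plotting backend.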
- I would like to use free, open-source scrapers for Facebook, Twitter, etc. to gather data and save it to a CSV file.
- More analysis can be conducted for this type of report.