This repository contains the base TranAD and USAD models, a custom pipeline built around them for easy training and evaluation, and improvements to TranAD.
To set up the environment, I use the conda Python package manager; all packages are listed in `environment.yml`. To create the environment, run:

```bash
conda env create -f environment.yml
```

To get familiar with everything, I would head over to the `notebooks` folder and start with `parse_datasets.ipynb` followed by `train_then_evaluate.ipynb`.
All model results presented in the paper can be found in the `Results` directory.
The model checkpoints presented in the paper for TranAD+ can be found here. Please put the two directories, `Checkpoints` and `Pickles`, in the top-level directory of this repo, and run `notebooks/get_best_results.ipynb` to print the run results.
The model checkpoints for TranAD can be found here.
- SMAP: The SMAP dataset can be found here.
- MSL: The MSL dataset can be found here.
- SWaT: The SWaT dataset can be found here.
- WADI: The WADI dataset can be found here.
- SMD: The SMD dataset can be found here.
- DSN_1k: The DSN_1k dataset can be found here.
Datasets' raw files should be placed in `Datasets/Raw/<dataset_name>`. These are then copied into `Datasets/Original` with `preprocess_data.initialize_dataset()`. Then, the tracks must be parsed using `parse_data.parse_tracks()`, saving results in `Datasets/EDA`. Afterwards, datasets can be preprocessed using `preprocess_data.preprocess_data()`, storing the preprocessed datasets used for training in `Datasets/Preprocessed`. A notebook demonstrating this is shown in `notebooks/parse_datasets.ipynb`.
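For quick orientation, the whole pipeline can be driven in a few calls. Below is a minimal sketch; the import path and argument names are assumptions on my part, so treat `notebooks/parse_datasets.ipynb` as the authoritative reference:

```python
# Minimal sketch of the dataset pipeline described above. The import path
# and argument names are assumptions; see notebooks/parse_datasets.ipynb.
from TranADPlus import parse_data, preprocess_data

dataset = "SMAP"  # raw files live in Datasets/Raw/SMAP

preprocess_data.initialize_dataset(dataset)  # -> Datasets/Original
parse_data.parse_tracks(dataset)             # -> Datasets/EDA
preprocess_data.preprocess_data(dataset)     # -> Datasets/Preprocessed
```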
```
TranADPlus
├── __init__.py
├── Results
├── Checkpoints
├── Datasets
│   ├── EDA
│   ├── Original
│   ├── Preprocessed
│   └── Raw
├── environment.yml
├── licenses
├── LICENSE
├── notebooks
├── Pickles
├── scripts
├── src
├── test
└── README.md
```
`__init__.py`:
- Used to initialize the package; it is particularly important for making the global config file from `src/global_config` a single reference across the package. If anyone has a better suggestion, I'm all ears.
`Results`:
- Runs and results for the other models presented in the paper, saved for transparency. Also contains tables presented in the paper.
`Checkpoints`:
- Folder generated when running `notebooks/train_then_evaluate.ipynb` and/or `scripts/training_loop.py`. Contains model checkpoints.
`Datasets`:
- Contains all dataset-related information. This includes the `Raw` datasets, the `Original` folder which has all datasets in a common format, information for `EDA`, and the `Preprocessed` datasets.
`environment.yml`:
- Conda environment packages and their versions.
`licenses`:
- Contains the licenses for TranAD+, TranAD, OmniAnomaly, and USAD.
`LICENSE`:
- Contains the license for TranAD+ (used for GitHub page UI niceness).
`notebooks`:
- Contains helpful notebooks.
  - `parse_datasets.ipynb`: Shows how to initialize datasets.
  - `train_then_evaluate.ipynb`: Shows how to train and evaluate a model.
  - `pot.ipynb`: Shows how POT works qualitatively (a rough POT sketch follows the directory overview below).
  - `get_best_results.ipynb`: Takes in the final `Checkpoints` and `Pickles` to collect the best F1 score for each (model, dataset) combination.
`Pickles`:
- Folder generated when running `notebooks/train_then_evaluate.ipynb` and/or `scripts/evaluation.py`. Contains the final result pickle files with F1 scores.
`scripts`:
- Contains training and evaluation scripts (`training_loop.py` and `evaluation.py`). Also contains `clean_data.py`, which can help with removing files from failed runs.
`src`:
- Contains the `TranADPlus` package.
`test`:
- Unit tests for the majority of the `TranADPlus` package.
`README.md`:
- This file.
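As promised above, here is a rough sketch of how POT (Peaks Over Threshold) picks an anomaly threshold: it fits a Generalized Pareto Distribution to the tail of the anomaly scores. This follows the standard SPOT formulation (Siffer et al., 2017) and is *not* this repo's actual implementation; the quantile and risk values are illustrative.

```python
import numpy as np
from scipy.stats import genpareto

def pot_threshold(scores: np.ndarray, init_quantile: float = 0.98,
                  q: float = 1e-3) -> float:
    """Illustrative POT thresholding, not this package's implementation."""
    t = np.quantile(scores, init_quantile)  # initial high threshold
    excesses = scores[scores > t] - t       # peaks over the threshold
    # Fit a Generalized Pareto Distribution to the excesses (location = 0).
    gamma, _, sigma = genpareto.fit(excesses, floc=0)
    n, n_t = scores.size, excesses.size
    if abs(gamma) < 1e-9:                   # gamma -> 0 limit of the formula
        return t + sigma * np.log(n_t / (q * n))
    # Final threshold with target risk q of flagging a normal point.
    return t + (sigma / gamma) * ((q * n / n_t) ** (-gamma) - 1)
```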
Most functions in the `src` directory have been unit tested. To run the commands below, make sure you're in the top-level directory of TranADPlus.
Run all tests:

```bash
python -m unittest -b
```

Run a specific test:

```bash
python -m unittest test.test_print_data.TestPrintData.test_print_nan_info
```
To use a custom dataset, you need to include an initialization function in `preprocess_data.initialize_dataset()`. This puts your custom dataset into a standard format and directory structure recognized by the rest of the functions in this package.
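For illustration, a custom initializer might look like the sketch below. The function name, file names, and array layout here are all hypothetical; mirror whatever the existing datasets do inside `preprocess_data.initialize_dataset()`.

```python
import os
import numpy as np

def initialize_my_dataset() -> None:
    """Hypothetical initializer for a custom dataset called MyData.

    Reads raw files from Datasets/Raw/MyData and writes train/test arrays
    (plus per-timestep test labels) to Datasets/Original/MyData, in the
    common format the rest of the package expects.
    """
    raw_dir = os.path.join("Datasets", "Raw", "MyData")
    out_dir = os.path.join("Datasets", "Original", "MyData")
    os.makedirs(out_dir, exist_ok=True)

    # Load your raw data however it is stored (CSV here, as an example),
    # producing (time, features) float arrays and a label vector.
    train = np.loadtxt(os.path.join(raw_dir, "train.csv"), delimiter=",")
    test = np.loadtxt(os.path.join(raw_dir, "test.csv"), delimiter=",")
    labels = np.loadtxt(os.path.join(raw_dir, "labels.csv"), delimiter=",")

    np.save(os.path.join(out_dir, "train.npy"), train)
    np.save(os.path.join(out_dir, "test.npy"), test)
    np.save(os.path.join(out_dir, "labels.npy"), labels)
```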
- We switch out the feed-forward component of the model to be an inverse bottleneck instead of a bottleneck.
- We re-implement USAD's $\alpha$ and $\beta$ scaling on the decoder outputs: $\mathcal{A} = \alpha D_1 + \beta D_2$. The original TranAD paper assigned $\alpha = \beta = 0.5$, while the code had $\alpha = 0$, $\beta = 1$ (see the sketch after this list).
- Resolved an unaddressed issue in the TranAD code where the first window is repeated exactly.
- Resolved an unaddressed issue in TranAD where the `SMD`, `SMAP`, and `MSL` datasets were trained on a single track instead of the whole dataset.
- Training hyperparameters were optimized for better results.
- Improved experiment reproducibility by providing model weights and a specific Python environment.
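To make the first two changes concrete, here is a minimal PyTorch sketch of an inverse-bottleneck feed-forward block and the $\alpha$/$\beta$ score combination. The dimensions and names are illustrative, not the package's actual API.

```python
import torch
import torch.nn as nn

d_model = 64  # illustrative embedding width

# Bottleneck feed-forward (original TranAD): hidden layer narrower than d_model.
bottleneck_ff = nn.Sequential(
    nn.Linear(d_model, d_model // 4), nn.ReLU(), nn.Linear(d_model // 4, d_model)
)

# Inverse bottleneck (TranAD+): hidden layer wider than d_model.
inverse_bottleneck_ff = nn.Sequential(
    nn.Linear(d_model, d_model * 4), nn.ReLU(), nn.Linear(d_model * 4, d_model)
)

def anomaly_score(d1: torch.Tensor, d2: torch.Tensor,
                  alpha: float = 0.5, beta: float = 0.5) -> torch.Tensor:
    """A = alpha * D1 + beta * D2 on the two decoders' outputs.

    TranAD+ restores alpha = beta = 0.5 from the paper; the released
    TranAD code effectively used alpha = 0, beta = 1.
    """
    return alpha * d1 + beta * d2
```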
I was also able to get MTAD-GAT and OmniAnomaly up and running, but didn't include them in this paper since their respective GitHub pages have well-written training pipelines. OmniAnomaly was hard to set up due to its use of an old version of TensorFlow. Don't hesitate to reach out if you need help with those repos.
TranAD+ follows the BSD 3-Clause license. See here.
TranAD follows the BSD 3-Clause license. See here.
USAD follows the BSD 3-Clause license. See here.
OmniAnomaly follows the MIT license. See here.
This repository was created by Alexey Yermakov.
This repository:

```bibtex
@misc{Yermakov2024tranadplus,
author = {Yermakov, Alexey},
title = {TranAD+},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/yyexela/TranADPlus}}
}
```

The original TranAD paper:

```bibtex
@article{tuli2022tranad,
title={{TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data}},
author={Tuli, Shreshth and Casale, Giuliano and Jennings, Nicholas R},
journal={Proceedings of VLDB},
volume={15},
number={6},
pages={1201-1214},
year={2022}
}
```

The original USAD paper:

```bibtex
@inproceedings{audibert2020usad,
title={Usad: Unsupervised anomaly detection on multivariate time series},
author={Audibert, Julien and Michiardi, Pietro and Guyard, Fr{\'e}d{\'e}ric and Marti, S{\'e}bastien and Zuluaga, Maria A},
booktitle={Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery \& data mining},
pages={3395--3404},
year={2020}
}
```