TranAD+

This repository contains the base TranAD and USAD models with a custom pipeline built around them for easy training and evaluation and improvements to TranAD.

Results

[Figures: TranAD+ comparison on DSN_1k, SMAP, MSL, SWaT, SMD, and WADI]

Getting Started

To set up the environment, I use the conda package manager. All packages are listed in environment.yml. To create the environment, run:

conda env create -f environment.yml

To get familiar with everything, start in the notebooks folder with parse_datasets.ipynb, followed by train_then_evaluate.ipynb.

Model Checkpoints

All model results presented in the paper can be found in the Results directory.

The model checkpoints presented in the paper for TranAD+ can be found here. Place the two directories, Checkpoints and Pickles, in the top-level directory of this repo, then run notebooks/get_best_results.ipynb to print the run results.

The model checkpoints for TranAD can be found here.

Datasets

  • SMAP: The SMAP dataset can be found here.
  • MSL: The MSL dataset can be found here.
  • SWaT: The SWaT dataset can be found here.
  • WADI: The WADI dataset can be found here.
  • SMD: The SMD dataset can be found here.
  • DSN_1k: The DSN_1k dataset can be found here.

Raw dataset files should be placed in Datasets/Raw/<dataset_name>. They are then copied into Datasets/Original with preprocess_data.initialize_dataset(). Next, the tracks must be parsed with parse_data.parse_tracks(), which saves its results in Datasets/EDA. Finally, the datasets can be preprocessed with preprocess_data.preprocess_data(), which stores the preprocessed datasets used for training in Datasets/Preprocessed. The notebook notebooks/parse_datasets.ipynb demonstrates these steps.
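The folder flow above can be pictured with a small sketch. The helper `dataset_dirs` is hypothetical and not part of this package. Only the stage folder names come from this README, and the per-dataset subfolder under Original, EDA, and Preprocessed is an assumption (the README only states it for Raw).

```python
from pathlib import Path

# Stage folders named in this README, in the order the data moves through them.
STAGES = ("Raw", "Original", "EDA", "Preprocessed")


def dataset_dirs(repo_root, dataset_name):
    """Return one directory per stage for a dataset (hypothetical helper).

    Assumption: every stage uses a <dataset_name> subfolder, as Raw does.
    """
    base = Path(repo_root) / "Datasets"
    return {stage: base / stage / dataset_name for stage in STAGES}


# Print the assumed layout for one of the datasets listed above.
for stage, path in dataset_dirs(".", "SMAP").items():
    print(f"{stage:>12}: {path.as_posix()}")
```

Printing the layout before running the notebook is a quick way to check that the raw files sit where the pipeline expects them.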

[Figure: dataset comparison table]

Directory Structure

TranADPlus
├── __init__.py
├── Results
├── Checkpoints
├── Datasets
│   ├── EDA
│   ├── Original
│   ├── Preprocessed
│   └── Raw
├── environment.yml
├── licenses
├── LICENSE
├── notebooks
├── Pickles
├── scripts
├── src
├── test
└── README.md

__init__.py:

  • Initializes the package. It is particularly important because it makes the global config file from src/global_config a single reference across the package. If anyone has a better suggestion, I'm all ears.

Results:

  • Runs and results for the other models presented in the paper, saved for transparency. Also contains tables presented in the paper.

Checkpoints:

  • Folder generated when running notebooks/train_then_evaluate.ipynb and/or scripts/training_loop.py. Contains model checkpoints.

Datasets:

  • Contains all dataset-related information. This includes the Raw datasets, the Original folder which has all datasets in a common format, information for EDA, and the Preprocessed datasets.

environment.yml:

  • Conda environment packages and their versions.

licenses:

  • Contains the licenses for TranAD+, TranAD, OmniAnomaly, and USAD.

LICENSE:

  • Contains the license for TranAD+ (used for GitHub page UI niceness).

notebooks:

  • Contains helpful notebooks.
  • parse_datasets.ipynb: Shows how to initialize datasets.
  • train_then_evaluate.ipynb: Shows how to train and evaluate a model.
  • pot.ipynb: Shows how POT works qualitatively.
  • get_best_results.ipynb: Takes in the final Checkpoints and Pickles to collect the best F1 score for each (model, dataset) combination.
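As a rough illustration of the selection that get_best_results.ipynb is described as doing, the sketch below keeps the highest F1 score for each (model, dataset) pair. The record layout (`model`, `dataset`, and `f1` keys) and the function name `best_f1_per_pair` are assumptions for illustration, and the actual pickle contents may differ. The values are placeholders, not results from the paper.

```python
def best_f1_per_pair(runs):
    """Keep the run with the highest F1 for each (model, dataset) pair."""
    best = {}
    for run in runs:
        key = (run["model"], run["dataset"])
        if key not in best or run["f1"] > best[key]["f1"]:
            best[key] = run
    return best


# Placeholder records with made-up names and scores (not real results).
runs = [
    {"model": "model_a", "dataset": "dataset_x", "f1": 0.80},
    {"model": "model_a", "dataset": "dataset_x", "f1": 0.85},
    {"model": "model_b", "dataset": "dataset_x", "f1": 0.70},
]

for (model, dataset), run in sorted(best_f1_per_pair(runs).items()):
    print(model, dataset, run["f1"])
```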

Pickles:

  • Folder generated when running notebooks/train_then_evaluate.ipynb and/or scripts/evaluation.py. Contains the final result pickle files with F1 scores.

scripts:

  • Contains training and evaluation scripts (training_loop.py and evaluation.py). Also contains clean_data.py which can help with removing files from failed runs.

src:

  • Contains the TranADPlus package.

test:

  • Unit tests for the majority of the TranADPlus package.

README.md:

  • This file

Unit Testing

Most functions in the src directory have been unit tested. To run the commands below, make sure you're in the top-level directory of TranADPlus.

Run all tests:

  • python -m unittest -b

Run specific test:

  • python -m unittest test.test_print_data.TestPrintData.test_print_nan_info

Using custom datasets

To use a custom dataset, you need to include an initialization function in preprocess_data.initialize_dataset(). This puts your custom dataset into a standard format and directory structure recognized by the rest of the functions in this package.
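What such an initialization step might look like is sketched below. Everything in it is hypothetical: the function names, the registry, and the idea that initialization amounts to copying raw files into Datasets/Original/<dataset_name>. Check preprocess_data.initialize_dataset() for the format the package actually expects.

```python
import shutil
import tempfile
from pathlib import Path


def init_my_dataset(raw_dir, original_dir):
    """Hypothetical initializer: copy every raw file into the common location."""
    original_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(raw_dir.iterdir()):
        if src.is_file():
            shutil.copy2(src, original_dir / src.name)
            copied.append(src.name)
    return copied


# Hypothetical registry mapping dataset names to their initializers.
INITIALIZERS = {"MyDataset": init_my_dataset}


def initialize(dataset_name, repo_root):
    """Look up the initializer for a dataset and run it (illustrative only)."""
    root = Path(repo_root) / "Datasets"
    init = INITIALIZERS[dataset_name]  # raises KeyError for unregistered names
    return init(root / "Raw" / dataset_name, root / "Original" / dataset_name)


# Demo in a throwaway directory so nothing in the repo is touched.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "Datasets" / "Raw" / "MyDataset"
    raw.mkdir(parents=True)
    (raw / "train.csv").write_text("t,value\n0,1.0\n")
    copied = initialize("MyDataset", tmp)
    print(copied)
```

A real initializer would usually also convert the raw files into the common format rather than copying them unchanged.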

Changes from TranAD

  • We replace the bottleneck in the model's feed-forward component with an inverse bottleneck.
  • We re-implement USAD's $\alpha$ and $\beta$ scaling on the decoder outputs: $\mathcal{A} = \alpha D_1 + \beta D_2$. The original TranAD paper assigned $\alpha=\beta=0.5$ and the code had $\alpha=0$, $\beta=1$.
  • Resolved an unaddressed issue in the TranAD code where the first window is repeated exactly.
  • Resolved an unaddressed issue in TranAD where the SMD, SMAP, and MSL datasets were trained on a single track instead of the whole dataset.
  • Optimized the training hyperparameters for better results.
  • Improved experiment reproducibility by providing model weights and a specific Python environment.
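The weighted combination $\mathcal{A} = \alpha D_1 + \beta D_2$ from the list above can be illustrated in plain Python. This is a minimal sketch, not the package's implementation: the function name `combine` and the toy numbers standing in for the two decoder outputs are assumptions.

```python
def combine(d1, d2, alpha=0.5, beta=0.5):
    """Element-wise A = alpha * D1 + beta * D2."""
    if len(d1) != len(d2):
        raise ValueError("decoder outputs must have the same length")
    return [alpha * a + beta * b for a, b in zip(d1, d2)]


d1 = [1.0, 2.0, 4.0]  # toy values standing in for the first decoder's output
d2 = [3.0, 2.0, 0.0]  # toy values standing in for the second decoder's output

paper = combine(d1, d2, alpha=0.5, beta=0.5)  # weights stated in the TranAD paper
code = combine(d1, d2, alpha=0.0, beta=1.0)   # weights found in the TranAD code
print(paper)
print(code)
```

With $\alpha=0$ and $\beta=1$ the first decoder's output is ignored entirely, which is why exposing both weights matters.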

Other models

I was also able to get MTAD-GAT and OmniAnomaly up and running, but didn't include them in this paper since their respective GitHub pages have well-written training pipelines. OmniAnomaly was hard to set up because it uses an old version of TensorFlow. Don't hesitate to reach out if you need help with those repos.

Licenses

TranAD+ follows the BSD 3-Clause license. See here.
TranAD follows the BSD 3-Clause license. See here.
USAD follows the BSD 3-Clause license. See here.
OmniAnomaly follows the MIT license. See here.

Contributors

This repository was created by Alexey Yermakov.

Citing

This repository:

@misc{Yermakov2024tranadplus,
  author = {Yermakov, Alexey},
  title = {TranAD+},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yyexela/TranADPlus}}
}

The original TranAD paper:

@article{tuli2022tranad,
  title={{TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data}},
  author={Tuli, Shreshth and Casale, Giuliano and Jennings, Nicholas R},
  journal={Proceedings of VLDB},
  volume={15},
  number={6},
  pages={1201-1214},
  year={2022}
}

The original USAD paper:

@inproceedings{audibert2020usad,
  title={Usad: Unsupervised anomaly detection on multivariate time series},
  author={Audibert, Julien and Michiardi, Pietro and Guyard, Fr{\'e}d{\'e}ric and Marti, S{\'e}bastien and Zuluaga, Maria A},
  booktitle={Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery \& data mining},
  pages={3395--3404},
  year={2020}
}
