This repository contains the code for the paper "METER: A Dynamic Concept Adaptation Framework for Online Anomaly Detection".
The overall structure of METER is shown in the figure below. The key intuition is that evolving stream data with different concepts should be identified and measured, so that the model can evolve dynamically. To this end, we propose four modules: the Static Concept-aware Detector, the Dynamic Shift-aware Detector, the Intelligent Evolution Controller and the Offline Updating Strategy.
- (a) Static Concept-aware Detector (SCD) is first trained on historical data to model the central concepts.
- (b) Intelligent Evolution Controller (IEC) measures the concept uncertainty in a timely manner to decide whether dynamic model evolution is necessary.
- (c) Dynamic Shift-aware Detector (DSD) dynamically updates SCD with an instance-aware parameter shift to account for concept drift.
- (d) Offline Updating Strategy (OUS) provides an effective strategy for updating the framework according to the concept uncertainty accumulated over a sliding window.
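The interplay of the four modules can be pictured as a per-instance control loop. The following is a minimal, framework-free sketch under stated assumptions, not the implementation in main.py. The callables `scd_score`, `dsd_score` and `concept_uncertainty` are hypothetical stand-ins for the trained SCD, DSD and IEC. The loop sends an instance to DSD when its uncertainty exceeds `uncertainty_threshold`. It calls the hypothetical `offline_update` when the mean uncertainty over a full window of `window_size` instances exceeds `uncertainty_avg_threshold`. Both rules are our reading of the description above.

```python
from collections import deque
from typing import Callable, Deque, List


def meter_stream_sketch(
    stream: List[float],
    scd_score: Callable[[float], float],
    dsd_score: Callable[[float], float],
    concept_uncertainty: Callable[[float], float],
    uncertainty_threshold: float = 0.1,
    uncertainty_avg_threshold: float = 0.1,
    window_size: int = 4,  # hypothetical sliding-window length
    offline_update: Callable[[List[float]], None] = lambda window: None,
) -> List[float]:
    """Score a stream instance by instance, routing between SCD and DSD (hypothetical sketch)."""
    window: Deque[float] = deque(maxlen=window_size)
    uncertainties: Deque[float] = deque(maxlen=window_size)
    scores: List[float] = []
    for x in stream:
        # IEC (assumed): measure the concept uncertainty of the incoming instance.
        u = concept_uncertainty(x)
        if u > uncertainty_threshold:
            # Uncertain concept: use DSD, i.e. SCD plus an instance-aware parameter shift.
            scores.append(dsd_score(x))
        else:
            # Familiar (central) concept: keep the static detector as is.
            scores.append(scd_score(x))
        window.append(x)
        uncertainties.append(u)
        # OUS (assumed): once the window is full, update offline if its mean uncertainty is high.
        if len(uncertainties) == window_size and sum(uncertainties) / window_size > uncertainty_avg_threshold:
            offline_update(list(window))
            # Clearing is an assumption that stops one window from triggering repeated updates.
            window.clear()
            uncertainties.clear()
    return scores
```

In this sketch, the per-instance threshold decides *how* each instance is scored. The window-level average decides *when* the whole framework is refreshed.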
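The code depends on the following packages:
- pytorch == 1.5.1
- python == 3.7.6
- numpy == 1.21.5
- scipy == 1.4.1
- sklearn == 0.0 (the PyPI shim package that installs scikit-learn)
- pandas == 1.0.1
- hypnettorch == 0.0.4
- edl-pytorch == 0.0.2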
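A small helper like the one below can compare a local environment against the pinned versions listed above. It is a sketch and is not part of the repository. It assumes PyTorch is installed under its PyPI distribution name `torch`. It uses `importlib.metadata` (Python 3.8+) and falls back to the `importlib_metadata` backport on Python 3.7. The names `PINNED` and `check_versions` are hypothetical.

```python
import sys
from typing import Dict, Optional, Tuple

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7: requires `pip install importlib-metadata`
    from importlib_metadata import PackageNotFoundError, version

# Pins from the list above; "pytorch" is distributed on PyPI as "torch".
PINNED: Dict[str, str] = {
    "torch": "1.5.1",
    "numpy": "1.21.5",
    "scipy": "1.4.1",
    "sklearn": "0.0",
    "pandas": "1.0.1",
    "hypnettorch": "0.0.4",
    "edl-pytorch": "0.0.2",
}


def check_versions(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for every package that is missing or differs."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, pinned in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    print("python", sys.version.split()[0], "(pinned: 3.7.6)")
    for name, (pinned, installed) in check_versions(PINNED).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```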
We select 14 real-world benchmark datasets from various domains that differ in the type of concept drift, dimensionality, number of data points and anomaly rate. Four additional synthetic datasets are generated to simulate different types and durations of concept drift, following the settings in the paper. The statistics of the datasets are summarised in the table below.
- Real-world datasets: (1) Anomaly detection datasets from the UCI repository and the ODDS library, namely Ionosphere (Ion.), Pima, Satellite and Mammography (Mamm.). (2) The large public BGL dataset, which consists of log messages collected from a BlueGene/L supercomputer system at Lawrence Livermore National Laboratory. To facilitate analysis, each log message is converted into a structured format. (3) Multi-aspect intrusion detection datasets, namely KDDCUP99 (KDD99) and NSL-KDD (NSL). (4) Time-series datasets from the Numenta Anomaly Benchmark (NAB), namely NYC taxicab (NYC), CPU utilization (CPU), Machine temperature (M.T.) and Ambient temperature (A.T.). (5) The real-world streaming INSECTS datasets.
- Synthetic datasets: Four synthetic datasets are created to simulate complex anomaly detection scenarios in data streams. Categories are randomly selected as anomaly targets to simulate concepts, and the duration of each concept is set randomly to simulate two types of concept drift: "abrupt and recurrent" and "gradual and recurrent".
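One way to realise the drift schedules described above is sketched below. It is an illustration under assumptions, not the generator used for the paper. `drift_schedule` and `label_stream` are hypothetical names. Concept lengths are drawn uniformly from [`min_len`, `max_len`], and consecutive concepts are forced to use different anomaly targets. Gradual drift is modelled as a linear mixing period of `transition` steps in which the new target progressively replaces the old one. Because targets are drawn repeatedly from a small set of categories, concepts recur.

```python
import random
from typing import List, Sequence


def drift_schedule(
    categories: Sequence[int],
    n_concepts: int,
    min_len: int,
    max_len: int,
    drift: str = "abrupt",
    transition: int = 50,
    seed: int = 0,
) -> List[int]:
    """Return the anomaly-target category for every position of the stream (hypothetical)."""
    if drift not in ("abrupt", "gradual"):
        raise ValueError("drift must be 'abrupt' or 'gradual'")
    rng = random.Random(seed)
    # One anomaly target per concept; consecutive concepts differ, and drawing from
    # a small pool of categories makes earlier targets recur later in the stream.
    targets = [rng.choice(categories)]
    while len(targets) < n_concepts:
        candidates = [c for c in categories if c != targets[-1]] or list(categories)
        targets.append(rng.choice(candidates))
    # Random duration for each concept.
    lengths = [rng.randint(min_len, max_len) for _ in range(n_concepts)]

    schedule: List[int] = []
    for i, (target, length) in enumerate(zip(targets, lengths)):
        for t in range(length):
            if drift == "gradual" and i > 0 and t < transition:
                # The probability of the new target grows linearly over the transition.
                p_new = (t + 1) / (transition + 1)
                schedule.append(target if rng.random() < p_new else targets[i - 1])
            else:
                schedule.append(target)
    return schedule


def label_stream(stream_categories: Sequence[int], schedule: Sequence[int]) -> List[int]:
    """Label 1 (anomaly) if an instance's category is the current target, else 0."""
    return [int(c == a) for c, a in zip(stream_categories, schedule)]
```

With `drift="abrupt"`, the target switches instantly at each concept boundary. With `drift="gradual"`, both targets coexist for a while after the boundary.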
Both the training and the evaluation process are implemented in main.py. Run the following command to train the model and see the evaluation results:
```bash
python main.py --dataset ionosphere --epochs 1000 --train_rate 0.2 --mode hybrid+edl --thres_rate 0.1
```
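The main arguments are:
- dataset: name of the dataset to use, e.g. ionosphere
- mode: model variant, one of ["static", "dynamic", "hybrid", "hybrid+edl"]
- emb_rate: ratio of the embedding dimension to the input dimension, default=2
- train_rate: fraction of the data used as the training set, default=0.1
- epochs: number of training epochs, default=1000
- thres_rate: threshold rate used to derive pseudo labels from SCD, default=0.05
- uncertainty_threshold: concept-uncertainty threshold used to decide whether dynamic model evolution is needed, default=0.1
- uncertainty_avg_threshold: threshold on the average concept uncertainty that triggers the offline updating strategy, default=0.1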
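The documented arguments map naturally onto Python's argparse. The parser below is a hypothetical reconstruction for illustration, not the actual code in main.py. The flag names and the listed defaults come from the argument list. Several details are assumptions: the argument types, making `--dataset` required, and the default for `--mode`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments of main.py."""
    parser = argparse.ArgumentParser(description="METER training and evaluation (sketch)")
    # Assumption: no default is documented for the dataset, so it is required here.
    parser.add_argument("--dataset", type=str, required=True)
    # Assumption: the default mode is not documented; "hybrid+edl" is used for illustration.
    parser.add_argument("--mode", type=str, default="hybrid+edl",
                        choices=["static", "dynamic", "hybrid", "hybrid+edl"])
    # Documented defaults; the types are assumptions.
    parser.add_argument("--emb_rate", type=float, default=2.0)
    parser.add_argument("--train_rate", type=float, default=0.1)
    parser.add_argument("--epochs", type=int, default=1000)
    parser.add_argument("--thres_rate", type=float, default=0.05)
    parser.add_argument("--uncertainty_threshold", type=float, default=0.1)
    parser.add_argument("--uncertainty_avg_threshold", type=float, default=0.1)
    return parser


if __name__ == "__main__":
    # Parses the example command shown above.
    args = build_parser().parse_args(
        ["--dataset", "ionosphere", "--epochs", "1000", "--train_rate", "0.2",
         "--mode", "hybrid+edl", "--thres_rate", "0.1"]
    )
    print(args)
```

Restricting `--mode` with `choices` makes argparse reject a misspelled variant at startup rather than partway through training.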
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.