This is the material for reproducing the experiments for the paper "Building
Fixed-Size Spatiotemporal Models for Evolving Data Streams". All experiments
were performed on an Ubuntu 18.04.1 Linux system using Python 3.6.9. The
Python package requirements can be installed with pip using

   pip3 install -r requirements.txt

Furthermore, for some experiments, git has to be installed.
The code for our proposed method is contained in the `tpSDOs` directory as a
Python module and is installed by issuing the above command.
1. Change to the `poc` directory.
   cd poc
2. Remove the file `results.csv`, which contains our obtained results.
   rm results.csv
3. Run the proof-of-concept implementation for several fractions of temporal
   outliers. Alternatively, you can specify the desired fraction of temporal
   outliers as a script parameter.
   python3 run.py
4. Results are appended to the `results.csv` file and can be plotted using
   `python3 plot.py`.
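Each run appends one row of results to `results.csv`. As a minimal sketch of how such accumulated rows can be grouped before plotting — note that the real column layout is defined by `run.py`; the `fraction` and `auc` columns below are purely illustrative assumptions:

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical rows in the style of results.csv; the actual columns
# are whatever poc/run.py writes.
sample = """fraction,auc
0.05,0.91
0.05,0.93
0.10,0.88
"""

# Group the metric by the fraction of temporal outliers.
by_fraction = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample)):
    by_fraction[float(row["fraction"])].append(float(row["auc"]))

for frac, scores in sorted(by_fraction.items()):
    print(f"fraction={frac}: mean {mean(scores):.3f} over {len(scores)} runs")
```

`plot.py` presumably performs a similar aggregation over the appended rows before drawing its figures.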
1. Change to the `outlier` directory.
   cd outlier
2. Download `kddcup.data.gz` from
   http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html to the `outlier`
   directory and extract it.
   wget http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data.gz && gzip -d kddcup.data.gz
3. Perform feature extraction for the KDD Cup'99 dataset, creating the
   `kddcup.npz` file.
   python3 kddcup.py
4. Run all outlier detection algorithms for KDD Cup'99.
   python3 run.py kddcup rshash swknn swrrct loda swlof tpsdose
5. Results will be appended to `results.csv`. The `results.csv` file contained
   in this archive shows our obtained results. Results for our proposed method
   are named `tpsdose`.
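KDD Cup'99 records mix numeric fields with categorical ones (e.g. the protocol type `tcp`/`udp`/`icmp`), so feature extraction has to map categories to numeric vectors before the detectors can consume them. The following is only an illustration of that idea via one-hot encoding; the features `kddcup.py` actually extracts are defined in that script:

```python
# Toy records in the spirit of KDD Cup'99 rows: (protocol, service, src_bytes).
records = [
    ("tcp", "http", 215),
    ("udp", "domain_u", 44),
    ("tcp", "smtp", 1022),
]

# One-hot encode the categorical protocol field and keep the numeric field.
protocols = sorted({r[0] for r in records})
vectors = [
    [1.0 if r[0] == p else 0.0 for p in protocols] + [float(r[2])]
    for r in records
]
print(vectors)
```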
1. Change to the `outlier` directory.
   cd outlier
2. Download the files `partition1_instances.tar.gz`,
   `partition2_instances.tar.gz`, `partition3_instances.tar.gz`,
   `partition4_instances.tar.gz` and `partition5_instances.tar.gz` from
   https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EBCFKM
   and place them in the `outlier` directory.
3. Extract the downloaded files to obtain the directories `partition1` to
   `partition5` within the `outlier` directory.
   tar xf partition1_instances.tar.gz
   tar xf partition2_instances.tar.gz
   tar xf partition3_instances.tar.gz
   tar xf partition4_instances.tar.gz
   tar xf partition5_instances.tar.gz
4. Clone the https://bitbucket.org/gsudmlab/swan_features/src/master/
   repository to `gsudmlab-swan_features`. The repository is used for feature
   extraction.
   git clone https://bitbucket.org/gsudmlab/swan_features/src/master/ gsudmlab-swan_features
5. Ensure you are using the version with commit hash 56eb7cb, which we used
   for our experiments.
   git -C gsudmlab-swan_features checkout 56eb7cb
6. Perform feature extraction for the SWAN-SF dataset, creating the
   `swan.npz` file.
   python3 swan.py
7. Run all outlier detection algorithms for SWAN-SF.
   python3 run.py swan rshash swknn swrrct loda swlof tpsdose
8. Results will be appended to `results.csv`. The `results.csv` file contained
   in this archive shows our obtained results. Results for our proposed method
   are named `tpsdose`.
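Several of the evaluated baselines (e.g. `swknn`, `swlof`) are sliding-window detectors: they score each arriving point against a bounded window of recent points. The toy sketch below shows the sliding-window kNN idea on a 1-D stream; it is not the evaluated implementation, whose parameters and distance handling live in `run.py`:

```python
from collections import deque

def swknn_scores(stream, window=5, k=2):
    """Toy sliding-window kNN outlier score: the distance to the k-th
    nearest neighbour among the most recent `window` points. Points seen
    before the window holds k neighbours receive a score of 0."""
    win = deque(maxlen=window)  # old points fall out automatically
    scores = []
    for x in stream:
        if len(win) >= k:
            dists = sorted(abs(x - y) for y in win)
            scores.append(dists[k - 1])
        else:
            scores.append(0.0)
        win.append(x)
    return scores

# The fourth point is far from the window, so it gets the largest score.
print(swknn_scores([1.0, 1.1, 0.9, 5.0, 1.0]))
```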
Due to confidentiality, security and privacy concerns, we unfortunately
cannot make this dataset publicly available. The following steps therefore
apply to an arbitrary network capture file `capture.pcap`.
For this experiment, an installation of golang is additionally required.
Step 7 additionally requires an installation of tshark.
1. Install go-flows from https://github.com/CN-TU/go-flows.
   go get github.com/CN-TU/go-flows/...
2. Change to the `m2m` directory.
   cd m2m
3. Perform flow extraction based on the feature specifications in
   `CAIA.json`, creating the file `capture.csv` from `capture.pcap`.
   go-flows run features CAIA.json export csv capture.csv source libpcap capture.pcap
4. Process the flow information using the proposed algorithm, obtaining the
   file `results.pickle`.
   python3 process.py
5. Plot the obtained outlier scores and the amount of sampled data points per
   day.
   python3 plot_scores.py
   python3 plot_sampling.py
6. For each observer, plot the magnitude spectrum, 1h temporal plots and 24h
   temporal plots into the directories `fts`, `temporal_1h` and
   `temporal_24h`, respectively.
   python3 analyze.py
7. For each observer, extract a PCAP file containing the network traffic
   corresponding to the respective observer into the `pcaps` directory.
   python3 extract.py
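Conceptually, the flow extraction in step 3 groups packets that share a 5-tuple key into one flow record and computes per-flow features. The sketch below illustrates only this grouping idea; the real feature set is defined by `CAIA.json` and computed by go-flows, and the packet fields here are simplified assumptions:

```python
from collections import defaultdict

# Simplified packets: (src_ip, dst_ip, src_port, dst_port, proto, length).
packets = [
    ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 60),
    ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", 1500),
    ("10.0.0.3", "10.0.0.2", 4321, 53, "udp", 80),
]

# Aggregate packets into flows keyed by the 5-tuple.
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
for src, dst, sport, dport, proto, length in packets:
    key = (src, dst, sport, dport, proto)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += length

for key, feat in flows.items():
    print(key, feat)
```

Each resulting flow record then becomes one row of `capture.csv` for `process.py` to score.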
Please note that the dataset used has a size of ~2TB and processing the data
takes several weeks. To plot only the results we obtained, you can use the
existing `results.pickle` file and skip to step 6.
1. Change to the `darkspace` directory.
   cd darkspace
2. Obtain the 'Patch Tuesday' dataset from
   https://www.caida.org/catalog/datasets/telescope-patch-tuesday_dataset/
   and place the `ucsd_network_telescope.anon.*.flowtuple.cors.gz` files in
   the `darkspace` directory.
3. Obtain the legacy Corsaro software from https://github.com/CAIDA/corsaro,
   build the `cors2ascii` tool and place it in your system's PATH.
4. Extract flow information in AGM format.
   ( for file in ucsd_network_telescope.anon.*.flowtuple.cors.gz ; do cors2ascii $file ; done ) | python3 cors2agm.py >agm.csv
5. Process the flow information, obtaining the file `results.pickle`.
   python3 process.py
6. Create frequency plots and temporal plots from `results.pickle`.
   python3 plot.py
7. Plots can be found in the `fts` and `temporal_1w` directories.
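The frequency plots in `fts` show how strongly periodic a traffic time series is. As a dependency-free sketch of how a magnitude spectrum for such a plot could be computed from a per-interval activity series (`plot.py`'s actual method may differ, and the synthetic series below is an assumption for illustration):

```python
import cmath
import math

# Synthetic activity series: 3 full cycles over 32 intervals.
series = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]

# Plain DFT magnitude for the non-negative frequency bins.
n = len(series)
spectrum = [
    abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(series)))
    for k in range(n // 2)
]

# The dominant bin recovers the injected periodicity.
peak = max(range(len(spectrum)), key=spectrum.__getitem__)
print("dominant frequency bin:", peak)
```

A spike at a bin corresponding to a 24-hour or 7-day period would indicate daily or weekly structure in the telescope traffic.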