Network Monitoring and Attack Detection - Master Thesis
Nicolas Kaenzig, D-ITET, ETH-Zürich
Tutors: Roland Meier, Luca Gambazzi, Vincent Lenders
Supervisor: Prof. Laurent Vanbever
Due to the rapid increase of sophisticated attacks on computer systems and networks, network anomaly and intrusion detection has been the subject of intense research over the past decades. Analysing network traffic at large scale and identifying suspicious patterns in the data have proven to be major challenges.
In this thesis we investigate methods for performing effective analysis of network traffic and evaluate machine-learning based techniques for automated detection of malicious activities. We perform the analysis on a large set of raw network data originating from past cyber defense exercises (Locked Shields).
First, we apply common tools for network traffic analysis and intrusion detection such as Wireshark, Bro and Snort to the data. We then use the information extracted by these tools to build up an extensive database (Elasticsearch), which enables powerful ways for analysis and visualization of the data. In addition, we label connections between compromised hosts and C&C servers that are under control of the attacker team, using information sources provided by the organizers of the Locked Shields exercise.
In the second part of the thesis we investigate possible machine-learning based applications for intrusion and anomaly detection to provide the defenders with an additional monitoring tool during the exercise. We train supervised machine-learning models that can predict sessions established to malicious C&C servers with high precision. We then conduct a thorough analysis of the models' robustness by simulating adversarial inputs and packet loss, and propose possible methods to increase the models' resilience.
Finally, we evaluate unsupervised clustering approaches, first on the novel intrusion detection dataset CICIDS2017, in which the malicious traffic is completely labelled, which facilitates the validation of the developed models. Returning to the Locked Shields data, we demonstrate the feasibility of unsupervised methods for intrusion detection on this data by reporting high detection rates for the C&C sessions mentioned above.
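The source code listed below includes an implementation of the HBOS outlier detection algorithm (hbos.py). As a rough illustration of the idea behind histogram-based outlier scoring, the following is a minimal sketch and not the code in hbos.py: it assumes equal-width bins and purely numeric features, and the function name hbos_scores and its parameters are hypothetical.

```python
import math


def hbos_scores(rows, n_bins=10):
    """Return one outlier score per row; a higher score means more anomalous.

    Hypothetical helper for illustration only (not taken from hbos.py).
    """
    scores = [0.0] * len(rows)
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # A constant feature would give width 0, so fall back to a width of 1.
        width = (hi - lo) / n_bins or 1.0
        # Map each value to a bin index; the maximum value goes into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        max_count = max(counts)
        for i, b in enumerate(bins):
            # Normalise so the fullest bin has density 1, then add log(1 / density).
            scores[i] += math.log(max_count / counts[b])
    return scores


# Toy data: 30 similar rows plus one row that is far away in both features.
rows = [[float(i % 5), float(i % 3)] for i in range(30)] + [[100.0, 50.0]]
scores = hbos_scores(rows)
print(scores.index(max(scores)))  # index of the most anomalous row
```

Each feature is scored independently, so the method scales linearly with the number of rows and features, at the cost of ignoring dependencies between features.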
/sourcecode: Contains the complete source code written for this thesis
- bro_main.py: Main script used for analysing the Bro .log-files.
- bro_misc.py: Helper functions used for analysing the Bro .log-files.
- bro_parsers.py: Functions used for parsing the Bro .log-files.
- cs_report_parser.py: Functions used for parsing the Cobalt Strike Operation Notes report to extract the information needed for the "session-labelling".
- csv_labeller.py: Functions used to label the .csv files holding the features generated by the FlowMeter and KDD feature-extraction tools.
- display_and_filter_connection_csv.ipynb: Jupyter notebook used to analyse the data extracted from the Bro logs (bro_main.py).
- snort_alert_csv_analysis.ipynb: Jupyter notebook used to analyse the Snort alert logs.
- elasticsearch_indexing.py: Functions used to index the pcap (.json), Bro and Snort data into Elasticsearch.
- FlowMeter-sources.zip: Contains all source files of the modified version of the FlowMeter tool used in this thesis.
- hbos.py: Implementation of the HBOS outlier detection algorithm (source: https://github.com/Kanatoko/HBOS-python)
- ml_clustering.py: Functions used to perform the unsupervised clustering experiments on the CICIDS2017 and the LS datasets.
- ml_clustering_helpers.py: Helper functions used to perform the unsupervised clustering experiments on the CICIDS2017 and the LS datasets.
- ml_data.py: This file contains data such as lists of feature names and mappings that we used for the machine learning part of the thesis.
- ml_feature_selection.py: Contains functions used in the feature-selection process of the supervised-learning part of the thesis.
- ml_helpers.py: This module contains several helper functions used in the machine learning part of the thesis.
- ml_supervised.py: This module contains the functions we used to perform the supervised experiments to detect C&C sessions.
- ml_training.py: This module contains helper functions used for training our supervised models.
- pcap_functions.py: This module contains different functions to process pcap files.
- sslAnalyzer.bro: Bro script that generates a comprehensive list of all SSL events observed on the wire, containing the corresponding IP-addresses, ports and host-names. (Implemented by Roland Meier)
- /data/
< host_train_test_IP_splits_2905.pickle: File containing the random splits of the malicious IP list, used for generating the 10 training and validation sets for Step-1 of the model-selection process
< local_hosts.csv: This file contains all IP aliases (IPv4, IPv6) and MAC addresses for all local hosts in the LS17 network
< ls17_rfe_features_sorted_by_importance.txt: This file contains a list of all FlowMeter features ordered by importance.
- /extracted/
< malicious_ips.json: List of all malicious IPs as extracted from the Cobalt Strike reports (used for "host-labelling")
< malicious_sessions.json: Contains start- and end-times of all Cobalt Strike sessions, as well as the source and destination IPs of these sessions (used for "session-labelling")
< mapping_dicts.pickle: Contains different mappings such as IP->Domain and IP->MAC mappings
- /cobalt_strike/
< ls17_mal_ips.txt: List of IP addresses listed in LS17 Cobalt Strike reports
< ls18_mal_ips.txt: List of IP addresses listed in LS18 Cobalt Strike reports
< ls17_mal_domains.txt: List of domains listed in LS17 Cobalt Strike reports
< ls18_mal_domains.txt: List of domains listed in LS18 Cobalt Strike reports
< team7CH_opnotes_proc.txt: Text file generated from the Operation Notes report, which we parsed to extract the information for labelling the data by sessions
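The "session-labelling" described above relies on the start- and end-times and the source and destination IPs stored in malicious_sessions.json. A minimal sketch of how such a rule could look is given below. It is an illustration under assumptions, not the logic of csv_labeller.py: the key names (src_ip, dst_ip, start, end, timestamp), the function is_malicious, the example addresses and the dates are all hypothetical, and parsing of the JSON file is left out.

```python
from datetime import datetime


def is_malicious(flow, sessions):
    """Return True if a flow matches the endpoints and time window of a session.

    Hypothetical helper; key names are assumptions, not the real file schema.
    """
    endpoints = {flow["src_ip"], flow["dst_ip"]}
    for session in sessions:
        # Compare endpoints as a set so both traffic directions are matched.
        same_hosts = endpoints == {session["src_ip"], session["dst_ip"]}
        in_window = session["start"] <= flow["timestamp"] <= session["end"]
        if same_hosts and in_window:
            return True
    return False


# Made-up example session (documentation IP ranges, arbitrary date).
sessions = [{
    "src_ip": "192.0.2.10",
    "dst_ip": "198.51.100.5",
    "start": datetime(2017, 1, 1, 9, 0),
    "end": datetime(2017, 1, 1, 10, 0),
}]

# Reply traffic inside the session window is labelled as malicious.
reply_flow = {
    "src_ip": "198.51.100.5",
    "dst_ip": "192.0.2.10",
    "timestamp": datetime(2017, 1, 1, 9, 30),
}
# The same host pair outside the session window is not.
late_flow = dict(reply_flow, timestamp=datetime(2017, 1, 1, 11, 0))

print(is_malicious(reply_flow, sessions), is_malicious(late_flow, sessions))
```

In contrast to "host-labelling", which only needs the list of malicious IPs, this rule also requires the flow to fall inside the time window of a session.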
/FlowMeter:
- Compiled version of the modified FlowMeter tool. It can be run with the Java SE Runtime Environment 8.
/Elasticsearch:
- export.json: Contains the complete Kibana configurations (Dashboard, Searches, Visualizations). This file was generated using Kibana's "export everything" option.
- pcap-ingest-pipeline.json: Ingest pipeline to rename the IPv4, IPv6 and TCP/UDP port fields for the PCAP index.
- /mappings/
< Contains the Elasticsearch mappings (in .json format) that we used to index the PCAP, Bro and Snort data
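For readers unfamiliar with ingest pipelines, the sketch below shows how a pipeline definition that renames fields could be assembled from Elasticsearch "rename" processors. It is only an illustration: the actual definition is the one in pcap-ingest-pipeline.json, and the field names, the description text and the helper build_rename_pipeline used here are hypothetical.

```python
import json


def build_rename_pipeline(renames):
    """Build an ingest pipeline body with one rename processor per field.

    Hypothetical helper; `renames` maps an original field name to its new name.
    """
    processors = [
        # ignore_missing lets documents without the field pass through unchanged.
        {"rename": {"field": old, "target_field": new, "ignore_missing": True}}
        for old, new in renames.items()
    ]
    return {
        "description": "Rename address and port fields (illustrative sketch)",
        "processors": processors,
    }


# Assumed field names, chosen for illustration only.
pipeline = build_rename_pipeline({
    "layers.ip.ip_src": "src_ip",
    "layers.ip.ip_dst": "dst_ip",
    "layers.tcp.tcp_srcport": "src_port",
    "layers.tcp.tcp_dstport": "dst_port",
})
print(json.dumps(pipeline, indent=2))
```

Renaming at ingest time gives the PCAP index the same address and port field names regardless of IP version or transport protocol, which simplifies queries and dashboards.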