As presented @ai_village_dc at DEF CON China and DEF CON 26. Paper in progress
This code demonstrates proof-of-concept (PoC) attacks on neural networks at the system level. We have trained models for various problems and demonstrated that we can alter their weight and bias parameters in live memory. We also present methods to retrain the networks to introduce sparse trojans that do not discernibly hurt model performance but cause trojan behavior on select inputs.
This repo contains the following directories:
This directory contains the code to train and re-train a simple MNIST classification model with a trojan.
- model.py - Contains the code to define the base MNIST model (with and without L0 regularization) and train it.
- sparsity.py - Contains a function check_sparsity() which takes a dictionary of arrays containing the changes to weight parameters and computes and returns the sparsity.
- l0_regularization.py - Contains the implementation of L0 regularization from [https://arxiv.org/abs/1712.01312]
- trojan.py - Contains an implementation of some of the techniques described in [https://docs.lib.purdue.edu/cstech/1781/], including code to synthesize new training data, as mentioned in the appendix.
- train_sparse_update.py - Contains code to construct a poisoned dataset and to implement retraining with both the top-k gradient masking and L0 regularization approaches. When run from the command line, it runs multiple trials with various hyperparameters and saves the results to CSV.
- model.py - Contains the code to define the base PDF classifier model (with and without L0 regularization) and train it.
- trojan.py - See MNIST
- train_sparse_update.py - See MNIST
- logs/example - Contains checkpoint files for the baseline model.
- load_model.py, patch_weights.py - Used to test trojaning of the model.
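
The sparse-update idea behind train_sparse_update.py and sparsity.py can be illustrated with a minimal NumPy sketch. This is not the code from this repo: `top_k_mask` is a hypothetical helper, the body of `check_sparsity` is our own guess at what such a function might do, and we assume here that "sparsity" means the fraction of parameters left unchanged.

```python
import numpy as np

def top_k_mask(grad, k):
    """Return a 0/1 mask keeping the k largest-magnitude entries of grad (hypothetical helper)."""
    flat = np.abs(grad).ravel()
    if k <= 0:
        return np.zeros_like(grad)
    if k >= flat.size:
        return np.ones_like(grad)
    # After partitioning, the last k positions hold the indices of the k largest magnitudes.
    keep = np.argpartition(flat, flat.size - k)[flat.size - k:]
    mask = np.zeros(flat.size, dtype=grad.dtype)
    mask[keep] = 1
    return mask.reshape(grad.shape)

def check_sparsity(deltas, tol=0.0):
    """Fraction of parameters whose change is at most tol, over all arrays (assumed definition)."""
    total = sum(d.size for d in deltas.values())
    unchanged = sum(int(np.sum(np.abs(d) <= tol)) for d in deltas.values())
    return unchanged / total

# Toy gradient for a single 2x3 weight matrix.
grad = np.array([[0.5, -0.01, 0.02],
                 [-2.0, 0.03, 0.1]])
mask = top_k_mask(grad, k=2)
delta = -0.1 * grad * mask  # one masked gradient step with learning rate 0.1
print(mask)
print(check_sparsity({"w1": delta}))
```

Only the two largest-magnitude gradient entries produce a weight change, so most parameters stay untouched; the fewer parameters change, the smaller the patch that has to be written into a deployed model.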
A trivial implementation of an XOR network in TensorFlow.
This directory contains a simple C++ neural network framework we wrote. It only performs forward propagation. We provide a main driver that takes a JSON file specifying the network architecture as its first argument.
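
To make the idea of a forward-only network driven by a JSON description concrete, here is a small Python sketch. It does not reproduce the ToyNN framework or its file format: the schema below (`layers`, `weights`, `biases`, `activation`) is invented for illustration, and the weights are hand-picked so that the network computes XOR.

```python
import json
import numpy as np

# Hypothetical architecture file contents; the real simple_model.json may look different.
SPEC = json.loads("""
{"layers": [
  {"weights": [[1, 1], [1, 1]], "biases": [0, -1], "activation": "relu"},
  {"weights": [[1], [-2]], "biases": [0], "activation": "linear"}
]}
""")

ACTIVATIONS = {
    "relu": lambda z: np.maximum(z, 0.0),
    "linear": lambda z: z,
}

def forward(spec, x):
    """Run forward propagation only; weights are stored as (inputs, outputs)."""
    a = np.asarray(x, dtype=np.float32)
    for layer in spec["layers"]:
        w = np.asarray(layer["weights"], dtype=np.float32)
        b = np.asarray(layer["biases"], dtype=np.float32)
        a = ACTIVATIONS[layer["activation"]](a @ w + b)
    return a

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, float(forward(SPEC, x)[0]))
```

Because every output is fully determined by a handful of stored floats, changing even one of them (for example the `-2` in the last layer) changes the function the network computes.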
Contains the malware for attacking models written in TensorFlow and the ToyNN framework.
We demonstrate the attacks on Linux and Windows.
The models we present attacks for are:
- XOR with ToyNN (simple_model.json) on Windows
- XOR with TensorFlow on Windows
- PDF with TensorFlow on Windows
- PDF with TensorFlow on Linux
- XOR with ToyNN (simple_model.json) and TensorFlow on Linux
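
The core observation behind these attacks is that weights sit in memory as plain floating-point bytes. The sketch below is a harmless simulation of that idea and not the attack code from this repo: it treats a local `bytearray` as a stand-in for a memory region and assumes little-endian float32 storage. `patch_float32` is a hypothetical helper name.

```python
import numpy as np

def patch_float32(memory, old_values, new_values):
    """Overwrite the first occurrence of old_values (float32 bytes) in memory.

    Returns the byte offset of the patch, or -1 if the pattern is not found.
    """
    needle = np.asarray(old_values, dtype="<f4").tobytes()
    patch = np.asarray(new_values, dtype="<f4").tobytes()
    if len(needle) != len(patch):
        raise ValueError("old and new values must have the same length")
    offset = memory.find(needle)
    if offset < 0:
        return -1
    memory[offset:offset + len(patch)] = patch
    return offset

# Simulated memory region: padding, two float32 weights, more padding.
weights = np.array([1.0, -2.0], dtype="<f4")
memory = bytearray(b"\x00" * 16 + weights.tobytes() + b"\x00" * 16)

offset = patch_float32(memory, [1.0, -2.0], [1.0, 2.0])
patched = np.frombuffer(memory, dtype="<f4", count=2, offset=offset)
print(offset, patched)
```

Flipping the sign of a single stored weight is enough to change what a small network such as the XOR model computes, which is why sparse changes to a handful of parameters are the focus of the retraining code.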
Junk and old testing code to be removed