This repository contains the code for the experiments in the paper "TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems".
```
.
├── config_yaml/               # Configuration files for experiments
├── dataset_partitioner/       # Logic for simulating client-side data partitioning
├── Dockerfile                 # Docker container configuration
├── experiments/               # Training and attack workflow implementations
├── fl_systems/                # Federated learning systems integration
│   ├── frameworks/            # Included frameworks: FedTree (v1.0.5, latest) and NVFlare (v2.5)
│   └── utils/                 # Federated XGBoost using the Flower library (i.e., bagging, cyclic, FedXGBllr)
├── paper_visualization/       # Scripts for generating plots and figures used in the paper
├── pyproject.toml             # Project metadata and dependencies (managed via Poetry)
├── results/                   # Output directory for logs and experiment results
└── xgboost_reconstruction/    # Core implementation of the TimberStrike attack
```

Note: The licenses for the included frameworks FedTree and NVFlare are provided in the NOTICE file.
To use Gurobi for the optimization problem in this work, provide valid Gurobi credentials. This can be done by creating a `.env` file in the root directory with the following content:

```
GUROBI_ACCESSID=<your_access_id>
GUROBI_SECRET=<your_secret_key>
GUROBI_LICENSEID=<your_license_id>
```

Make sure you have an active Gurobi license. These credentials are required to authenticate with the Gurobi Cloud or license server.
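For illustration only (this is not code from the repository), credentials of this form could be read from the `.env` file and passed to `gurobipy` as Web License Service parameters. The mapping of `GUROBI_ACCESSID`/`GUROBI_SECRET`/`GUROBI_LICENSEID` onto `WLSACCESSID`/`WLSSECRET`/`LICENSEID`, and the use of `python-dotenv`, are assumptions made for this sketch:

```python
# Hedged sketch: load Gurobi WLS credentials from .env and build a gurobipy Env.
# Assumes python-dotenv and gurobipy are installed; not taken from the repository.
import os

from dotenv import load_dotenv
import gurobipy as gp

load_dotenv()  # reads the GUROBI_* variables from the .env file in the working directory

env = gp.Env(params={
    "WLSACCESSID": os.environ["GUROBI_ACCESSID"],
    "WLSSECRET": os.environ["GUROBI_SECRET"],
    "LICENSEID": int(os.environ["GUROBI_LICENSEID"]),
})

model = gp.Model(env=env)  # models created with this Env authenticate via the WLS license
```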
Build the Docker image and run the experiments:

```
docker build -t timberstrike .

docker run --rm \
  --env-file .env \
  -v $(pwd)/results:/app/results \
  -v $(pwd)/data:/app/data \
  timberstrike ./run.sh
```

This command mounts the local `results/` and `data/` directories into the container and executes the `run.sh` experiment script.
TimberStrike uses Poetry for dependency management.
```
poetry install
```

If you are using a newer version of Poetry, the shell plugin may need to be added manually before you can open a shell in the virtual environment:

```
poetry self add poetry-plugin-shell
poetry shell
```

To build the FedTree framework, refer to the instructions in the FedTree directory, which contains the original README from the upstream project.
To build NVFlare, refer to the instructions in the NVFlare XGBoost directory.
Note: Some modules may require additional dependencies. Please consult the README files inside each respective subdirectory for detailed instructions.
Use the following command to execute a complete experiment:
```
./run.sh <num_clients> <max_depth> <dataset_name>
```

Where:
- `<num_clients>` is the number of clients in the federated learning setting.
- `<max_depth>` is the maximum depth of the trees used in training.
- `<dataset_name>` is the dataset you wish to use (e.g., `stroke`, `diabetes`).
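For example, `./run.sh 5 6 stroke` would run the `stroke` experiment with 5 clients and a maximum tree depth of 6 (the numeric values here are purely illustrative; choose them to match your configuration).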
Configuration files are located in the config_yaml/ directory. These YAML files define:
- Federated learning settings (e.g., number of clients, rounds, trees)
- XGBoost hyperparameters
- Dataset partitioning strategies
- Evaluation tolerance
Ensure that the appropriate configuration is set before running an experiment.
You can find example configuration files in the config_yaml/ folder to guide the creation of your own.
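As a hedged illustration of how such a configuration could be consumed, a script might load it with PyYAML following the `<dataset_name>_<num_clients>.yaml` naming convention described below. The key names used here (`num_clients`, `xgboost`, `max_depth`) are assumptions for the example, not the repository's actual schema:

```python
# Hedged sketch (not the repository's code): load an experiment configuration
# from config_yaml/ and read a few illustrative fields.
import yaml

def load_experiment_config(dataset_name: str, num_clients: int) -> dict:
    """Load config_yaml/<dataset_name>_<num_clients>.yaml as a plain dict."""
    path = f"config_yaml/{dataset_name}_{num_clients}.yaml"
    with open(path) as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    cfg = load_experiment_config("stroke", 5)
    print(cfg.get("num_clients"), cfg.get("xgboost", {}).get("max_depth"))
```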
To run experiments with a custom dataset, follow these steps:
- **Add your dataset:** Place your dataset files inside a new folder under `dataset_partitioner/<your_dataset_name>/`. Preprocess the data as needed for your use case.
- **Implement a dataloader:** In `dataset_partitioner/data_loader.py`, add a new function that loads your dataset and returns the feature matrix `X` and labels `y` for both the training and test splits (see the sketch after this list).
- **Register your dataset:** Update the conditional logic at the beginning of `data_generator.py` to call your new dataloader function when your dataset name is specified.
- **Define a configuration file:** Create a new YAML configuration file under `config_yaml/`, named `<your_dataset_name>_<num_clients>.yaml`, to specify the desired experiment parameters.
- **Run the experiment:** Use the provided script to execute your experiment: `./run.sh <num_clients> <max_depth> <your_dataset_name>`
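To make the dataloader and registration steps concrete, here is a minimal, hedged sketch of what they could look like. The function name `load_my_dataset`, the CSV layout, the `target` column, and the dispatch shown for `data_generator.py` are illustrative assumptions, not the repository's actual code:

```python
# dataset_partitioner/data_loader.py -- hypothetical new dataloader (illustrative only).
# Assumes the raw data was preprocessed into train/test CSV files with a "target" label column.
import pandas as pd

def load_my_dataset(base_dir: str = "dataset_partitioner/my_dataset"):
    """Return (X_train, y_train), (X_test, y_test) for the custom dataset."""
    train = pd.read_csv(f"{base_dir}/train.csv")
    test = pd.read_csv(f"{base_dir}/test.csv")
    X_train, y_train = train.drop(columns=["target"]).values, train["target"].values
    X_test, y_test = test.drop(columns=["target"]).values, test["target"].values
    return (X_train, y_train), (X_test, y_test)


# data_generator.py -- sketch of the dataset-name dispatch (assumed structure):
# if dataset_name == "my_dataset":
#     (X_train, y_train), (X_test, y_test) = load_my_dataset()
```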
For further details, please refer to the relevant module-specific READMEs, where available.