The following packages were cloned and modified for the simulations:
- `shap`, cloned 14/10/2024, SHA: `215c58525127761208dd2c9754bd468a0295f537` on `master`
- `glex`, cloned 19/11/2024, SHA: `bb2be47ca2eb2a657095ecfd1b51877dfe0debe9` on `master`
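To compare the modified copies with the code they were derived from, the unmodified upstream versions can be re-created at the pinned commits. The following is a minimal sketch, assuming `git` is on `PATH`. The repository URLs (`github.com/shap/shap` and `github.com/PlantedML/glex`) are assumptions, because only the SHAs are recorded here, and the helper names are hypothetical, not part of this repository. It only prints the commands unless `dry_run=False` is passed.

```python
import subprocess
from pathlib import Path

# Pinned upstream commits from the list above.
# ASSUMPTION: the repository URLs are not stated in this README.
PINNED = {
    "shap": ("https://github.com/shap/shap.git",
             "215c58525127761208dd2c9754bd468a0295f537"),
    "glex": ("https://github.com/PlantedML/glex.git",
             "bb2be47ca2eb2a657095ecfd1b51877dfe0debe9"),
}


def baseline_commands(name, dest_root="upstream"):
    """Return the git commands that clone `name` and check out its pinned SHA."""
    url, sha = PINNED[name]
    dest = str(Path(dest_root) / name)
    return [
        ["git", "clone", url, dest],
        # Check out the exact commit (detached HEAD), independent of the branch tip.
        ["git", "-C", dest, "checkout", "--detach", sha],
    ]


def fetch_baseline(name, dest_root="upstream", dry_run=True):
    """Print the commands for one package and, unless dry_run, execute them."""
    for cmd in baseline_commands(name, dest_root):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for pkg in PINNED:
        fetch_baseline(pkg)  # dry run: prints the commands only
```

The resulting checkouts under `upstream/` can then be compared with the modified copies using a tool such as `diff -r`.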
The folder `codeforsimulations` is a git repository; its version history can be viewed with `git log`.

The seed used for the Figures in the main article is not the seed used in these simulations, so Figures 1 and 4 may differ slightly from the published versions due to random variation.
To reproduce Figures 1, 2 and 4, follow these steps:
- Change the working directory to `simulation_glex`
- Ensure `R (4.4.1)` is installed with the packages: `scico`, `devtools`, `data.table`, `tidyverse`, `reshape2`, `patchwork`, `future`, `future.apply`, `doFuture`, `xgboost`, `mvtnorm`, `mlr3verse`
- Run `run.r` to generate `complete_res.RData` (this takes a long time; running it on a cluster is recommended)
- Run `plot_fig.r` in each of the folders `figure1`, `figure2` and `figure4` to generate the plots. The plots use the data from `complete_res.RData`
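The steps above can be chained into one driver script. This is a minimal sketch, assuming `Rscript` is on `PATH` and that `figure1`, `figure2` and `figure4` are subfolders of `simulation_glex` (the README does not say where they live). The helper names are hypothetical. Before the long `run.r` step, it runs a short R expression that stops if any required package is missing, so a missing dependency fails fast instead of hours into the simulation.

```python
import subprocess
from pathlib import Path

# R packages listed in the steps above.
R_PACKAGES = ["scico", "devtools", "data.table", "tidyverse", "reshape2",
              "patchwork", "future", "future.apply", "doFuture", "xgboost",
              "mvtnorm", "mlr3verse"]


def missing_packages_check(packages):
    """Build an R expression that stops with an error naming any missing package."""
    quoted = ", ".join(f'"{p}"' for p in packages)
    return (f"pkgs <- c({quoted}); "
            "miss <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]; "
            "if (length(miss)) stop('Missing: ', paste(miss, collapse = ', '))")


def figure_plan(root="simulation_glex", figures=("figure1", "figure2", "figure4")):
    """Return (working_directory, command) pairs in the order of the steps above."""
    root = Path(root)
    plan = [
        (root, ["Rscript", "-e", missing_packages_check(R_PACKAGES)]),
        (root, ["Rscript", "run.r"]),  # slow: writes complete_res.RData
    ]
    # ASSUMPTION: the figure folders sit inside simulation_glex.
    plan += [(root / fig, ["Rscript", "plot_fig.r"]) for fig in figures]
    return plan


def run_plan(plan, dry_run=True):
    """Print each step and, unless dry_run, execute it in its working directory."""
    for cwd, cmd in plan:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_plan(figure_plan())  # dry run: prints the steps only
```

Because `run.r` is the expensive step, one option is to run it on the cluster and afterwards call `run_plan(figure_plan()[2:], dry_run=False)` locally to produce only the plots.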
To reproduce Figure 3, follow these steps:
- Change the working directory to `simulation_runtime_shap_vs_glex`
- Ensure `R (4.4.1)` is installed with the packages: `scico`, `devtools`, `xgboost`, `data.table`, `tidyverse`, `reshape2`, `bench`
- Run `generate_dat_and_model.r` to generate a dataset of 10000 observations and an XGBoost model with 20 trees
- Run `benchmark_fastpd.r` to get the FastPD benchmarks (this takes a long time; running it on a cluster is recommended)
- Ensure `Python >=3.12` is installed
- Create a Python virtual environment using `python3.12 -m venv venv`
- Activate the virtual environment with `source venv/bin/activate`
- Install the modified SHAP package with `pip install ../shap`
- Install XGBoost if it is not already installed: `pip install xgboost`
- Run `python benchmark_shap.py` to get the SHAP benchmarks (this takes a long time; running it on a cluster is recommended)
- Change the working directory to `figure3`
- Run `plot_runtime.r` to get the benchmark plots
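The Figure 3 steps can be sketched as a similar driver. This assumes a POSIX system (the virtual environment's interpreter at `venv/bin/python`), `Rscript` and `python3.12` on `PATH`, and that `figure3` is a subfolder of `simulation_runtime_shap_vs_glex`; the helper names are hypothetical. Activating a virtual environment only changes the current shell's environment, so instead of `source venv/bin/activate` the sketch calls the environment's own interpreter, which has the same effect for `pip install` and for running the benchmark.

```python
import subprocess
from pathlib import Path

ROOT = Path("simulation_runtime_shap_vs_glex")
# ASSUMPTION: POSIX venv layout; relative paths resolve against each step's cwd.
VENV_PY = str(Path("venv") / "bin" / "python")


def figure3_plan():
    """Return (working_directory, command) pairs in the order of the steps above."""
    return [
        (ROOT, ["Rscript", "generate_dat_and_model.r"]),  # 10000 obs, 20-tree model
        (ROOT, ["Rscript", "benchmark_fastpd.r"]),        # slow
        (ROOT, ["python3.12", "-m", "venv", "venv"]),
        (ROOT, [VENV_PY, "-m", "pip", "install", "../shap"]),  # modified SHAP
        (ROOT, [VENV_PY, "-m", "pip", "install", "xgboost"]),
        (ROOT, [VENV_PY, "benchmark_shap.py"]),           # slow
        # ASSUMPTION: figure3 is a subfolder of ROOT.
        (ROOT / "figure3", ["Rscript", "plot_runtime.r"]),
    ]


def run_plan(plan, dry_run=True):
    """Print each step and, unless dry_run, execute it in its working directory."""
    for cwd, cmd in plan:
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_plan(figure3_plan())  # dry run: prints the steps only
```

Running `pip install xgboost` when XGBoost is already present is harmless, so the sketch does not check for it first.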
Changes to `shap`:
- Removed `docs`, `data`, `javascript`, `notebooks` and `scripts`
- Modified `shap/explainers/_tree.py`

Changes to `glex`:
- Added `src/recurse_fastpd.cpp`
- Added `R/fastpd.R`
- Added auxiliary functions in `glex.R` to call the FastPD implementation
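Because `codeforsimulations` is a git repository, the history of the modified files listed above can be inspected with `git log`. This is a minimal sketch, assuming `git` is on `PATH` and the repository root is passed as `repo`. The exact locations of these files inside the repository are not stated here, so it uses wildcard pathspecs (by default, git lets `*` match across directory separators). The helper names are hypothetical.

```python
import subprocess

# Wildcard pathspecs for the files named above. Passed as a list to
# subprocess, so the shell does not expand them; git matches them itself.
MODIFIED_FILES = ["*explainers/_tree.py", "*recurse_fastpd.cpp",
                  "*fastpd.R", "*glex.R"]


def history_command(pathspecs=MODIFIED_FILES, patches=False):
    """Build a `git log` call restricted to the given pathspecs."""
    cmd = ["git", "log", "--oneline", "--stat"]
    if patches:
        cmd.append("-p")  # include the full diffs, not only per-file stats
    return cmd + ["--"] + list(pathspecs)


def show_history(repo=".", patches=False):
    """Run the command inside `repo` and return git's output as text."""
    result = subprocess.run(history_command(patches=patches), cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(show_history("codeforsimulations", patches=True))` would list every commit touching these files together with its diff.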