TrustyAI's (heavily modified) fork of abacusai's xai-bench repo.
- `0`: A very quick benchmark to test if the explainers and xai-bench are working. Runs in a few seconds.
- `1`: A moderate benchmark to evaluate the performance of TrustyAI LIME + SHAP versus the official versions over a variety of models and datasets. Runs in many minutes.
- `2`: A very thorough benchmark to evaluate the performance of TrustyAI LIME + SHAP versus the official versions over a huge variety of models and datasets. Runs in many hours.
- `lime`: Config 1, but just benchmarking LIME.
- `shap`: Config 1, but just benchmarking SHAP.
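For a quick sanity check of a fresh setup, config `0` is the natural first run (full usage is described below):

`python3 main.py --config 0`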
`python3 main.py --config $CONFIG --label $LABEL`

or

`python3 main.py --c $CONFIG --l $LABEL`
- `--config`: set the config to benchmark, one of `0`, `1`, `2`, `lime`, or `shap`
- `--label`: an optional suffix to add to saved files and produced plots, e.g., the name of the branch being tested
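For example, to run the moderate benchmark and tag its output with the branch under test (the label value here is purely illustrative):

`python3 main.py --config 1 --label my-branch`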
- Build the version of TrustyAI you'd like to benchmark
- Run `python3 main.py --config $CONFIG --label $LABEL` to run a benchmark config
- The first run of any benchmark will take a little longer, as ground truths of each dataset need to be generated. After that, the cached ground truths are loaded from file.
- Run results are stored in the `results/` directory; each run will produce a pickled Pandas dataframe of the raw benchmark data, as well as a plot within the `results/plots/` directory.
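The pickled dataframes can be inspected directly with Pandas. A minimal sketch, assuming the files are written with a `.pkl` extension (the exact filenames depend on the config and the optional label suffix):

```python
import glob

import pandas as pd

# Load every pickled benchmark dataframe found in results/.
# The .pkl extension is an assumption; adjust the pattern to match
# whatever filenames your run actually produced.
for path in glob.glob("results/*.pkl"):
    df = pd.read_pickle(path)
    print(path)
    print(df.columns.tolist())  # columns recorded by the benchmark
    print(df.head())            # first few raw benchmark rows
```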