This is the repository for the paper: Predicting sepsis using deep learning across international sites: a retrospective development and validation study. If you use this code, please cite:
@article{moor2023predicting,
  title={Predicting sepsis using deep learning across international sites: a retrospective development and validation study},
  author={Moor, Michael and Bennett, Nicolas and Ple{\v{c}}ko, Drago and Horn, Max and Rieck, Bastian and Meinshausen, Nicolai and B{\"u}hlmann, Peter and Borgwardt, Karsten},
  journal={eClinicalMedicine},
  volume={62},
  pages={102124},
  year={2023},
  publisher={Elsevier}
}
We plan to clean up the following components:
- R code for data loading / harmonization
- Python code for pre-processing (feature extraction, normalization, etc.); this assumes a Dask pipeline that can be run on a large CPU server or cluster
This project was a massive effort, spanning four years and more than 1,500 commits.
Code contributors:
Michael, Nicolas, Max, Bastian, and Drago
In order to set up the datasets, the R package ricu (available via CRAN) is required, alongside access credentials for PhysioNet and a download token for AmsterdamUMCdb. This information can then be made available to ricu by setting the environment variables RICU_PHYSIONET_USER, RICU_PHYSIONET_PASS and RICU_AUMC_TOKEN.
install.packages("ricu")
Sys.setenv(
  RICU_PHYSIONET_USER = "my-username",
  RICU_PHYSIONET_PASS = "my-password",
  RICU_AUMC_TOKEN = "my-token"
)
Then, by sourcing the files in r/utils (which requires further R packages to be installed; see r/utils/zzz-demps.R), the function export_data() becomes available. This roughly loads the data corresponding to the specification in config/features.json on an hourly grid, performs some patient filtering, and concludes with missingness imputation and feature augmentation steps. The script under r/scripts/create_dataset.R can be used to carry out these steps.
install.packages(
  c("here", "arrow", "bigmemory", "jsonlite", "data.table", "readr",
    "optparse", "assertthat", "cli", "memuse", "dplyr",
    "biglasso", "ranger", "qs", "lightgbm", "cowplot", "roll")
)

invisible(
  lapply(list.files(here::here("r", "utils"), full.names = TRUE), source)
)
for (x in c("mimic", "eicu", "hirid", "aumc")) {
if (!is_data_avail(x)) {
msg("setting up `{x}`\n")
setup_src_data(x)
}
msg("exporting data for `{x}`\n")
export_data(x)
}
If export_data() is called with its default argument of data_path("export") for dest_dir, this will create one parquet file per data source under data-export. This procedure can also be run using the PhysioNet demo datasets, for debugging and to make sure the pipeline runs through:
install.packages(
  c("mimic.demo", "eicu.demo"),
  repos = "https://eth-mds.github.io/physionet-demo"
)

for (x in c("mimic_demo", "eicu_demo")) {
  export_data(x)
}
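To sanity-check the export from the Python side, the per-source parquet files can be inspected, for instance with pandas. The following is a minimal illustrative sketch (not part of the original scripts); it assumes the default data-export location mentioned above:

import glob
import pandas as pd

# Assumed location of the per-source parquet files written by export_data();
# adjust the path if you used a different dest_dir.
for path in sorted(glob.glob("data-export/*.parquet")):
    df = pd.read_parquet(path)
    print(path, df.shape)                    # rows on the hourly grid described above
    print(df.columns.tolist()[:10], "...")   # first few exported columns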
For transparency, we include the full list of Python requirements used throughout this study in requirements_full.txt. However, some individual packages may no longer be supported, so to get started you may want to use requirements_minimal.txt instead, for example by activating your virtual environment and running:
pip install -r requirements_minimal.txt
For setting up this project, we ran:
>pipenv install
>pipenv shell
Alternatively, feel free to also check out the Pipfile / Pipfile.lock.
Make sure that all exported data is put here: datasets/downloads/

The source code under src is organized as follows:
- src/torch: pytorch-based pipeline and models (currently an attention model); TODO: add documentation for training a model
- src/sklearn: sklearn-based pipeline for boosted trees baselines
To run the preprocessing pipeline (feature extraction, normalization), run:
>source scripts/run_preprocessing.sh
Note that the preprocessed data (as parquet files) contain two different label columns: 'sep3' and 'utility'. 'sep3' is the sepsis label, while 'utility' is a regression target derived from the sepsis label, inspired by the PhysioNet 2019 Challenge for sepsis prediction. The utility score is a bit more complex to use, as it cannot be directly applied across different datasets (due to prevalence differences). We have a solution for this (lambda parameters), but it is not part of this paper; feel free to contact us if interested.
If you are not using our scripts (which automatically take care of this), make sure not to use either sep3 or utility as a feature for training!
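As an illustration, dropping the label columns before building a feature matrix could look like the following minimal sketch (the file path is a placeholder, not part of our scripts; adjust it to your local layout):

import pandas as pd

# Hypothetical path to one preprocessed split; adapt to where your parquet files live.
df = pd.read_parquet("path/to/preprocessed/train.parquet")

label_cols = ["sep3", "utility"]  # prediction targets; must never be used as features
y = df["sep3"]                    # binary sepsis label (classification target)
X = df.drop(columns=[c for c in label_cols if c in df.columns])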
The training code is split into two pipelines:
- src/torch: pytorch-based pipeline and models (currently GRU and attention model)
- src/sklearn: sklearn-based pipeline for lightGBM and LogReg models
>source scripts/run_lgbm.sh <results_folder_name>
>source scripts/run_lgbm_rep.sh <results_folder_name>
>source scripts/run_baselines.sh <results_folder_name>
We currently run these jobs on bs-slurm-02.
First, create a sweep on wandb.ai. Then, using the sweep ID (only the ID, not the entire ID path), run:
>source scripts/wandb/submit_job.sh sweep-id
In this submit_job script you can configure the variable n_runs, i.e. how many evaluations should be run (e.g. 25 during a coarse or fine hyperparameter search, or 5 for repetition runs).
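If you are not working on the slurm setup, the sweep can presumably also be consumed directly with the standard wandb CLI (an untested alternative to the submit script, not part of our workflow), where --count plays the role of n_runs:
>wandb agent --count 25 <entity>/<project>/<sweep-id>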
Example sweep for hyperparameter search of training an attention model on MIMIC:
method: random
metric:
  goal: minimize
  name: online_val/loss
parameters:
  batch_size:
    values:
      - 16
      - 32
      - 64
      - 128
  cost:
    value: 5
  d_model:
    values:
      - 32
      - 64
      - 128
      - 256
  dataset:
    value: MIMIC
  dropout:
    values:
      - 0.3
      - 0.4
      - 0.5
      - 0.6
      - 0.7
  gpus:
    value: -1
  ignore_statics:
    value: "True"
  label_propagation:
    value: 6
  label_propagation_right:
    value: 24
  learning_rate:
    distribution: log_uniform
    max: -7
    min: -9
  max_epochs:
    value: 100
  model:
    value: AttentionModel
  n_layers:
    value: 2
  norm:
    value: rezero
  task:
    value: classification
  weight_decay:
    values:
      - 0.1
      - 0.01
      - 0.001
      - 0.0001
program: src/torch/train_model.py
This configuration can be copied directly into Weights & Biases to create a new sweep.
Example command for training an attention model on MIMIC:
python src/torch/train_model.py --batch_size=16 --d_model=256 --dataset=MIMIC --dropout=0.5 --gpus=-1 --ignore_statics=True --label_propagation=6 --label_propagation_right=24 --learning_rate=0.0002 --max_epochs=100 --model=AttentionModel --n_layers=2 --norm=rezero --task=classification --weight_decay=0.001
>source scripts/eval_sklearn.sh <results_folder_name>
where the results folder refers to the output folder of the hyperparameter search.
Make sure that the eval_sklearn script reads all the methods you wish to evaluate. This script assumes that repetitions are already available.
First, determine the best run of your sweep, which gives you a run-id. Then apply this model to all datasets:
>source scripts/wandb/submit_evals.sh run-id
Once this has completed, the prediction files can be processed in the patient-level evaluation:
>source scripts/eval_torch.sh run-id
For evaluating a repetition sweep, run (on slurm):
>pipenv run python scripts/wandb/get_repetition_runs.py sweep-id1 sweep-id2 ..
and once this has completed, run (again on the CPU server):
>python scripts/wandb/get_repetition_evals.py sweep-id1 sweep-id2 ..
For gathering all repetition results, run:
>python -m scripts.plots.gather_data --input_path results/evaluation_validation/evaluation_output_subsampled --output_path results/evaluation_validation/plots/
For creating ROC plots, run:
>python scripts/plots/plot_roc.py --input_path results/evaluation/plots/result_data.csv
For creating precision/earliness plots, run:
>python -m scripts.plots.plot_scatterplots results/evaluation/plots/result_data.csv --r 0.80 --point-alpha 0.35 --line-alpha 1.0 --output results/evaluation/plots/
For the scatter data, in order to return 50 measures (5 repetition splits × 10 subsamplings), set --aggregation micro.
First, we need to create a mapping from experiments (data_train, data_eval, model, etc.) to the prediction files:
>python scripts/map_model_to_result_files.py <path_to_predictions> --output_path <output_json_path>
Use --overwrite to overwrite an existing mapping JSON.
Next, we pool the predictions:
>source scripts/pool_predictions.sh
Then, we evaluate them:
>source scripts/eval_pooled.sh
To create plots with the pooled predictions, run:
>python -m scripts.plots.gather_data --input_path results/evaluation_test/prediction_pooled_subsampled/max/evaluation_output --output_path results/evaluation_test/prediction_pooled_subsampled/max/plots/
>python scripts/plots/plot_roc.py --input_path results/evaluation_test/prediction_pooled_subsampled/max/plots/result_data_subsampled.csv
For computing precision/earliness, run:
>python -m scripts.plots.plot_scatterplots results/evaluation_test/prediction_pooled_subsampled/max/plots/result_data_subsampled.csv --r 0.80 --point-alpha 0.35 --line-alpha 1.0 --output results/evaluation_test/prediction_pooled_subsampled/max/plots/
And for the heatmap including the pooled predictions, run:
>python -m scripts.make_heatmap results/evaluation_test/plots/roc_summary_subsampled.csv --pooled_path results/evaluation_test/prediction_pooled_subsampled/max/plots/roc_summary_subsampled.csv