BorgwardtLab/multicenter-sepsis

This is the repository for the paper: Predicting sepsis using deep learning across international sites: a retrospective development and validation study

Reference:

@article{moor2023predicting,
  title={Predicting sepsis using deep learning across international sites: a retrospective development and validation study},
  author={Moor, Michael and Bennett, Nicolas and Ple{\v{c}}ko, Drago and Horn, Max and Rieck, Bastian and Meinshausen, Nicolai and B{\"u}hlmann, Peter and Borgwardt, Karsten},
  journal={eClinicalMedicine},
  volume={62},
  pages={102124},
  year={2023},
  publisher={Elsevier}
}

Disclaimer:

We plan to clean up the following components:

  • R code for data loading / harmonization
  • Python code for preprocessing (feature extraction), normalization, etc. (assumes a Dask pipeline that can be run on a large CPU server or cluster)

Acknowledgements:

This project was a massive effort spanning more than four years and over 1.5K commits.

Code contributors:

Michael, Nicolas, Max, Bastian, and Drago

Data setup

In order to set up the datasets, the R package ricu (available via CRAN) is required alongside access credentials for PhysioNet and a download token for AmsterdamUMCdb. This information can then be made available to ricu by setting the environment variables RICU_PHYSIONET_USER, RICU_PHYSIONET_PASS and RICU_AUMC_TOKEN.

install.packages("ricu")
Sys.setenv(
    RICU_PHYSIONET_USER = "my-username",
    RICU_PHYSIONET_PASS = "my-password",
    RICU_AUMC_TOKEN = "my-token"
)

Then, sourcing the files in r/utils (which requires further R packages to be installed; see r/utils/zzz-demps.R) makes the function export_data() available. This roughly loads the data specified in config/features.json on an hourly grid, performs some patient filtering, and concludes with some missingness imputation/feature augmentation steps. The script under r/scripts/create_dataset.R can be used to carry out these steps.

install.packages(
    c("here", "arrow", "bigmemory", "jsonlite", "data.table", "readr",
      "optparse", "assertthat", "cli", "memuse", "dplyr",
      "biglasso", "ranger", "qs", "lightgbm", "cowplot", "roll")
)

invisible(
  lapply(list.files(here::here("r", "utils"), full.names = TRUE), source)
)

for (x in c("mimic", "eicu", "hirid", "aumc")) {

  if (!is_data_avail(x)) {
    msg("setting up `{x}`\n")
    setup_src_data(x)
  }

  msg("exporting data for `{x}`\n")
  export_data(x)
}

If export_data() is called with its default dest_dir argument of data_path("export"), this will create one parquet file per data source under data-export. For debugging, and to make sure the pipeline runs through, this procedure can also be run on the PhysioNet demo datasets:

install.packages(
  c("mimic.demo", "eicu.demo"),
  repos = "https://eth-mds.github.io/physionet-demo"
)

for (x in c("mimic_demo", "eicu_demo")) {
  export_data(x)
}

Python pipeline (for the machine learning / modelling side):

For transparency, we include the full list of requirements used throughout this study in requirements_full.txt. However, some individual packages may no longer be supported, so to get started you may want to use requirements_minimal.txt.

For example, activate your virtual environment and run:
pip install -r requirements_minimal.txt

For setting up this project, we ran:
>pipenv install
>pipenv shell Hence, feel free to also check out the Pipfile / Pipfile.lock

Datasets

Make sure that all exported data is put here:
datasets/downloads/
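
As a quick sanity check (a minimal sketch; the exact file names are an assumption, since export_data() simply writes one parquet file per data source), you can verify that all expected files are in place:

# Sanity check: one exported parquet file per data source (file names assumed).
from pathlib import Path

download_dir = Path("datasets/downloads")
for src in ["mimic", "eicu", "hirid", "aumc"]:
    matches = sorted(download_dir.glob(f"{src}*.parquet"))
    print(f"{src}: {matches if matches else 'MISSING'}")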

Source code

src:

  • torch: PyTorch-based pipeline and models (currently an attention model)
    TODO: add documentation for training a model
  • sklearn: sklearn-based pipeline for boosted-tree baselines

Preprocessing

Running the preprocessing

source scripts/run_preprocessing.sh

Note that the preprocessed data (as parquet files) contain two different label columns: 'sep3' and 'utility'. Here, sep3 is the sepsis label, and utility is a regression target derived from the sepsis label, inspired by the PhysioNet 2019 Challenge for sepsis prediction. The utility score is a bit more complex to use, as it cannot be directly used across different datasets (due to prevalence differences). We have a solution for this (lambda parameters), but it is not part of this paper; feel free to contact us if interested.

If you are not using our scripts (which automatically take care of this), make sure not to use either sep3 or utility as a feature for training!
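
If you roll your own training code instead, a minimal sketch of separating features from labels could look as follows (the parquet path is a placeholder; only the sep3 and utility column names come from the pipeline):

# Minimal sketch: keep 'sep3' as the classification target and drop both
# label columns from the feature matrix (the path is a placeholder).
import pandas as pd

df = pd.read_parquet("datasets/downloads/mimic_preprocessed.parquet")  # placeholder path
y = df["sep3"]                              # binary sepsis label
X = df.drop(columns=["sep3", "utility"])    # never feed sep3/utility as features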

Training

Model overview

  • src/torch: PyTorch-based pipeline and models (currently GRU and attention models)
  • src/sklearn: sklearn-based pipeline for LightGBM and LogReg models

Running the LightGBM hyperparameter search

>source scripts/run_lgbm.sh <results_folder_name>

After having run the LightGBM hyperparameter search, run repetitions with:

>source scripts/run_lgbm_rep.sh <results_folder_name>

Running the baseline models hyperparameter search + repetitions (in one)

>source scripts/run_baselines.sh <results_folder_name>

Deep models / torch pipeline

We currently run these jobs on bs-slurm-02.

First, create a sweep on wandb.ai. Then, using the sweep-id (only the id -- not the entire id-path), run:
>source scripts/wandb/submit_job.sh sweep-id
In this submit_job script you can configure the variable n_runs, i.e. how many evaluations should be run (e.g. 25 during a coarse or fine-tuning search, or 5 for repetition runs).

Example sweep for hyperparameter search of training an attention model on MIMIC:

method: random
metric:
  goal: minimize
  name: online_val/loss
parameters:
  batch_size:
    values:
      - 16
      - 32
      - 64
      - 128
  cost:
    value: 5
  d_model:
    values:
      - 32
      - 64
      - 128
      - 256
  dataset:
    value: MIMIC
  dropout:
    values:
      - 0.3
      - 0.4
      - 0.5
      - 0.6
      - 0.7
  gpus:
    value: -1
  ignore_statics:
    value: "True"
  label_propagation:
    value: 6
  label_propagation_right:
    value: 24
  learning_rate:
    distribution: log_uniform
    max: -7
    min: -9
  max_epochs:
    value: 100
  model:
    value: AttentionModel
  n_layers:
    value: 2
  norm:
    value: rezero
  task:
    value: classification
  weight_decay:
    values:
      - 0.1
      - 0.01
      - 0.001
      - 0.0001
program: src/torch/train_model.py

This can be directly copied into Weights & Biases to create a new sweep.
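
Alternatively, as a hedged sketch (not the workflow described above), the same YAML can be registered programmatically via the wandb Python API; the file name and project below are placeholders:

# Sketch: create the sweep from the YAML file via the wandb API instead of the web UI.
import wandb
import yaml

with open("sweep_attention_mimic.yaml") as f:        # placeholder file name
    sweep_config = yaml.safe_load(f)

sweep_id = wandb.sweep(sweep=sweep_config, project="multicenter-sepsis")  # placeholder project
print(sweep_id)  # pass this id to: source scripts/wandb/submit_job.sh <sweep-id>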

Training a single model on a single dataset

Example command for training an attention model on MIMIC:

python src/torch/train_model.py --batch_size=16 --d_model=256 --dataset=MIMIC --dropout=0.5 --gpus=-1 --ignore_statics=True --label_propagation=6 --label_propagation_right=24 --learning_rate=0.0002 --max_epochs=100 --model=AttentionModel --n_layers=2 --norm=rezero --task=classification --weight_decay=0.001  

Evaluation pipeline

Shallow models + Baselines

>source scripts/eval_sklearn.sh <results_folder_name>
where the results folder refers to the output folder of the hyperparameter search. Make sure that the eval_sklearn script reads all the methods you wish to evaluate. This script assumes that repetitions are already available.

Deep models

First, determine the best run of your sweep, which gives you a run-id. Then apply this model to all datasets:
>source scripts/wandb/submit_evals.sh run-id
Once this has completed, the prediction files can be processed in the patient-level evaluation:
>source scripts/eval_torch.sh run-id

For evaluating a repetition sweep, run (on Slurm):
>pipenv run python scripts/wandb/get_repetition_runs.py sweep-id1 sweep-id2 ...
and once completed, run (again on a CPU server):
>python scripts/wandb/get_repetition_evals.py sweep-id1 sweep-id2 ...

Results and plots

For gathering all repetition results, run:
>python -m scripts.plots.gather_data --input_path results/evaluation_validation/evaluation_output_subsampled --output_path results/evaluation_validation/plots/

For creating ROC plots, run:
>python scripts/plots/plot_roc.py --input_path results/evaluation/plots/result_data.csv

For creating precision/earliness plots, run:
>python -m scripts.plots.plot_scatterplots results/evaluation/plots/result_data.csv --r 0.80 --point-alpha 0.35 --line-alpha 1.0 --output results/evaluation/plots/
For the scatter data, in order to return 50 measures (5 repetition splits × 10 subsamplings), set --aggregation micro.

Pooled predictions

First, we need to create a mapping from experiments (data_train, data_eval, model, etc.) to the prediction files:
>python scripts/map_model_to_result_files.py <path_to_predictions> --output_path <output_json_path>
Use --overwrite to overwrite an existing mapping JSON.

Next we actually pool the predictions:
>source scripts/pool_predictions.sh

Then, we evaluate them:
>source scripts/eval_pooled.sh
To create plots with the pooled predictions, run:
>python -m scripts.plots.gather_data --input_path results/evaluation_test/prediction_pooled_subsampled/max/evaluation_output --output_path results/evaluation_test/prediction_pooled_subsampled/max/plots/
>python scripts/plots/plot_roc.py --input_path results/evaluation_test/prediction_pooled_subsampled/max/plots/result_data_subsampled.csv
For computing precision/earliness, run:
>python -m scripts.plots.plot_scatterplots results/evaluation_test/prediction_pooled_subsampled/max/plots/result_data_subsampled.csv --r 0.80 --point-alpha 0.35 --line-alpha 1.0 --output results/evaluation_test/prediction_pooled_subsampled/max/plots/
And for the heatmap including the pooled predictions:
>python -m scripts.make_heatmap results/evaluation_test/plots/roc_summary_subsampled.csv --pooled_path results/evaluation_test/prediction_pooled_subsampled/max/plots/roc_summary_subsampled.csv
