KidneyAI

Maturity level-0 | Python 3.8 | Conda | PyTorch 1.12.1 + CUDA 10.2

Streamlining the Histopathological Workflow in Diabetic Kidney Disease with Artificial Intelligence (JASN)

This repository accompanies the manuscript “Streamlining the Histopathological Workflow in Diabetic Kidney Disease with Artificial Intelligence” and includes the source code to:

  • Glomerulus detection: Instance segmentation with Mask R-CNN to localize and extract glomeruli from PAS-stained whole-slide images.
  • Annotation pipeline: Templates and helper functions to launch, monitor, and retrieve annotation jobs on AWS, reducing study evaluation turnaround by up to 80%.
  • Automated CKD scoring: AI-based classifiers for semi-quantitative glomerular grading, achieving expert-level performance and cutting turnaround time by up to 90%.
  • Self-supervised learning: Self-supervised learning on unlabeled preclinical data to improve robustness and mitigate expert bias.
  • Translation to human biopsies: Unsupervised Feature Translation (UFT) adapts mouse-trained features to human tissue at inference, reducing the translational gap without human labels.

🚀 Installation

Build the Docker image using the Dockerfile in docker_image/Dockerfile.

cd docker_image
docker build -f Dockerfile -t kidneyaidocker \
  --build-arg UID=$(id -u) \
  --build-arg GID=$(id -g) \
  --build-arg USER=$(whoami) \
  --build-arg GROUP=$(id -g -n) .
  • Image name: kidneyaidocker
  • Context: current directory
  • Args: match the host UID/GID so files created inside the container have the correct permissions

🔎 Glomerular Detection

Use kidneyai_detection.py to train a detector or extract glomeruli from a study.

  • Default config: config_files/Detection.json
  • Override config: --params_path
  • Help:
    python kidneyai_detection.py --help

What to run (set in general_params -> what_to_run; see the config sketch after this list)

  • detection_set_generation: true
    Preprocess and build a training set per detection_set_generation_params.
    Creates: dkd_detection_set in the main data directory.
  • unnanotated_patch_generation: true
    Tile each slide per unnanotated_patch_generation_params (for unlabeled WSIs).
  • train_detection_model: true
    Train on the generated detection set.
    Configure with: model_params, dataloader_params, dataset_params, optimization_params, transfer_learning_params, log_params.
    Training recipe: train_detection_model_params.
  • detect_on_new_study: true
    Run detection on a study, configured via detect_on_new_study_params.
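
A minimal sketch of how these switches might sit inside config_files/Detection.json; the key names come from this README, but the exact layout and the values shown here are illustrative rather than copied from the shipped config:

{
  "general_params": {
    "what_to_run": {
      "detection_set_generation": false,
      "unnanotated_patch_generation": false,
      "train_detection_model": true,
      "detect_on_new_study": false,
      "interact_with_aws": false
    }
  }
}

Enable only the stages you need for a given run; each enabled stage is then configured by its corresponding *_params block elsewhere in the file.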

☁️ Glomerular Annotation with AWS Ground Truth

Prepare datasets for AWS annotation using kidneyai_detection.py.

  • Default config: config_files/Detection.json
  • Enable AWS flow: general_params -> what_to_run -> interact_with_aws: true
  • Control logic: aws_interactions

Key switches (a config sketch follows this list):

  • create_annotation_patches -> apply: true
    Crops and saves detected glomeruli from the detection step. Remaining fields define what/how to crop.
  • upload_data_to_s3 -> apply: true
    Uploads cropped data to S3. Remaining fields define targets and filters.
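
As a rough illustration of the two switches above (key names from this README; the nesting and any omitted fields are assumptions), the aws_interactions block could look like:

{
  "aws_interactions": {
    "create_annotation_patches": {
      "apply": true
    },
    "upload_data_to_s3": {
      "apply": true
    }
  }
}

The remaining fields of each block, which define what/how to crop and the S3 targets and filters, are documented in config_files/Detection.json itself.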

Notebooks:

  • Start annotation job: Create_Annotation_job.ipynb
  • Collect results: Collect_responses_from_AWS.ipynb

🧠 Glomerular Classification

Train classifiers and infer DKD scores using kidneyai_classification.py.

  • Default config: config_files/Classification.json
  • Override config: --params_path
  • Help:
    python kidneyai_classification.py --help

Predefined configs reproduce supervised, self-supervised, transfer learning, and unsupervised feature adaptation (UFT) as described in the paper.

✅ Supervised Training

python kidneyai_classification.py
  • --kfold: k-fold train/test as defined in k_fold_params
  • --kfold_multi: k-fold; evaluate against all annotators at test time
    • Multi-annotated slides must be marked as multi in classification/defaults/fold.py (L314).
    • Slides marked single are training-only in this setup.
  • To use a pretrained encoder (e.g., DINO), set transfer_learning_params accordingly.
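
As a sketch of that last step, assuming the transfer_learning_params keys listed in the Configuration Reference below (angle-bracketed values are placeholders, not real names):

{
  "transfer_learning_params": {
    "use_pretrained": true,
    "pretrained_method": "DINO",
    "pretrained_model_name": "<name of the pretrained SSL run>",
    "pretrained_path": "<directory containing the pretrained weights>"
  }
}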

🔁 Self-Supervised Training (SSL)

DINO:

python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino

Other SSL methods:

  • BYOL: add --byol
  • SimSiam: add --simsiam

🔄 Unsupervised Feature Translation (UFT)

Three-step workflow:

  1. Pretrain (SSL)
     python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino
  2. Supervised head on frozen SSL encoder (Dual Model)
     • In config_files/DualModel.mouse.json, set model_params -> UFT -> foundation_params to load the feature extractor.
     python kidneyai_classification.py --params_path config_files/DualModel.mouse.json --uft
  3. Translate encoder on target domain + evaluate
     python kidneyai_classification.py --params_path config_files/UFT_translation.json --translation_kfold_multi
     • --translation_kfold_multi: update the encoder and evaluate across folds (k_fold_params)
     • --translation_kfold_multi_full: translate on the entire target dataset (single adapted model)
     • Also runs evaluation on the target dataset post-translation
       Note: data_defs -> labeled_pickles should match the inference pickles.

Important settings:

  • training_params -> target_uft_cls_name: which model to update
  • training_params -> model_name: output name of translated model
  • transfer_learning_params -> use_pretrained: true and pretrained_method set to the SSL method used (e.g., DINO). Pretrained weights should be those from Step 1.
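
For orientation, the training_params side of these settings might be laid out as below (key names from this README, angle-bracketed values purely hypothetical); the transfer_learning_params block follows the same pattern as the sketch in the Supervised Training section, with pretrained_method set to the SSL method used in Step 1:

{
  "training_params": {
    "target_uft_cls_name": "<name of the model to update>",
    "model_name": "<output name of the translated model>"
  }
}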

Pseudo-label path: You can create pseudo labels and still use --translation_kfold_multi. Otherwise:

  • Alternative 1:
    Run DINO and set transfer_learning_params -> pretrained_method: "UFT".
    python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino
  • Alternative 2 (recommended):
    Combine --dino and --ssl_uft to load the dual model and update only the feature extractor.

After adaptation, run inference and save predictions:

python kidneyai_classification.py --params_path config_files/DualModel.translated.json --uft --inference

Ensure transfer-learning parameters for both the feature extractor and classifier are consistent in all alternatives.


🗂️ Data Folder Structure

Organize your datasets as follows:

/path_to_data/DKD/
├─ study_1/
│  ├─ unannotated/                 # WSIs without labels
│  ├─ annotated/                   # WSIs with paired .xml annotations
│  ├─ additional_localizations/    # (optional; detection-only) .xml files for unannotated WSIs
│  └─ glomeruli/
│     ├─ glomeruli-patches/        # cropped glomeruli
│     ├─ glomeruli-refs/           # reference points on WSI
│     ├─ glomerulus_patch_list.pickle
│     ├─ dkd_annotations_expert1.pickle
│     ├─ dkd_annotations_expert2.pickle
│     └─ dkd_annotations_expert3.pickle
└─ study_wa_ua/
   └─ ...

Notes:

  • glomerulus_patch_list.pickle contains a list of dictionaries describing each crop (WSI location, image path, etc.). This is the pickle used for AWS uploads.
  • Keep all DKD glomerular annotations in the study’s glomeruli/ folder.
  • In additional_localizations/, include only .xml; WSIs in that folder are not auto-read.
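
To connect this layout to the configuration files: the pickles above are what the data_defs lists (see the Configuration Reference below) point at. A purely illustrative sketch, assuming the expert annotation pickles serve as labeled data and the patch list as unlabeled data; the actual assignment depends on your experiment:

{
  "data_defs": {
    "labeled_pickles": [
      "dkd_annotations_expert1.pickle",
      "dkd_annotations_expert2.pickle",
      "dkd_annotations_expert3.pickle"
    ],
    "unlabeled_pickles": ["glomerulus_patch_list.pickle"]
  }
}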

🧩 Configuration Reference (JSON)

  • data_defs

    • labeled_pickles: List of pickles with labeled data (can be one or more pickle filenames).
    • unlabeled_pickles: List of pickles with unlabeled data for self-supervision (can be one or more pickle filenames).
    • semilabeled_pickles: List of pickles with weakly annotated data for weak/self-supervised learning (can be one or more pickle filenames).
    • inference_pickles: List of pickles with data for DKD inference (can be one or more pickle filenames).
  • dataset_params

    • data_location: datasets root directory
    • train_transforms: training augmentations
    • val_transforms: validation augmentations
    • test_transforms: test augmentations
  • dataloader_params

    • Standard DataLoader controls (batch size, num_workers, pin_memory, etc.)
  • model_params

    • backbone_type: e.g., resnet50, deit_small
    • transformers_params:
      • img_size: input size
      • patch_size: transformer patch size
      • pretrained_type: "supervised" (ImageNet) or "dino" (ImageNet SSL)
    • pretrained: use ImageNet weights
    • freeze_backbone: freeze encoder
    • DINO: hyperparameters for DINO training
  • optimization_params

    • optimizer:
      • type: optimizer class
      • autoscale_rl: scale LR by batch size
      • params: LR, weight decay, momentum/betas, etc.
    • LARS_params: use LARS if use: true and batch size ≥ batch_act_thresh
    • scheduler:
      • type: scheduler pipeline (list)
      • params: scheduler-specific hyperparameters
  • training_params

    • model_name: name under which the trained model and its checkpoints are saved
    • val_every: validation frequency (epochs, float)
    • log_every: logging frequency (iterations)
    • save_best_model: keep best model by validation metric
    • log_embeddings: plot UMAPs each validation
    • knn_eval: evaluate kNN metrics during validation
    • grad_clipping: > 0 enables clipping
    • use_mixed_precision: enable AMP
    • save_dir: checkpoint directory (e.g., path_to_checkpoints)
  • system_params

    • Device usage and GPU selection (note that when more than one GPU is enabled, DDP is triggered)
  • log_params

    • Project/run names for logging (default: Weights & Biases)
  • lr_finder

    • grid_search_params:
      • min_pow, max_pow: LR search range (10^min_pow to 10^max_pow)
      • resolution: number of LR candidates
      • n_epochs: maximum epochs for search
      • random_lr: sample random LRs in range
      • keep_schedule: keep LR scheduler during search
      • report_intermediate_steps: validate/log during search
  • transfer_learning_params

    • use_pretrained: enable pretrained backbone
    • pretrained_model_name: name of pretrained model
    • pretrained_path: directory with pretrained weights (typically the same as save_dir)
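
Putting the reference above together, a stripped-down skeleton of a classification config might look like the following. Only keys named in this README are shown, and the nesting, ordering, and every value are illustrative; config_files/Classification.json in the repository is the authoritative source for exact key names and defaults:

{
  "data_defs": { "labeled_pickles": [], "unlabeled_pickles": [], "semilabeled_pickles": [], "inference_pickles": [] },
  "dataset_params": { "data_location": "/path_to_data/DKD/", "train_transforms": {}, "val_transforms": {}, "test_transforms": {} },
  "dataloader_params": { "batch_size": 64, "num_workers": 8, "pin_memory": true },
  "model_params": { "backbone_type": "resnet50", "pretrained": true, "freeze_backbone": false },
  "optimization_params": { "optimizer": { "type": "<optimizer class>", "autoscale_rl": false, "params": {} }, "scheduler": { "type": [], "params": {} } },
  "training_params": { "model_name": "<model name>", "val_every": 1.0, "log_every": 50, "save_best_model": true, "use_mixed_precision": true, "save_dir": "<path_to_checkpoints>" },
  "system_params": {},
  "log_params": {},
  "lr_finder": { "grid_search_params": {} },
  "transfer_learning_params": { "use_pretrained": false, "pretrained_model_name": "", "pretrained_path": "" }
}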
