This repository accompanies the manuscript “Streamlining the Histopathological Workflow in Diabetic Kidney Disease with Artificial Intelligence” and provides the source code for:
- Glomerulus detection: Instance segmentation with Mask R-CNN to localize and extract glomeruli from PAS-stained whole-slide images.
- Annotation pipeline: Templates and helper functions to launch, monitor, and retrieve annotation jobs on AWS, reducing study evaluation turnaround by up to 80%.
- Automated DKD scoring: AI-based classifiers for semi-quantitative glomerular grading, achieving expert-level performance and cutting turnaround time by up to 90%.
- Self-supervised learning: Pretraining on unlabeled preclinical data to improve robustness and mitigate expert bias.
- Translation to human biopsies: Unsupervised Feature Translation (UFT) adapts mouse-trained features to human tissue at inference, reducing the translational gap without human labels.
Build the Docker image using the Dockerfile in `docker_image/Dockerfile`:

```bash
cd docker_image
docker build -f Dockerfile -t kidneyaidocker \
  --build-arg UID=$(id -u) \
  --build-arg GID=$(id -g) \
  --build-arg USER=$(whoami) \
  --build-arg GROUP=$(id -g -n) .
```

- Image name: `kidneyaidocker`
- Context: current directory
- Args: match the host UID/GID for smooth file permissions
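Once built, the container can be started with a standard `docker run` invocation. The mount points, the GPU flag, and the `bash` command below are illustrative assumptions and should be adapted to your setup:

```bash
# Sketch only: mount the data directory and the repo, and enable GPUs if the
# NVIDIA container toolkit is installed (drop --gpus otherwise).
docker run --rm -it \
  --gpus all \
  -v /path_to_data:/path_to_data \
  -v "$(pwd)":/workspace \
  kidneyaidocker bash
```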
Use `kidneyai_detection.py` to train a detector or extract glomeruli from a study.
- Default config: `config_files/Detection.json`
- Override config: `--params_path`
- Help: `python kidneyai_detection.py --help`
Key switches in the config (see the sketch after this list):
- `detection_set_generation: true`
  Preprocess and build a training set per `detection_set_generation_params`. Creates `dkd_detection_set` in the main data directory.
- `unnanotated_patch_generation: true`
  Tile each slide per `unnanotated_patch_generation_params` (for unlabeled WSIs).
- `train_detection_model: true`
  Train on the generated detection set. Configure with `model_params`, `dataloader_params`, `dataset_params`, `optimization_params`, `transfer_learning_params`, `log_params`. Training recipe: `train_detection_model_params`.
- `detect_on_new_study: true`
  Run detection on a study, configured with `detect_on_new_study_params`.
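A typical run might look like the following; the copied config name is a placeholder, and the switches above are toggled inside the JSON rather than on the command line:

```bash
# Inspect available options
python kidneyai_detection.py --help

# Run with the default config (config_files/Detection.json)
python kidneyai_detection.py

# Or copy the default config, enable the switches you need
# (e.g., detection_set_generation, train_detection_model), and point to it
cp config_files/Detection.json config_files/MyDetection.json
python kidneyai_detection.py --params_path config_files/MyDetection.json
```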
Prepare datasets for AWS annotation using `kidneyai_detection.py`.
- Default config: `config_files/Detection.json`
- Enable AWS flow: `general_params -> what_to_run -> interact_with_aws: true`
- Control logic: `aws_interactions`

Key switches:
- `create_annotation_patches -> apply: true`
  Crops and saves the glomeruli detected in the detection step. The remaining fields define what and how to crop.
- `upload_data_to_s3 -> apply: true`
  Uploads the cropped data to S3. The remaining fields define targets and filters.

Notebooks:
- Start an annotation job: `Create_Annotation_job.ipynb`
- Collect results: `Collect_responses_from_AWS.ipynb`
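A sketch of the end-to-end annotation flow, assuming AWS credentials are already configured and Jupyter is available in your environment (the config filename is a placeholder):

```bash
# 1) Crop detected glomeruli and upload them to S3
#    (interact_with_aws, create_annotation_patches and upload_data_to_s3 enabled in the config)
python kidneyai_detection.py --params_path config_files/MyDetection.json

# 2) Launch the annotation job, then later collect the responses
jupyter notebook Create_Annotation_job.ipynb
jupyter notebook Collect_responses_from_AWS.ipynb
```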
Train classifiers and infer DKD scores using `kidneyai_classification.py`.
- Default config: `config_files/Classification.json`
- Override config: `--params_path`
- Help: `python kidneyai_classification.py --help`
Predefined configs reproduce the supervised, self-supervised, transfer learning, and Unsupervised Feature Translation (UFT) experiments described in the paper.
Run `python kidneyai_classification.py` with one of the following flags:
- `--kfold`: k-fold train/test as defined in `k_fold_params`
- `--kfold_multi`: k-fold; evaluate against all annotators at test time
  - Multi-annotated slides must be marked as `multi` in `classification -> defaults -> fold.py` (line 314).
  - Slides marked `single` are training-only in this setup.
- To use a pretrained encoder (e.g., DINO), set `transfer_learning_params` accordingly.
DINO:

```bash
python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino
```

Other SSL methods:
- BYOL: add `--byol`
- SimSiam: add `--simsiam`
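For reference, the k-fold flags are passed directly to the script. The invocations below are a sketch; the SSL config name for BYOL is hypothetical and should be replaced with the one you actually use:

```bash
# Supervised k-fold train/test with the default config (config_files/Classification.json);
# folds are taken from k_fold_params.
python kidneyai_classification.py --kfold

# k-fold, evaluated against all annotators at test time
python kidneyai_classification.py --kfold_multi

# BYOL / SimSiam pretraining: same pattern as the DINO command above,
# with --byol or --simsiam (the config name below is hypothetical).
python kidneyai_classification.py --params_path config_files/SSL-BYOL-mouse.json --byol
```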
Three-step workflow:

1. Pretrain (SSL)

   ```bash
   python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino
   ```

2. Supervised head on a frozen SSL encoder (Dual Model)
   - In `config_files/DualModel.mouse.json`, set `model_params -> UFT -> foundation_params` to load the feature extractor.

   ```bash
   python kidneyai_classification.py --params_path config_files/DualModel.mouse.json --uft
   ```

3. Translate the encoder on the target domain and evaluate

   ```bash
   python kidneyai_classification.py --params_path config_files/UFT_translation.json --translation_kfold_multi
   ```

   - `--translation_kfold_multi`: update the encoder and evaluate across folds (`k_fold_params`)
   - `--translation_kfold_multi_full`: translate on the entire target dataset (single adapted model); also runs evaluation on the target dataset post-translation
Note: `data_defs -> labeled_pickles` should match the inference pickles.

Important settings:
- `training_params -> target_uft_cls_name`: which model to update
- `training_params -> model_name`: output name of the translated model
- `transfer_learning_params -> use_pretrained: true`, with `pretrained_method` set to the SSL method used (e.g., DINO). The pretrained weights should be those from Step 1.
Pseudo-label path: you can create pseudo labels and still use `--translation_kfold_multi`. Otherwise:
- Alternative 1: run DINO and set `transfer_learning_params -> pretrained_method: "UFT"`.

  ```bash
  python kidneyai_classification.py --params_path config_files/SSL-DINO-mouse.json --dino
  ```

- Alternative 2 (recommended): combine `--dino` and `--ssl_uft` to load the dual model and update only the feature extractor (see the sketch below).
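A rough sketch of the Alternative 2 invocation, combining the two flags documented above; the config path is an assumption borrowed from Step 3, so substitute the dual-model/translation config that matches your setup:

```bash
# Sketch only: --dino loads the SSL recipe, --ssl_uft restricts updates to the
# feature extractor of the dual model. The config file is a placeholder.
python kidneyai_classification.py --params_path config_files/UFT_translation.json --dino --ssl_uft
```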
After adaptation, run inference and save the predictions:

```bash
python kidneyai_classification.py --params_path config_files/DualModel.translated.json --uft --inference
```

Ensure the transfer-learning parameters for both the feature extractor and the classifier are consistent across all alternatives.
Organize your datasets as follows:
```
/path_to_data/DKD/
├─ study_1/
│  ├─ unannotated/                   # WSIs without labels
│  ├─ annotated/                     # WSIs with paired .xml annotations
│  ├─ additional_localizations/      # (optional; detection-only) .xml files for unannotated WSIs
│  └─ glomeruli/
│     ├─ glomeruli-patches/          # cropped glomeruli
│     ├─ glomeruli-refs/             # reference points on the WSI
│     ├─ glomerulus_patch_list.pickle
│     ├─ dkd_annotations_expert1.pickle
│     ├─ dkd_annotations_expert2.pickle
│     └─ dkd_annotations_expert3.pickle
└─ study_wa_ua/
   └─ ...
```
Notes:
- `glomerulus_patch_list.pickle` contains a list of dictionaries describing each crop (WSI location, image path, etc.). This is the pickle used for AWS uploads.
- Keep all DKD glomerular annotations in the study’s `glomeruli/` folder.
- In `additional_localizations/`, include only `.xml` files; WSIs in that folder are not auto-read.
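To sanity-check a study folder, you can peek at the patch-list pickle. This one-liner is a sketch: the path is illustrative, and it assumes the pickle holds plain Python dictionaries:

```bash
python -c "
import pickle, pprint
with open('/path_to_data/DKD/study_1/glomeruli/glomerulus_patch_list.pickle', 'rb') as f:
    patches = pickle.load(f)          # expected: a list of dicts, one per crop
print(len(patches), 'glomerulus patches')
pprint.pprint(patches[0])             # WSI location, image path, etc.
"
```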
Configuration parameters:

- `data_defs`
  - `labeled_pickles`: list of pickles with labeled data (one or more pickle filenames)
  - `unlabeled_pickles`: list of pickles with unlabeled data for self-supervision (one or more pickle filenames)
  - `semilabeled_pickles`: list of pickles with weakly annotated data for weak/self-supervised learning (one or more pickle filenames)
  - `inference_pickles`: list of pickles with data for DKD inference (one or more pickle filenames)
- `dataset_params`
  - `data_location`: datasets root directory
  - `train_transforms`: training augmentations
  - `val_transforms`: validation augmentations
  - `test_transforms`: test augmentations
- `dataloader_params`
  - standard DataLoader controls (batch size, num_workers, pin_memory, etc.)
- `model_params`
  - `backbone_type`: e.g., resnet50, deit_small
  - `transformers_params`:
    - `img_size`: input size
    - `patch_size`: transformer patch size
  - `pretrained_type`: "supervised" (ImageNet) or "dino" (ImageNet SSL)
  - `pretrained`: use ImageNet weights
  - `freeze_backbone`: freeze the encoder
  - `DINO`: hyperparameters for DINO training
- `optimization_params`
  - `optimizer`:
    - `type`: optimizer class
    - `autoscale_rl`: scale the learning rate by batch size
    - `params`: learning rate, weight decay, momentum/betas, etc.
    - `LARS_params`: use LARS if `use: true` and batch size ≥ `batch_act_thresh`
  - `scheduler`:
    - `type`: scheduler pipeline (list)
    - `params`: scheduler-specific hyperparameters
- `training_params`
  - `model_name`: the model's name
  - `val_every`: validation frequency (in epochs, float)
  - `log_every`: logging frequency (in iterations)
  - `save_best_model`: keep the best model by validation metric
  - `log_embeddings`: plot UMAPs at each validation
  - `knn_eval`: evaluate kNN metrics during validation
  - `grad_clipping`: a value > 0 enables gradient clipping
  - `use_mixed_precision`: enable AMP
  - `save_dir`: checkpoint directory (e.g., `path_to_checkpoints`)
- `system_params`
  - device usage and GPU selection (note that when more than one GPU is enabled, DDP is triggered)
- `log_params`
  - project/run names for logging (default: Weights & Biases)
- `lr_finder`
  - `grid_search_params`:
    - `min_pow`, `max_pow`: LR search range (`10^min_pow` to `10^max_pow`)
    - `resolution`: number of LR candidates
    - `n_epochs`: maximum number of epochs for the search
    - `random_lr`: sample random LRs in the range
    - `keep_schedule`: keep the LR scheduler during the search
    - `report_intermediate_steps`: validate/log during the search
- `transfer_learning_params`
  - `use_pretrained`: enable a pretrained backbone
  - `pretrained_model_name`: name of the pretrained model
  - `pretrained_path`: directory with the pretrained weights (typically the same as `save_dir`)
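One convenient way to experiment is to copy a shipped config and adjust only the parameters above. This is a sketch: the copied filename and all values are placeholders, and the key paths in the comments follow the arrow notation used in this README rather than the exact JSON nesting:

```bash
# Copy the default classification config and edit a few of the documented keys.
cp config_files/Classification.json config_files/MyExperiment.json

# Typical edits (made inside the JSON file; values are placeholders):
#   dataset_params -> data_location            : /path_to_data/DKD
#   dataloader_params -> batch size            : 64
#   model_params -> backbone_type              : resnet50
#   training_params -> model_name              : my_dkd_classifier
#   training_params -> save_dir                : path_to_checkpoints
#   transfer_learning_params -> use_pretrained : false

# Then point the script at the edited config.
python kidneyai_classification.py --params_path config_files/MyExperiment.json --kfold
```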
