Commit c8ec3a3

Draft of JOSS paper
1 parent 7c770dd commit c8ec3a3

3 files changed, +135 -0 lines changed

.github/workflows/draft-pdf.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: Draft PDF
+on:
+  push:
+    paths:
+      - paper/**
+      - .github/workflows/draft-pdf.yml
+
+jobs:
+  paper:
+    runs-on: ubuntu-latest
+    name: Paper Draft
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Build draft PDF
+        uses: openjournals/openjournals-draft-action@master
+        with:
+          journal: joss
+          # This should be the path to the paper within your repo.
+          paper-path: paper/paper.md
+      - name: Upload
+        uses: actions/upload-artifact@v4
+        with:
+          name: paper
+          # This is the output path where Pandoc will write the compiled
+          # PDF. Note, this should be the same directory as the input
+          # paper.md
+          path: paper/paper.pdf

paper/paper.bib

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+@software{histomicsui,
+  title = {HistomicsUI: Organize, visualize, annotate, and analyze histology images},
+  author = {{Kitware, Inc}},
+  year = {2025},
+  note = {Package version 1.7.0},
+  url = {https://github.com/DigitalSlideArchive/HistomicsUI},
+  doi = {10.5281/zenodo.5474914},
+}
+
+@software{histomicstk,
+  title = {HistomicsTK: a Python package for the analysis of digital pathology images},
+  author = {{Kitware, Inc}},
+  year = {2025},
+  note = {Package version 1.4.0},
+  url = {https://github.com/DigitalSlideArchive/HistomicsTK},
+  doi = {10.5281/zenodo.14833780},
+}
+
+@software{digitalslidearchive,
+  title = {Digital Slide Archive: a system for working with large microscopy images},
+  author = {{Kitware, Inc}},
+  year = {2025},
+  note = {Commit 2da1bfc7365dd72011854b5aebf4a744cfcf98a1; Access: 2025-04-30},
+  url = {https://github.com/DigitalSlideArchive/digital_slide_archive},
+}
+
+@article{batchbald2019,
+  author = {Andreas Kirsch and
+            Joost van Amersfoort and
+            Yarin Gal},
+  title = {BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian
+           Active Learning},
+  journal = {CoRR},
+  volume = {abs/1906.08158},
+  year = {2019},
+  url = {http://arxiv.org/abs/1906.08158},
+  eprinttype = {arXiv},
+  eprint = {1906.08158},
+  timestamp = {Thu, 14 Oct 2021 09:14:34 +0200},
+  biburl = {https://dblp.org/rec/journals/corr/abs-1906-08158.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}

paper/paper.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+---
+title: 'WSI Superpixel Guided Labeling'
+tags:
+  - Python
+  - histology
+  - bioimage informatics
+  - whole slide annotation
+  - whole slide images
+  - guided labeling
+# (add orcid for anyone who has one)
+authors:
+  - name: Brianna Major
+    affiliation: 1
+  - name: Jeffery A. Goldstein
+    affiliation: 2
+  - name: Lee A. Newberg
+    affiliation: 1
+    orcid: 0000-0003-4644-8874
+  - name: Abhishek Sharma
+    affiliation: 2
+  - name: Anders Sildnes
+    affiliation: 2
+  - name: Faiza Ahmed
+    affiliation: 1
+  - name: Mike Nagler
+    affiliation: 1
+  - name: Jeff Baumes
+    affiliation: 1
+  - name: Lee A. D. Cooper
+    affiliation: 2
+  - name: David Manthey
+    affiliation: 1
+    orcid: 0000-0002-4580-8770
+affiliations:
+  - index: 1
+    name: Kitware, Inc., New York, United States
+  - index: 2
+    name: Northwestern University, Illinois, United States
+date: 30 April 2025
+bibliography: paper.bib
+---
+
+# Summary
+
+`WSI Superpixel Guided Labeling` facilitates active learning on whole slide images. It has a user interface built on top of the HistomicsUI [@histomicsui] base and deployed as part of the Digital Slide Archive [@digitalslidearchive], and uses the HistomicsTK [@histomicstk] tool kit as part of the process.
+
+Users label superpixel regions or other segmented areas of whole slide images to be used as classification input for machine learning algorithms. An example algorithm is included which generates superpixels, features, and machine learning models for active learning on a directory of images. The interface allows bulk labeling, labeling the most impactful superpixels to improve the model, and reviewing labeled and predicted categories.
+
+# Statement of need
+
+One of the limitations in generating accurate models is the need for labeled data. Given a model and a few labeled samples, a variety of algorithms can determine which additional samples should be labeled to improve the model most efficiently. To actually get labeled data, this prediction of which samples to label needs to be combined with an efficient workflow so that the domain expert can use their labeling time as effectively as possible.
+
+`WSI Superpixel Guided Labeling` provides a user interface and workflow for this guided labeling process. Given a set of whole slide images, the images are segmented based on some user choices. This segmentation is the basis for labeling. The user can specify any number of label categories, including labels that will be excluded from training (for instance, for segmented regions whose categories cannot be accurately determined). After a few initial segments are labeled, a model is generated and used to predict both the category of every segment and which segments would most improve the model if they were also labeled. The user can retrain the model at any time and review both the model's predictions and the labels from other users.
+
+For development, the initial segmentation uses superpixels generated with the SLIC algorithm. These are computed on whole slide images in a tiled manner so that they can work on arbitrarily large images, and the tile boundaries are properly handled to avoid visible artifacts. Either of two basic models can be trained and used for predictions: a small-scale CNN using image features, implemented in TensorFlow/Keras or PyTorch, or a Hugging Face foundation model that generates a one-dimensional feature vector. The certainty criterion used to choose which segments should be labeled next can also be selected; the options are confidence, margin, negative entropy, and the BatchBALD [@batchbald2019] algorithm.
+
+A placental pathologist provided feedback to validate the efficiency of the user interface and the utility of the process.
+
+![The Guided Labeling interface showing a row of superpixels to be labeled and part of a whole slide image](../docs/screenshots/active_learning_view.png)
+
+# Acknowledgements
+
+This work has been funded in part by National Library of Medicine grant 5R01LM013523 entitled "Guiding humans to create better labeled datasets for machine learning in biomedical research".
+
+# References
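
The Statement of need in paper/paper.md names confidence, margin, and negative entropy as selectable certainty criteria. The sketch below illustrates the textbook definitions of those three scores on per-segment class probabilities. It is not taken from the project's code: the function names, the `CRITERIA` table, the `least_certain` helper, and the sample probabilities are all hypothetical, and BatchBALD is not covered. Under each score, lower values mean the model is less certain, so the lowest-scoring segments are the ones a labeler would be shown first.

```python
import math
from typing import Callable, Dict, List, Sequence


def confidence(probs: Sequence[float]) -> float:
    """Largest class probability; a low value means the model is unsure."""
    return max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely classes; a small gap means ambiguity."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def negative_entropy(probs: Sequence[float]) -> float:
    """Sum of p * log(p); more negative means a flatter distribution."""
    return sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical lookup table so a criterion can be chosen by name.
CRITERIA: Dict[str, Callable[[Sequence[float]], float]] = {
    "confidence": confidence,
    "margin": margin,
    "negative_entropy": negative_entropy,
}


def least_certain(
    predictions: Dict[str, Sequence[float]], criterion: str, count: int
) -> List[str]:
    """Return the `count` segment ids with the lowest certainty score."""
    score = CRITERIA[criterion]
    ranked = sorted(predictions, key=lambda seg: score(predictions[seg]))
    return ranked[:count]


# Made-up class probabilities for three segments (illustration only).
predictions = {
    "superpixel_a": [0.90, 0.05, 0.05],
    "superpixel_b": [0.40, 0.35, 0.25],
    "superpixel_c": [0.50, 0.49, 0.01],
}

if __name__ == "__main__":
    for name in CRITERIA:
        print(name, least_certain(predictions, name, 2))
```

The criteria can disagree: in this made-up data, confidence ranks `superpixel_b` as least certain because no class dominates, while margin ranks `superpixel_c` first because its top two classes are nearly tied.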
