Draft of raw_to_ready function #97

K-Meech · 2025-11-12T15:51:28Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?

Currently, atlas example scripts repeat much of the same processing. This PR aims to abstract the main processing steps into a reusable function: raw_to_ready.

What does this PR do?

Bundles preprocessing steps into a reusable function. See brainglobe/atlas-forge#10 (comment) for an overview of the steps agreed for the two main processing stages: source_to_raw and raw_to_ready.

Some steps are still marked as todo in the code:

There should be a de-noising step, but I wasn't sure which filters we want to support for this.
~~n4 bias field correction is waiting on Add alternative to ants.n4_bias_field_correction #89~~
Validation of the input csv is proposed in a separate issue and will be added in a separate PR: [Feature] validation utility for standard spreadsheet #92

Feedback on any part of this is very welcome! Particularly:

are there particular steps you'd like to make optional?
Currently, I assume all images have been downsampled to the same x/y/z voxel size. Are there any situations where individual subject IDs may have a different resolution?
I put some general suggestions about the steps and potential outputs of source_to_raw on [Feature] Explore the extent to which we can make raw_to_ready agnostic of specific template atlas-forge#14
We should allow skipping the mask creation step if mask_filepath is provided in the input csv. There's an open issue about this though: Handling masks created from processed nifti images atlas-forge#16
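A minimal sketch of how that skip could be decided per subject, assuming the input csv has a `subject_id` column and an optional `mask_filepath` column. The `subject_id` and `filepath` column names and the `plan_mask_steps` helper are hypothetical, not part of brainglobe_template_builder.

```python
import csv
import io
from typing import Dict, Optional


def plan_mask_steps(csv_text: str) -> Dict[str, Optional[str]]:
    """Map each subject to its provided mask path, or None if a mask must be created.

    Hypothetical column names: 'subject_id' plus an optional 'mask_filepath'.
    """
    plan: Dict[str, Optional[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A missing column and an empty cell both mean "no mask provided"
        mask_path = (row.get("mask_filepath") or "").strip()
        plan[row["subject_id"]] = mask_path or None
    return plan
```

Subjects mapped to `None` would go through automatic mask creation; the others would load the provided mask and skip that step.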

References

brainglobe/atlas-forge#10 and brainglobe/atlas-forge#14

How has this PR been tested?

I tested this locally with two example NIfTI files arranged in two subject directories:

derivatives
│   source_data.csv
│
├───sub-ZF65f
│       sub-ZF65f_channel-green_res-50x50x50mm_origin-asr.nii.gz
└───sub-ZF7927f
        sub-ZF7927f_channel-green_res-50x50x50mm_origin-asr.nii.gz
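These file names encode subject, channel, voxel size and orientation. Since the description currently assumes every subject shares one voxel size, here is a minimal sketch that parses names like these and checks that assumption. It assumes the `res-` field is ordered z, y, x (matching the `resolution_z/y/x` config keys) and that all files follow this exact pattern; `FILENAME_RE`, `parse_image_name` and `check_shared_resolution` are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical pattern for e.g. sub-ZF65f_channel-green_res-50x50x50mm_origin-asr.nii.gz
# Assumption: the res- field is ordered z x y x x.
_NUM = r"\d+(?:\.\d+)?"
FILENAME_RE = re.compile(
    r"^sub-(?P<subject>[^_]+)"
    r"_channel-(?P<channel>[^_]+)"
    rf"_res-(?P<z>{_NUM})x(?P<y>{_NUM})x(?P<x>{_NUM})(?P<unit>[a-z]+)"
    r"_origin-(?P<origin>[a-z]+)"
    r"\.nii\.gz$"
)


@dataclass(frozen=True)
class ImageName:
    subject: str
    channel: str
    resolution_zyx: Tuple[float, float, float]
    unit: str
    origin: str


def parse_image_name(name: str) -> ImageName:
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"Unexpected filename: {name}")
    return ImageName(
        subject=match["subject"],
        channel=match["channel"],
        resolution_zyx=(float(match["z"]), float(match["y"]), float(match["x"])),
        unit=match["unit"],
        origin=match["origin"],
    )


def check_shared_resolution(names: List[str]) -> Tuple[float, float, float]:
    """Return the common voxel size, or raise if subjects differ."""
    resolutions = {parse_image_name(n).resolution_zyx for n in names}
    if len(resolutions) != 1:
        raise ValueError(f"Mixed voxel sizes across subjects: {sorted(resolutions)}")
    return resolutions.pop()
```

Failing loudly here would surface a mixed-resolution dataset before any processing runs, rather than silently using the config's voxel size for every subject.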

source_data.csv followed the standard input csv format.

The json config file was:

{
    "derivatives_dir": "path/to/derivatives",
    "resolution_z": 50,
    "resolution_y": 50,
    "resolution_x": 50,
    "mask": {
        "gaussian_sigma": 3,
        "threshold_method": "triangle",
        "closing_size": 5,
        "erode_size": 0
    },
    "pad_pixels": 5
}
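The review snippet accesses `config.mask`, so the config is presumably loaded into an object with attribute access. The actual loader lives in preproc_config.py and is not shown here; the following is only a hypothetical sketch using standard-library dataclasses, where `MaskConfig`, `RawToReadyConfig`, `vox_sizes` and `load_config` are illustrative names. Unknown or missing keys raise `TypeError`, which gives basic validation for free.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass
class MaskConfig:
    gaussian_sigma: float
    threshold_method: str
    closing_size: int
    erode_size: int


@dataclass
class RawToReadyConfig:
    derivatives_dir: Path
    resolution_z: float
    resolution_y: float
    resolution_x: float
    mask: MaskConfig
    pad_pixels: int

    @property
    def vox_sizes(self) -> Tuple[float, float, float]:
        # z, y, x order, matching the keys in the JSON file
        return (self.resolution_z, self.resolution_y, self.resolution_x)


def load_config(text: str) -> RawToReadyConfig:
    raw = json.loads(text)
    mask = MaskConfig(**raw.pop("mask"))  # unexpected/missing keys -> TypeError
    derivatives_dir = Path(raw.pop("derivatives_dir"))
    return RawToReadyConfig(derivatives_dir=derivatives_dir, mask=mask, **raw)
```

If the file format switches to YAML, as proposed later in this thread, only the `json.loads` call would need to change; the dataclasses stay the same.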

Is this a breaking change?

No

Does this PR require an update to the documentation?

TODO - once brought out of draft

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality (unit & integration)
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

stellaprins

Nice work! I merged Alessandro's PR that adds a column with custom mask paths to the input csv file, so that should probably be taken into account here.

Thanks for adding the TODOs, they're helpful! If denoising isn’t going to be part of this PR, we should create a separate issue for it.


stellaprins · 2025-11-14T16:35:48Z

brainglobe_template_builder/preproc/raw_to_ready.py

+    image = correct_image_brightness(image, spacing=vox_sizes_mm)
+
+    mask_config = config.mask
+    mask = create_mask(


I think this is the step to be skipped when a custom mask is provided.

Yes, but we have to determine exactly how...
Relevant discussions in brainglobe/atlas-forge#1
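For reference, the `create_mask` call in the diff receives `config.mask`, whose keys in the test config are `gaussian_sigma`, `threshold_method`, `closing_size` and `erode_size`. Below is a minimal sketch of one common way to combine such parameters (smooth, threshold, close, erode). It is not the project's `create_mask`: `make_mask` and `triangle_threshold` are hypothetical names, only the triangle method is covered, the sizes are assumed to be cube edge lengths in voxels, and the triangle threshold is hand-rolled so the block needs only NumPy and SciPy (scikit-image also ships `threshold_triangle`).

```python
import numpy as np
from scipy import ndimage


def triangle_threshold(values: np.ndarray, nbins: int = 256) -> float:
    """Triangle method: pick the bin farthest from the line joining the
    histogram peak to the end of its longer tail."""
    hist, edges = np.histogram(values, bins=nbins)  # flattens N-D input
    centers = (edges[:-1] + edges[1:]) / 2
    peak = int(np.argmax(hist))
    nonzero = np.flatnonzero(hist)
    first, last = int(nonzero[0]), int(nonzero[-1])
    end = first if (peak - first) > (last - peak) else last
    x0, y0 = peak, float(hist[peak])
    x1, y1 = end, float(hist[end])
    xs = np.arange(min(peak, end), max(peak, end) + 1)
    # Perpendicular distance to the peak-to-tail line, up to a constant factor
    dist = np.abs((y1 - y0) * xs - (x1 - x0) * hist[xs] + x1 * y0 - y1 * x0)
    return float(centers[xs[np.argmax(dist)]])


def make_mask(image: np.ndarray, gaussian_sigma: float = 3.0,
              closing_size: int = 5, erode_size: int = 0) -> np.ndarray:
    smoothed = ndimage.gaussian_filter(image.astype(np.float64), sigma=gaussian_sigma)
    mask = smoothed > triangle_threshold(smoothed)
    if closing_size > 0:
        # Fill small gaps and holes in the foreground
        structure = np.ones((closing_size,) * image.ndim, dtype=bool)
        mask = ndimage.binary_closing(mask, structure=structure)
    if erode_size > 0:
        # Shrink the mask away from the tissue boundary
        structure = np.ones((erode_size,) * image.ndim, dtype=bool)
        mask = ndimage.binary_erosion(mask, structure=structure)
    return mask
```

The `pad_pixels` value could then be applied to both image and mask with `np.pad(array, pad_pixels)`, which pads with zeros by default.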

brainglobe_template_builder/preproc/preproc_config.py

brainglobe_template_builder/preproc/raw_to_ready.py

niksirbi · 2025-11-17T15:19:13Z

This looks nice @K-Meech, I did a first-pass on this draft and left some initial comments that came to mind.

K-Meech · 2025-11-19T15:18:52Z

Thanks for the comments all! Here's a rough list of the main changes to be made before this comes out of draft:

Change the config file to use YAML rather than JSON
Rename xflip to something like lrflip or fliplr
Add validation of input csv (now Add input csv validation #96 has been merged)
Decide whether the input csv should remain as the standard source image csv (as it is now), or use the output csv from source_to_raw (with paths to processed images / processed masks etc). I added this in my draft of source_to_raw, but wasn't sure if we wanted to keep everything only dependent on the original csv file.
Remove de-noising todo - this has been split into a separate issue to tackle later: [Feature] Add a de-noising step to raw_to_ready atlas-forge#17
Skip the mask creation step, if a mask is provided in the input csv (for now, we only handle the case where masks are present at the start, alongside the source data. Updated masks from processed images will be handled in a separate issue: Handling masks created from processed nifti images atlas-forge#16)
After discussion with @stellaprins / @alessandrofelder, we will keep the separate raw and derivatives directories for now. We should check that this PR reads the raw images from a separate location to where it writes its outputs.
Add tests for functionality
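For the check on separate raw and derivatives directories, a minimal sketch of a guard that could run before any outputs are written. `check_separate_dirs` is a hypothetical helper; it only compares resolved paths and does not assume anything about the project's directory layout.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def check_separate_dirs(input_paths: Iterable[PathLike], output_dir: PathLike) -> None:
    """Raise if any input image lives inside (or is) the output directory."""
    out = Path(output_dir).resolve()
    for p in input_paths:
        resolved = Path(p).resolve()  # non-strict: works for paths that don't exist yet
        if resolved == out or out in resolved.parents:
            raise ValueError(f"Input {p} is inside output directory {out}")
```

Resolving first means relative paths and `..` segments cannot slip past the comparison.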

alessandrofelder · 2025-11-19T16:54:43Z

but wasn't sure if we wanted to keep everything only dependent on the original csv file.

My vote is for a new csv file. I think it's better for tracing back what happened to the data, and it's a natural place to write something to file, because we are also writing the "raw" images and the QC plots to file at the same stage.
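A minimal sketch of what writing such a csv could look like with the standard library. The column names and `write_ready_csv` are hypothetical; the idea is simply one row per subject linking each output back to its source and recording whether the mask was provided or generated.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical columns for tracing each output back to its input
FIELDNAMES = [
    "subject_id",
    "source_filepath",
    "ready_image_filepath",
    "ready_mask_filepath",
    "mask_was_provided",
]


def write_ready_csv(rows: List[Dict[str, object]], csv_path: Path) -> None:
    """Write one row per subject; unexpected keys raise ValueError."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```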


stellaprins · 2025-11-27T17:16:12Z

I've added tests covering most scenarios I could think of.

The only scenario I've just realised is not covered is one in which automatic mask generation is overridden by a provided mask. The make_stack helper function already creates simple mask stacks in the test_stacks fixture, which could be used to create such a test.

I will mark this PR ready for review and create an issue for adding this "mask test" later.

K-Meech

Thanks for adding the tests @stellaprins! I've added some general comments below, but Alessandro and others should also take a look (as I was involved in writing this 😅).

One important point: some output files are currently not covered by tests, e.g. the QC plots and the two .txt files with brain/mask paths.
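For the two .txt files, a minimal sketch of checks a test could make, assuming one path per line and that the brain and mask lists are paired in order. The file names `brain_paths.txt` / `mask_paths.txt` and the helpers are hypothetical; a test would call raw_to_ready and then `assert path_list_problems(...) == []`.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def write_path_list(paths: Iterable[PathLike], txt_path: PathLike) -> None:
    Path(txt_path).write_text("".join(f"{p}\n" for p in paths))


def read_path_list(txt_path: PathLike) -> List[str]:
    return [line for line in Path(txt_path).read_text().splitlines() if line.strip()]


def path_list_problems(brain_txt: PathLike, mask_txt: PathLike) -> List[str]:
    """Return human-readable problems; an empty list means the outputs look sane."""
    brains = read_path_list(brain_txt)
    masks = read_path_list(mask_txt)
    problems = []
    if not brains:
        problems.append("brain path list is empty")
    if len(brains) != len(masks):
        problems.append(f"{len(brains)} brain paths but {len(masks)} mask paths")
    for p in brains + masks:
        if not Path(p).exists():
            problems.append(f"listed file does not exist: {p}")
    return problems
```

Returning a list of problems rather than failing on the first one makes a failing test report everything that is wrong at once.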

K-Meech · 2025-11-28T13:21:07Z

brainglobe_template_builder/preproc/raw_to_ready.py

+    masks ready for template creation.
+
+    This assumes source_to_raw has already been run to downsample images,
+    re-orient them to ASR and save them to the derivatives directory.


Suggested change

re-orient them to ASR and save them to the derivatives directory.

re-orient them to ASR and save them to the raw directory.

K-Meech · 2025-11-28T13:27:43Z