Skip to content

Conversation

@K-Meech
Copy link
Contributor

@K-Meech K-Meech commented Nov 12, 2025

Description

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Why is this PR needed?

Currently, atlas example scripts repeat much of the same processing. This PR aims to abstract the main processing steps into a re-useable function: raw_to_ready

What does this PR do?

Bundles pre-processing steps into a re-useable function - see brainglobe/atlas-forge#10 (comment) for an overview of the various steps agreed for the two main processing steps: source_to_raw and raw_to_ready.

Some steps are still marked as todo in the code:

Feedback on any part of this is very welcome! Particularly:

References

brainglobe/atlas-forge#10 and brainglobe/atlas-forge#14

How has this PR been tested?

I tested this locally with two example nifiti files arranged in two subject dirs:

derivatives
│   source_data.csv
│
└───sub-ZF65f
│             sub-ZF65f_channel-green_res-50x50x50mm_origin-asr.nii.gz 
└───sub-ZF7927f
               sub-ZF7927f_channel-green_res-50x50x50mm_origin-asr.nii.gz

source_data.csv was in the standard input csv format

The json config file was:

{
    "derivatives_dir": "path/to/derivatives",
    "resolution_z": 50,
    "resolution_y": 50,
    "resolution_x": 50,
    "mask": {
        "gaussian_sigma": 3,
        "threshold_method": "triangle",
        "closing_size": 5,
        "erode_size": 0
    },
    "pad_pixels": 5
}

Is this a breaking change?

No

Does this PR require an update to the documentation?

TODO - once brought out of draft

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality (unit & integration)
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@K-Meech K-Meech requested a review from stellaprins November 12, 2025 15:55
Copy link
Contributor

@stellaprins stellaprins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I merged Alessandro's PR that adds a column with custom mask paths to the input csv file, so that should probably be taken into account here.

Thanks for adding the TODOs, they're helpful! If denoising isn’t going to be part of this PR, we should create a separate issue for it.

image = correct_image_brightness(image, spacing=vox_sizes_mm)

mask_config = config.mask
mask = create_mask(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the step to be skipped in case a custo mask is provided.

Copy link
Member

@niksirbi niksirbi Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we have to determine exactly how...
Relevant discussions in brainglobe/atlas-forge#1

@niksirbi
Copy link
Member

This looks nice @K-Meech, I did a first-pass on this draft and left some initial comments that came to mind.

@K-Meech
Copy link
Contributor Author

K-Meech commented Nov 19, 2025

Thanks for the comments all! Here's a rough list of the main changes to be made before this comes out of draft:

  • Change config file to use yaml rather than json
  • re-name xflip to something like lrflip or fliplr
  • Add validation of input csv (now Add input csv validation #96 has been merged)
  • Decide whether the input csv should remain as the standard source image csv (as it is now), or use the output csv from source_to_raw (with paths to processed images / processed masks etc). I added this in my draft of source_to_raw, but wasn't sure if we wanted to keep everything only dependent on the original csv file.
  • Remove de-noising todo - this has been split into a separate issue to tackle later: [Feature] Add a de-noising step to raw_to_ready atlas-forge#17
  • Skip the mask creation step, if a mask is provided in the input csv (for now, we only handle the case where masks are present at the start, alongside the source data. Updated masks from processed images will be handled in a separate issue: Handling masks created from processed nifti images atlas-forge#16)
  • On discussion with @stellaprins / @alessandrofelder - we will keep the separate raw and derivatives directories for now. We should check this PR assumes the raw images are being read from a separate location to where it writes its outputs.
  • Add tests for functionality

@alessandrofelder
Copy link
Member

but wasn't sure if we wanted to keep everything only dependent on the original csv file.

My vote is pro a new csv file. I think it's better for tracing back what happened to the data, and it's a natural place to write something to file because we are also writing the "raw" images and the QC plots to file at the same stage.

@stellaprins
Copy link
Contributor

I've added tests covering most scenarios I could think off.

The only thing I just realised that is not covered is testing a scenario in which the automatic mask generation is overruled. The make_stack helper function creates simple mask stacks in the test_stacks fixture to create such test.

I will mark this PR ready for review and create an issue for adding this "mask test" later.

@stellaprins stellaprins marked this pull request as ready for review November 27, 2025 17:16
@K-Meech K-Meech mentioned this pull request Nov 28, 2025
7 tasks
Copy link
Contributor Author

@K-Meech K-Meech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests @stellaprins ! I've added some general comments below - but Alessandro + others should also take a look (as I was involved in writing this 😅 )

One important point is that there are some output files currently not covered by tests e.g. the QC plots, and the two .txt files with brain/mask paths.

masks ready for template creation.

This assumes source_to_raw has already been run to downsample images,
re-orient them to ASR and save them to the derivatives directory.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
re-orient them to ASR and save them to the derivatives directory.
re-orient them to ASR and save them to the raw directory.

@@ -0,0 +1,253 @@
from pathlib import Path
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions for other tests:

  • It would be good to test that an input csv file with a use column correctly excludes subjects with use=False
  • As you already mentioned, it would be good to test that provided masks are re-used correctly (rather than creating a new one)
  • Maybe something around checking that changing the config does change the result? E.g. could try two configs with different values for pad_pixels + verify the output image/mask size is as expected

"""Test that raw_to_ready creates expected derivatives directory."""
csv_path, config_path = create_raw_test_data
raw_to_ready(csv_path, config_path)
assert (create_raw_test_data[0].parents[0] / "derivatives").exists()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to test for the presence of more files? E.g.:

  • the all_processed_brain_paths.txt + check it has the correct paths inside
  • the all_processed_mask_paths.txt + check it has the correct paths inside
  • QC plots for each subject
  • image / mask files for each subject

I know the image / mask files are referenced in other tests, but I don't think the QC plots and .txt files are touched on anywhere yet?

assert (create_raw_test_data[0].parents[0] / "derivatives").exists()


def test_create_subject_dir(tmp_path: Path) -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably personal preference, but I'd favour testing more of this at the top level by calling raw_to_ready directly? raw_to_ready is the function users will actually be calling + the one we want to be sure of correct behaviour - while underscore functions like _create_subject_dir are more implementation details. Testing more at the top level, leaves us free to refactor how it works underneath freely (i.e. renaming underscore functions / creating more / splitting them up...) without having to update the tests in any way.

Not sure what @alessandrofelder and others think? (as a lot of this comes down to personal preference!)

tmp_path: Path,
test_stacks: dict[str, NDArray[np.float64]],
) -> None:
"""Test lr-flipped asym images differ from non-flipped ones."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need to do this with both the symmetric and asymmetric input images?

The important point is to test that they are lr-flipped which is only possible to test with the asymmetric input. i.e. I'm not sure the symmetric input is adding anything here - it just proves that symmetric images are left-right symmetric, rather than something about the behaviour of the function.

)

image = load_any(image_path)
flipped_image = load_any(flipped_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as testing that the flipped / non-flipped images aren't identical. You could also do something like:

np.testing.assert_equal(flipped_image, np.flip(image, axis=2))

to check that they are really flipped copies of the same image.


def create_test_csv(path: Path, test_data: list[dict[str, Any]]) -> Path:
"""Creates "raw_data" CSV file and returns it's path."""
for data in test_data:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove image from test_data before you make the csv file. At the moment, the numpy array is being included in the csv file.

Comment on lines +50 to +53
"image": make_stack(),
"mask": make_stack(mask=True),
"image_asym": make_stack(offset=5),
"mask_asym": make_stack(mask=True, offset=5),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need the symmetric image / mask here? Maybe just always use the asymmetric one so any flips / re-orientations etc can be tested?

for data in test_data:
image_path = path / data["subject_id"] / f"{data['subject_id']}.nii.gz"
image_path.parent.mkdir()
save_as_asr_nii(data["image"], data["voxel_size"], image_path)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how precise we want to be here, but this function needs the voxel sizes in mm in xyz order (rather than in microns, as it is in the csv files). So these values * 0.001 - it's only really relevant if we add any tests checking output voxel sizes and similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants