Skip to content

Splitter #719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Jun 13, 2025
Merged

Splitter #719

merged 39 commits into from
Jun 13, 2025

Conversation

thibaultdvx
Copy link
Collaborator

@thibaultdvx thibaultdvx commented May 19, 2025

The main goal of this PR is to make the splitter module ready to use. To do that, I did:

  • adapt the splitters to the new ConcatDataset, PairedDataset, and UnpairedDataset;
  • enhance docstrings;
  • create pages on the splitters in the documentation;
  • update test data, to be consistent with the other tests;
  • complete unittests.

I also did:

  • change a bit the subset function of CapsDataset (and its derivatives), so that it doesn't raise an error if a (participant, session) couple in the input DataFrame is not in the datasets;
  • remove unused functions from split_utils.py, and put this file in make_splits because it only concerns make_split and make_kfold;
  • rewrite some tsv tools like read_data (function to read a tsv) in tsvtools/utils.py, and remove unused functions in this python file;
  • remove default values from SingleSplitConfig and KFoldConfig to only keep them in make_split and make_kfold;
  • put the checks made on the split directories directly in SingleSplitConfig and KFoldConfig;
  • reduce docstrings of private functions in order to have smaller files;
  • reorganize files to have always the main function at the top.

Besides, I would particularly want to have your opinion on two changes I made:

  • for SingleSplit, I suggested to change get_splits to get_split, because I found it confusing to have plural here;
  • I also changed validate_longitudinal to longitudinal.

@camillebrianceau camillebrianceau added the refactoring ClinicaDL refactoring 2024 label May 23, 2025
@thibaultdvx thibaultdvx marked this pull request as ready for review June 11, 2025 08:34
@thibaultdvx
Copy link
Collaborator Author

To fix multiprocessing issues with num_workers for Python 3.12 on MacOS, I made pytest skip the multiprocessing tests on MacOS machines.

Copy link
Collaborator

@camillebrianceau camillebrianceau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Let's merge it so I can rebase before Monday!

And you can change get_splits to get_split if you prefer.

@thibaultdvx thibaultdvx merged commit 94b4d33 into aramis-lab:clinicadl_v2 Jun 13, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring ClinicaDL refactoring 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants