4 random sampling #13

Merged
merged 22 commits into from
May 21, 2025

Conversation

J-Dymond
Collaborator

@J-Dymond J-Dymond commented May 16, 2025

  • Random sampling + analysis scripts

  • Refactoring changes, moving dataset loading functions into separate files

  • Additional configs

@J-Dymond J-Dymond linked an issue May 16, 2025 that may be closed by this pull request
@jack89roberts
Collaborator

There's also a lot going on in the __main__ block of the random sampling script that could be pulled out into functions (anything that isn't just argparse).

@J-Dymond J-Dymond linked an issue May 16, 2025 that may be closed by this pull request
@J-Dymond
Collaborator Author

J-Dymond commented May 16, 2025

Will refactor __main__ in scripts/random_sampling.py
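
For illustration, a minimal sketch of the shape this refactor could take: argparse stays in its own function and the `__main__` block only calls `main()`. This is not the code in scripts/random_sampling.py; the names `parse_args`, `sample_indices`, `main`, the `--n-samples` and `--seed` flags, and the pool size of 100 are all hypothetical.

```python
import argparse
import random


def parse_args(argv=None):
    """Parse command-line arguments (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Random sampling sketch")
    parser.add_argument("--n-samples", type=int, default=10)
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


def sample_indices(pool_size, n_samples, seed):
    """Draw n_samples distinct indices from range(pool_size), reproducibly."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), n_samples)


def main(argv=None):
    """Glue code: parse arguments, then delegate the real work to functions."""
    args = parse_args(argv)
    # pool_size=100 is a placeholder for the size of the real dataset
    indices = sample_indices(pool_size=100, n_samples=args.n_samples, seed=args.seed)
    print(indices)
    return indices


if __name__ == "__main__":
    main()
```

Keeping the logic in importable functions means it can be called from tests or from the analysis scripts without going through the command line.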

@J-Dymond J-Dymond marked this pull request as ready for review May 16, 2025 14:35
@J-Dymond
Collaborator Author

Resolved conflicts with main.

@J-Dymond J-Dymond requested a review from klh5 May 16, 2025 15:14
@klh5
Collaborator

klh5 commented May 19, 2025

I missed this in the previous PR, but in dataset_generation.py args.r is always parsed as a string by argparse, so passing a float results in an error because you're multiplying by a string:

if args.r is not None:
    imbalance_ratio = args.r
    if isinstance(imbalance_ratio, int):  # imbalance_ratio is never an int
        n_targets = imbalance_ratio
    else:
        n_targets = int(len(non_targets[0]) * imbalance_ratio)  # Multiply by string in this case
        ...

Easy fix is probably to always cast args.r to float, then check if it's >1.

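if args.r is not None:
    imbalance_ratio = float(args.r)  # argparse returns a string unless type= is set
    if imbalance_ratio > 1:
        n_targets = int(imbalance_ratio)  # values > 1 are an absolute count, so keep it an int
    else:
        n_targets = int(len(non_targets[0]) * imbalance_ratio)
        ...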
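
As a rough sketch of the same idea pushed one step further, the conversion can be left to argparse via `type=float`, with the count-or-fraction rule in a small helper. The names `resolve_n_targets` and `build_parser` and the exact `-r` flag spelling are assumptions for illustration, not the code in dataset_generation.py. For context, in Python an int times a str is string repetition (3 * "0.5" gives "0.50.50.5"), so the original `int(...)` call fails with a ValueError.

```python
import argparse


def resolve_n_targets(imbalance_ratio, n_non_targets):
    """Values > 1 are treated as an absolute count, otherwise as a fraction of n_non_targets."""
    if imbalance_ratio > 1:
        return int(imbalance_ratio)
    return int(n_non_targets * imbalance_ratio)


def build_parser():
    """Build a parser whose -r option (hypothetical spelling) arrives as a float."""
    parser = argparse.ArgumentParser()
    # type=float makes argparse convert the value and reject non-numeric input
    parser.add_argument("-r", type=float, default=None)
    return parser


args = build_parser().parse_args(["-r", "0.25"])
if args.r is not None:
    # 1000 stands in for len(non_targets[0])
    print(resolve_n_targets(args.r, n_non_targets=1000))
```

With this rule, a value of exactly 1 falls into the fraction branch and yields as many targets as non-targets.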

@J-Dymond
Collaborator Author

I've made that change now, thank you!

Collaborator

@klh5 klh5 left a comment

Looks good - just a couple of small things.

@J-Dymond J-Dymond merged commit fea32ef into main May 21, 2025
5 checks passed
@J-Dymond J-Dymond deleted the 4-random-sampling branch May 21, 2025 12:22
Development

Successfully merging this pull request may close these issues.

Benchmark performance using random sampling
Assess performance on balanced class problem
3 participants