Issue/17 #36

Merged
merged 4 commits into from
Oct 7, 2024

Conversation

maxwest-uw
Collaborator

@maxwest-uw maxwest-uw commented Oct 4, 2024

Change Description

Creates the LoopConfiguration class, which collects the configuration options for the learn_loop function. It adds pre-run validation of these options and the ability to write them to a JSON file, which can be read back to rebuild the configuration at runtime.

I tried to make this change minimally invasive to the preexisting API. For instance, if learn_loop was previously called like this:

learn_loop(
    nloops=1,
    features_method="bazin",
    strategy="RandomSampling",
    path_to_features=output_file,
    output_metrics_file=os.path.join(dir_name,"just_a_name.csv"),
    output_queried_file=os.path.join(dir_name,"just_other_name.csv"),
)

the new call could be:

learn_loop(
    LoopConfiguration(
        nloops=1,
        features_method="bazin",
        strategy="RandomSampling",
        path_to_features=output_file,
        output_metrics_file=os.path.join(dir_name,"just_a_name.csv"),
        output_queried_file=os.path.join(dir_name,"just_other_name.csv"),
    )
)

Of course, you can also instantiate the configuration separately:

lc = LoopConfiguration(
  nloops=1,
  features_method="bazin",
  strategy="RandomSampling",
  path_to_features=output_file,
  output_metrics_file=os.path.join(dir_name,"just_a_name.csv"),
  output_queried_file=os.path.join(dir_name,"just_other_name.csv"),
)
learn_loop(lc)

Or you can write it to a JSON file and read it back:

lc1 = LoopConfiguration(
  nloops=1,
  features_method="bazin",
  strategy="RandomSampling",
  path_to_features=output_file,
  output_metrics_file=os.path.join(dir_name,"just_a_name.csv"),
  output_queried_file=os.path.join(dir_name,"just_other_name.csv"),
)
lc1.to_json("./config_cache.json")
...
# some lines later...
...
lc2 = LoopConfiguration.from_json("./config_cache.json")
learn_loop(lc2)
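
A round trip like the one above could be implemented roughly as follows. This is a minimal sketch, not the merged code: `SketchConfig` is a hypothetical stand-in for LoopConfiguration, it carries only three of the fields shown in this PR, and the use of `dataclasses.asdict` with the standard `json` module is an assumption about how such a class might serialize itself.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for LoopConfiguration (not the real class)."""

    nloops: int
    features_method: str
    strategy: str

    def to_json(self, path: str) -> None:
        # Serialize every field into a flat JSON object.
        with open(path, "w") as handle:
            json.dump(asdict(self), handle, indent=2)

    @classmethod
    def from_json(cls, path: str) -> "SketchConfig":
        # Rebuild the instance by passing the stored keys back as keyword arguments.
        with open(path) as handle:
            return cls(**json.load(handle))


def round_trip(config: SketchConfig) -> SketchConfig:
    """Write the config to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "config_cache.json")
        config.to_json(path)
        return SketchConfig.from_json(path)
```

Because dataclasses compare field by field, the object returned by `round_trip` compares equal to the one passed in, provided every field holds a JSON-friendly value.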

resolves #17

  • My PR includes a link to the issue that I am addressing

Solution Description

Created the LoopConfiguration class and placed all of the config parameters in there. Added some validation steps.
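
Pre-run validation of this kind can be sketched with a dataclass `__post_init__` hook. The class below is a hypothetical illustration, not the merged LoopConfiguration: the specific checks and the `KNOWN_STRATEGIES` tuple are assumptions (the two strategy names are simply the ones that appear in this PR's benchmark output).

```python
from dataclasses import dataclass

# Assumed set of accepted values, for illustration only.
KNOWN_STRATEGIES = ("RandomSampling", "UncSampling")


@dataclass
class ValidatedConfig:
    """Hypothetical stand-in for LoopConfiguration (not the real class)."""

    nloops: int
    strategy: str
    path_to_features: str

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so invalid options
        # are rejected before any learning loop starts.
        if self.nloops < 1:
            raise ValueError(f"nloops must be at least 1, got {self.nloops}")
        if self.strategy not in KNOWN_STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        if not self.path_to_features:
            raise ValueError("path_to_features must not be empty")
```

Failing at construction time means a bad option surfaces immediately rather than partway through a long run.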

For the learn_loop.py module, I also passed the LoopConfiguration instance through some of the ancillary functions. My (arbitrary) rule was that if a function originally took more than three of the config parameters, I replaced those parameters with the full config instance and accessed the fields directly inside the function.
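
As a rough illustration of that pattern, a helper can take the config object and read fields from it instead of receiving each option as its own argument. The field names `photo_ids_to_file`, `SNANA_types`, and `photo_ids_froot` come from the diff in this PR; the `SketchConfig` class, the `photo_ids_file_name` helper, the default values, and the `.dat` suffix are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for LoopConfiguration (not the real class)."""

    photo_ids_to_file: bool = False
    SNANA_types: bool = False
    photo_ids_froot: str = "photo_ids"  # default value is an assumption


def photo_ids_file_name(
    config: SketchConfig,
    iteration_step: int,
    file_name_suffix: str = ".dat",  # suffix is an assumption
) -> Optional[str]:
    """Build the per-step output file name, or return None if saving is off."""
    # The helper reads the options it needs straight from the config
    # instead of taking three separate keyword arguments.
    if config.photo_ids_to_file or config.SNANA_types:
        return config.photo_ids_froot + "_" + str(iteration_step) + file_name_suffix
    return None
```

The trade-off is a shorter signature at the cost of a less explicit one: the function's real inputs are now whichever fields it happens to read.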

While I was going through the refactor, I also took some time to write a few more tests for learn_loop and changed the styling so that multi-line function/object calls have one parameter per line.

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

New Feature Checklist

  • I have added or updated the docstrings associated with my feature using the NumPy docstring format
  • I have updated the tutorial to highlight my new feature (if appropriate)
  • I have added unit/End-to-End (E2E) test cases to cover my new feature
  • My change includes a breaking change
    • My change includes backwards compatibility and deprecation warnings (if possible)

Documentation Change Checklist


github-actions bot commented Oct 4, 2024

| Before [0c8d74f] <v0.1> | After [7738bfa] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| failed | 137±1ms | n/a | benchmarks.time_feature_creation |
| failed | 168±1ms | n/a | benchmarks.time_learn_loop('KNN', 'RandomSampling') |
| failed | 168±3ms | n/a | benchmarks.time_learn_loop('KNN', 'UncSampling') |
| failed | 2.68±0.01s | n/a | benchmarks.time_learn_loop('RandomForest', 'RandomSampling') |
| failed | 2.67±0.01s | n/a | benchmarks.time_learn_loop('RandomForest', 'UncSampling') |


@@ -1622,16 +1622,17 @@ def save_metrics(self, loop: int, output_metrics_file: str, epoch: int, batch=1)

# write to file)
queried_sample = np.array(self.queried_sample)
flag = queried_sample[:,0].astype(int) == epoch
if len(queried_sample) > 0:
Collaborator Author

This is part of a small bug fix that I'm not totally sure about, and I would like a second opinion from someone on the science team. While setting up the tests for the learn_loop function, I hit a case in the test data where queried_sample was empty, which caused the indexing line above to fail. I added a check here, and in one other place in database.py, for an empty list before continuing. That seems fine in tandem with the sum(flag) > 0 check before writing, but I wanted to make sure I wasn't causing unexpected behavior.
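
The failure described in this comment can be reproduced in isolation: `np.array([])` is one-dimensional, so the two-index slice `[:, 0]` raises an IndexError. The sketch below shows the guard under that assumption; `select_epoch_rows` is a hypothetical helper written for illustration, not the code in database.py.

```python
import numpy as np


def select_epoch_rows(queried_sample, epoch):
    """Return the rows whose first column equals `epoch`.

    Hypothetical helper: it mirrors the guarded logic discussed above
    but is not the actual save_metrics implementation.
    """
    arr = np.array(queried_sample)
    # An empty list becomes a 1-D array of shape (0,), so arr[:, 0]
    # would raise IndexError. Check the length before indexing.
    if len(arr) == 0:
        return np.empty((0, 0))
    flag = arr[:, 0].astype(int) == epoch
    return arr[flag]


def unguarded_indexing_fails() -> bool:
    """Show that the unguarded slice fails on an empty sample."""
    try:
        np.array([])[:, 0]
    except IndexError:
        return True
    return False
```

With the guard in place, an empty sample yields an empty result, so a later check such as `sum(flag) > 0` before writing is never reached with an undefined `flag`.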

@maxwest-uw maxwest-uw self-assigned this Oct 4, 2024
- if is_save_photoids_to_file or is_save_snana_types:
-     file_name = file_name_prefix + '_' + str(iteration_step) + file_name_suffix
+ if config.photo_ids_to_file or config.SNANA_types:
+     file_name = config.photo_ids_froot + '_' + str(iteration_step) + file_name_suffix
Collaborator

Cute - I didn't notice froot when I first looked at the LoopConfig dataclass.

Collaborator

@drewoldag drewoldag left a comment

Overall, this looks like a nice clean up. Thanks for tidying up the function definitions as well - I like the one-parameter-per-line look too :)

@maxwest-uw maxwest-uw merged commit 06c1a25 into main Oct 7, 2024
7 checks passed
@maxwest-uw maxwest-uw deleted the issue/17 branch October 7, 2024 21:21
Development

Successfully merging this pull request may close these issues.

Implement config-based system to define functional inputs
2 participants