Refactor time_domain_loop with new TimeDomainConfiguration #74

Merged: 4 commits, merged Nov 14, 2024

maxwest-uw (Collaborator)

Change Description

A refactor of time_domain_loop that introduces a new TimeDomainConfiguration dataclass, which works much like LoopConfiguration from #36.

Solution Description

Created a separate dataclass for the time-domain loop so that it stays simple to use.
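For readers new to the pattern, here is a minimal sketch of what a configuration dataclass of this kind can look like. The field names (path_to_features_dir, strategy, clf_bootstrap, queryable, sep_files, path_to_ini_files) are taken from the code excerpts in this PR, and the checks mirror the __post_init__ quoted in the review. The class name TimeDomainConfigSketch, the defaults, and the contents of VALID_STRATEGIES are assumptions for illustration; this is not the actual TimeDomainConfiguration.

```python
from dataclasses import dataclass, field
from os import path

# Assumed subset of strategy names; the project defines its own list.
VALID_STRATEGIES = ("UncSampling", "RandomSampling", "QBDEntropy")


@dataclass
class TimeDomainConfigSketch:
    """Hypothetical, trimmed-down stand-in for TimeDomainConfiguration."""

    path_to_features_dir: str
    strategy: str = "UncSampling"
    clf_bootstrap: bool = False
    queryable: bool = True
    sep_files: bool = False
    path_to_ini_files: dict = field(default_factory=dict)

    def __post_init__(self):
        # Validate once at construction time instead of deep inside the loop.
        if not path.isdir(self.path_to_features_dir):
            raise ValueError("`path_to_features_dir` must be an existing directory.")
        if self.strategy not in VALID_STRATEGIES:
            raise ValueError(f"{self.strategy} is not a valid strategy.")
        # Disagreement-based (QBD) strategies need a bootstrapped ensemble.
        if "QBD" in self.strategy and not self.clf_bootstrap:
            raise ValueError("Bootstrap must be true when using disagreement strategy")
        for key, file_name in self.path_to_ini_files.items():
            if not path.isfile(file_name):
                raise ValueError(f"{key} does not point to existing file.")
```

Because the checks live in __post_init__, a bad strategy or a missing features directory is reported as soon as the configuration is built, and the loop itself can read config.queryable, config.strategy, and config.sep_files instead of taking each as a separate argument.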

Code Quality

  • I have read the Contribution Guide
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

maxwest-uw changed the title from "squashed all commits" to "Refactor time_domain_loop with new TimeDomainConfiguration" on Nov 5, 2024

github-actions bot commented Nov 5, 2024

| Before [a1b8a6f] | After [a0fe4aa] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 154±2ms | 157±2ms | 1.02 | benchmarks.time_learn_loop('KNN', 'UncSampling') |
| 2.60±0.03s | 2.66±0.01s | 1.02 | benchmarks.time_learn_loop('RandomForest', 'UncSampling') |
| 146±6ms | 147±4ms | 1.01 | benchmarks.time_feature_creation |
| 2.58±0.02s | 2.62±0.02s | 1.01 | benchmarks.time_learn_loop('RandomForest', 'RandomSampling') |
| 197M | 195M | 0.99 | benchmarks.peakmem_learn_loop('KNN') |
| 188M | 186M | 0.99 | benchmarks.peakmem_learn_loop('RandomForest') |
| 156±1ms | 155±1ms | 0.99 | benchmarks.time_learn_loop('KNN', 'RandomSampling') |

drewoldag (Collaborator) left a comment

Overall looks fine to me. I left a couple of comments, but neither is blocking.

Comment on lines 112 to 146
```python
def __post_init__(self):
    # file checking
    if not path.isdir(self.path_to_features_dir):
        raise ValueError("`path_to_features` must be an existing directory.")

    # check strategy
    if self.strategy not in VALID_STRATEGIES:
        raise ValueError(f"{self.strategy} is not a valid strategy.")
    if "QBD" in self.strategy and not self.clf_bootstrap:
        raise ValueError("Bootstrap must be true when using disagreement strategy")

    for key in self.path_to_ini_files.keys():
        if not path.isfile(self.path_to_ini_files[key]):
            raise ValueError(f"{key} does not point to existing file.")

def to_dict(self):
    """converts configurations elements into a dict."""
    return asdict(self)

@classmethod
def from_dict(cls, lc_dict):
    """creates a `LoopConfiguration` instance from a dict."""
    return cls(**lc_dict)

def to_json(self, file_path):
    """write out the `LoopConfiguration` as a json file."""
    with open(file_path, 'w') as fp:
        json.dump(self.to_dict(), fp)

@classmethod
def from_json(cls, file_path):
    """read a `LoopConfiguration` generated json file and instantiate."""
    with open(file_path) as fp:
        lc_dict = json.load(fp)
    return cls(**lc_dict)
```
Collaborator

These methods seem fairly similar to the ones in the LoopConfiguration data class. I can't recall the details of inheritance for Python dataclasses, but would it be possible/worth pushing these up into a parent class?
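One way to act on this suggestion, sketched under assumptions: move the four shared methods into a plain mixin class and have both configuration dataclasses inherit from it. The @dataclass decorator only generates methods such as __init__, __repr__, and __eq__; hand-written methods and classmethods on a base class are inherited as usual. Because cls is bound to whichever subclass calls from_dict or from_json, the right type comes back, and any __post_init__ validation the subclass defines still runs. The names SerializableConfigMixin, LoopConfigExample, TimeDomainConfigExample and the field nloops are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


class SerializableConfigMixin:
    """Hypothetical shared parent; only meant to be mixed into dataclasses,
    because asdict() requires a dataclass instance."""

    def to_dict(self):
        """Convert the configuration fields into a dict."""
        return asdict(self)

    @classmethod
    def from_dict(cls, config_dict):
        """Create an instance of the calling subclass from a dict."""
        return cls(**config_dict)

    def to_json(self, file_path):
        """Write the configuration out as a JSON file."""
        with open(file_path, "w") as fp:
            json.dump(self.to_dict(), fp)

    @classmethod
    def from_json(cls, file_path):
        """Read a JSON file written by to_json and instantiate the subclass."""
        with open(file_path) as fp:
            return cls.from_dict(json.load(fp))


@dataclass
class LoopConfigExample(SerializableConfigMixin):
    strategy: str = "UncSampling"
    nloops: int = 10  # hypothetical field


@dataclass
class TimeDomainConfigExample(SerializableConfigMixin):
    strategy: str = "UncSampling"
    queryable: bool = True
    sep_files: bool = False
```

With this layout, each configuration class keeps only its own fields and validation, while the serialization code lives in one place.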

Comment on lines 671 to +711
```diff
 for metadata_value in light_curve_data.train_metadata[id_key_name].values:
     next_day_pool_metadata = next_day_data.pool_metadata[id_key_name].values
     if metadata_value in next_day_pool_metadata:
-        next_day_pool_metadata_indices = list(
-            next_day_pool_metadata).index(metadata_value)
+        next_day_pool_metadata_indices = list(next_day_pool_metadata).index(metadata_value)
         if metadata_value not in light_curve_train_ids:
-            light_curve_train_metadata = light_curve_data.train_metadata[
-                id_key_name].values
+            light_curve_train_metadata = light_curve_data.train_metadata[id_key_name].values
             light_curve_data = _remove_old_training_features(
-                light_curve_data, light_curve_train_metadata,
-                metadata_value)
+                light_curve_data,
+                light_curve_train_metadata,
+                metadata_value
+            )
             if light_curve_data.queryable_ids.shape[0] > 0:
                 light_curve_data = _update_queried_sample(
-                    light_curve_data, next_day_data, id_key_name,
-                    metadata_value)
+                    light_curve_data,
+                    next_day_data,
+                    id_key_name,
+                    metadata_value
+                )
             light_curve_data = _update_training_data_with_new_features(
-                light_curve_data, next_day_data, metadata_value, id_key_name)
+                light_curve_data,
+                next_day_data,
+                metadata_value,
+                id_key_name
+            )
         next_day_data = _update_next_day_pool_data(
-            next_day_data, next_day_pool_metadata_indices)
+            next_day_data,
+            next_day_pool_metadata_indices
+        )
     next_day_data = _update_next_day_val_and_test_data(
-        next_day_data, metadata_value, id_key_name)
+        next_day_data,
+        metadata_value,
+        id_key_name
+    )
 light_curve_data = _update_light_curve_data_for_next_epoch(
-    light_curve_data, next_day_data, canonical_data, is_queryable, strategy,
-    is_separate_files)
+    light_curve_data,
+    next_day_data,
+    canonical_data,
+    config.queryable,
+    config.strategy,
+    config.sep_files
+)
```
Collaborator

I know that you didn't introduce this, but oh my, this looks like a place where lots of bugs can hide. 😬

maxwest-uw merged commit 181e14f into main on Nov 14, 2024
8 checks passed
maxwest-uw deleted the time_domain_loop_config branch on November 14, 2024 20:30