Skip to content

Dont require unused dataloaders #179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

scottcanoe
Copy link
Contributor

This PR removes the need to specify unused data loader classes and args in experiment configs.

Motivation

Monty experiment configs currently require specifying both training and eval data loaders, regardless of whether both are needed. This means we end up with loads of lines like

some_pretraining_config = dict(
    .
    .
    .
    eval_dataloader_class=ED.InformedEnvironmentDataLoader,  # just placeholder
    eval_dataloader_args=get_env_dataloader_per_object_by_idx(start=0, stop=1),
)

or

some_eval_config = dict(
    .
    .
    .
    # required but unused
    train_dataloader_class=ED.InformedEnvironmentDataLoader,
    train_dataloader_args=get_env_dataloader_per_object_by_idx(start=0, stop=10),
)

This adds clutter and may also be misleading, especially to newcomers. This is a small PR that improves quality of life and allows us to write cleaner, less confusing experiment configs.

Changes

  • MontyExperiment.load_dataset_and_dataloaders was modified such that an eval or train dataloaders will only be initialized if the experiment config requires it.
  • This change was tested by removing unused data loaders from benchmark configs. As such, eval dataloader configs have been removed from benchmark pretraining experiments, and train dataloader configs have been removed benchmark from eval experiments. I reran all pretraining experiments, a couple of experiments from each of the long- and short- YCB experiment list, all unsupervised experiments, and a few monty-meets-world experiments. As expected, all configs run as normal.

Remove eval dataloaders from configs
Remove train dataloaders (except for unsupervised experiments).
Remove unneeded import
Remove unneeded import
Remove train dataloader and unneeded import
Remove train dataloaders
@scottcanoe scottcanoe added the enhancement New feature or request label Feb 13, 2025
@scottcanoe scottcanoe added triaged This issue or pull request was triaged and removed enhancement New feature or request labels Feb 13, 2025
Copy link
Contributor

@tristanls tristanls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a reviewer, but I was curious, and this looks great, thank you.

suggestion: I suggest this should be a feat: commit (which introduces a new feature to the codebase, the feature of not having to specify extraneous things; you can make it a feat: commit by prefixing the commit message with feat: ; more on this suggested commit prefix here).

@scottcanoe
Copy link
Contributor Author

suggestion: I suggest this should be a feat: commit (which introduces a new feature to the codebase, the feature of not having to specify extraneous things; you can make it a feat: commit by prefixing the commit message with feat: ; more on this suggested commit prefix here).

Thanks @tristanls! I wasn't aware of this convention. If I'm understanding it correctly, only the commit that introduced the change (in MontyExperiment.load_dataset_and_dataloaders) would get the feat: prefix, while the the commits that make use of it (i.e., changes to benchmark configs) would be left without prefix. Is that right?

Copy link
Contributor

@nielsleadholm nielsleadholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful, thanks for making this change!

@scottcanoe scottcanoe merged commit bb942fc into thousandbrainsproject:main Feb 14, 2025
13 checks passed
@scottcanoe scottcanoe deleted the dont_require_unused_dataloaders branch February 14, 2025 15:06
@tristanls
Copy link
Contributor

tristanls commented Feb 14, 2025

Thanks @tristanls! I wasn't aware of this convention. If I'm understanding it correctly, only the commit that introduced the change (in MontyExperiment.load_dataset_and_dataloaders) would get the feat: prefix, while the the commits that make use of it (i.e., changes to benchmark configs) would be left without prefix. Is that right?

@scottcanoe, the convention is a suggestion. We don't enforce it here. But, if we were to follow it, then every commit would get a prefix. For a commit like this, out of fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: I think the most likely candidates would be: feat:, chore:, or refactor:. chore: is along the lines of maintaining dependency versions, updating tooling functionality. That doesn't seem to fit here, so we'd be left with feat: or refactor:. refactor: is typically reserved for a code change that neither fixes a bug nor adds a feature. Now, the reason I don't think it is refactor: is because this pull request adds a feature. I don't think of features as code to be written. I think of features from the user's perspective; the user's experience changed due to this code. So, this fits into a feature because a previously required parameter is now optional (the unused config). If we were making announcements, we would announce that this parameter is no longer required for our users. This is why I suggested the feat: prefix. There's also a notion of breaking changes, which adds ! to the prefix, but this is not a breaking change.

@vkakerbeck
Copy link
Contributor

I know it's already been merged but just wanted to say this is a really nice change! Thanks for adding :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triaged This issue or pull request was triaged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants