Finetuning on concatenated datasets #165

Open: wants to merge 98 commits into base: master

ryspark (Contributor) commented Jul 11, 2025

Mostly just applying changes from rslearn (allenai/rslearn#207)

  • Adds support for finetuning on multiple datasets in memory at once (though only one task at a time)
  • Adds support for parameter-efficient finetuning via ALPA, but it gives little performance boost and is not recommended
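To illustrate the first bullet, here is a minimal sketch of holding several datasets in memory behind one index. It is not the rslearn or helios implementation: the class name `ConcatenatedDataset`, the `datasets` and `tasks` dictionaries, and the reading of "only one task at a time" as "all datasets must share a task type" are assumptions made for illustration.

```python
from bisect import bisect_right


class ConcatenatedDataset:
    """Expose several in-memory datasets as one indexable sequence."""

    def __init__(self, datasets, tasks):
        # datasets: dataset name -> list of samples (hypothetical layout)
        # tasks: dataset name -> task type, e.g. "segmentation" (hypothetical)
        distinct_tasks = {tasks[name] for name in datasets}
        if len(distinct_tasks) > 1:
            raise ValueError(f"expected a single task, got {sorted(distinct_tasks)}")
        self.names = list(datasets)
        self.parts = [datasets[name] for name in self.names]
        # Cumulative end offsets: sizes [3, 2] become [3, 5].
        self.offsets = []
        total = 0
        for part in self.parts:
            total += len(part)
            self.offsets.append(total)

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose end offset lies beyond the index.
        part = bisect_right(self.offsets, index)
        start = self.offsets[part - 1] if part > 0 else 0
        return self.names[part], self.parts[part][index - start]


combined = ConcatenatedDataset(
    {"dataset_a": ["a0", "a1", "a2"], "dataset_b": ["b0", "b1"]},
    {"dataset_a": "segmentation", "dataset_b": "segmentation"},
)
```

Each item comes back tagged with its source dataset name, so downstream code can still report per-dataset metrics.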

New configs in data/helios/v3_multitask and data/helios/v3_perf_benchmark are for multi-dataset training. These configs make up the majority of the edits, along with a number of scripts from early evals I ran (they are kind of messy, and I don't mind deleting them since they're not really useful for anyone else). Of these scripts, only make_multidataset_config.py is meant for regular use: it creates multi-dataset training configs.

I also made some improvements to launch_finetune. It now supports a profiler flag, a do_eval flag (runs on the validation set and saves metrics, useful for eval sweeps), and a local flag (useful for debugging finetuning in the current Beaker session). See data/helios/v3_multitask/README.md for docs on how to run a multi-dataset job.
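As a rough illustration of how the three flags might combine, here is a hypothetical argparse sketch. It is not the actual launch_finetune interface: the exact flag spellings (`--profiler`, `--do_eval`, `--local`), the assumption that all three are plain boolean switches, and the helper names `build_parser` and `describe` are mine. Check data/helios/v3_multitask/README.md for the real usage.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with three boolean switches (hypothetical spellings)."""
    parser = argparse.ArgumentParser(prog="launch_finetune_sketch")
    parser.add_argument("--profiler", action="store_true",
                        help="enable profiling for the run")
    parser.add_argument("--do_eval", action="store_true",
                        help="run on the validation set and save metrics")
    parser.add_argument("--local", action="store_true",
                        help="run in the current session instead of submitting a job")
    return parser


def describe(args: argparse.Namespace) -> str:
    """Summarize what a run with these flags would do."""
    mode = "evaluate on the validation set and save metrics" if args.do_eval else "finetune"
    where = "in the current session" if args.local else "as a submitted job"
    extra = " with profiling" if args.profiler else ""
    return f"{mode} {where}{extra}"


# Example: an eval-sweep style run, debugged locally.
example_args = build_parser().parse_args(["--do_eval", "--local"])
summary = describe(example_args)
```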

BREAKING: When building Docker images, please place rslearn and helios in ./docker_build/rslearn and ./docker_build/helios, instead of at the repository root. This avoids linter issues where the linter treats rslearn and helios as local packages rather than standard pip installs.
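A small pre-build check can catch the old layout before an image build. The sketch below is only an illustration of the directory convention described above; the function name `missing_build_dirs` and the idea of running such a check are assumptions, not part of this PR.

```python
from pathlib import Path

# Directory names taken from the convention above.
REQUIRED_CHECKOUTS = ("rslearn", "helios")


def missing_build_dirs(repo_root):
    """Return the checkouts that are absent from <repo_root>/docker_build."""
    build_dir = Path(repo_root) / "docker_build"
    return [name for name in REQUIRED_CHECKOUTS if not (build_dir / name).is_dir()]
```

An empty list means both checkouts are where the new layout expects them.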

yawenzzzz and others added 30 commits May 8, 2025 15:49
There are several other changes:
- Upgrade beaker-py to 2.x (queues are not supported in older versions).
- Update helios.Dockerfile to use pytorch 2.7.0 to reduce build time.
- Update one_off_projects/convert_satlas_webmercator_to_rslearn/lib/__init__.py for new rslearn VectorFormat.encode_vector API.
- Remove the manage_scratch_dir_on_data_disk option, which is no longer needed now that the Docker volumes are on the big /data disk across all Beaker nodes.
I also updated the launcher code to accept a list of configs. This reduces duplication between some of the config files, although it makes starting experiments more complicated, since you need to specify a list of configs to get the right combination (this is documented in the README files within each task dir in data/helios, though).
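One way a list of configs can reduce duplication is by layering: a shared base config first, then task-specific overrides. The sketch below shows a recursive merge where later configs win. It is an assumption about the merge semantics, not the launcher's actual behavior, and the function names and example keys are hypothetical.

```python
from copy import deepcopy


def _merge_into(base, override):
    """Recursively copy override's entries into base; nested dicts are merged."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            _merge_into(base[key], value)
        else:
            # Copy so the input configs are never mutated by later merges.
            base[key] = deepcopy(value)


def merge_configs(configs):
    """Merge a list of config dicts in order; later entries take precedence."""
    merged = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


# Hypothetical example: a shared base plus a task-specific override.
base_config = {"model": {"lr": 0.001, "layers": 4}, "data": {"batch_size": 8}}
task_config = {"model": {"lr": 0.0001}, "data": {"path": "task_dir"}}
merged_config = merge_configs([base_config, task_config])
```

With this ordering convention, the order of the list matters: swapping the two configs would let the base values override the task-specific ones.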
@ryspark ryspark requested a review from yawenzzzz July 14, 2025 23:50
@ryspark ryspark marked this pull request as ready for review July 15, 2025 00:03
3 participants