Finetuning on concatenated datasets #165

ryspark · 2025-07-11T20:50:33Z

Mostly just applying changes from rslearn (allenai/rslearn#207)

- Adds support for finetuning on multiple datasets in memory at once (though only one task at a time).
- Adds support for parameter-efficient finetuning via ALPA. This yields little performance gain and is not recommended.
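
To illustrate what "multiple datasets in memory at once, one task at a time" can look like, here is a minimal pure-Python sketch. It is not the rslearn implementation: the class name `ConcatenatedDataset` and the choice to return the dataset name alongside each sample are assumptions made for illustration. The idea is that a global index is mapped to a (dataset, local index) pair, and the returned name lets the caller pick the single task that applies to that sample.

```python
from bisect import bisect_right
from typing import Any, Sequence


class ConcatenatedDataset:
    """Hypothetical helper: index several datasets as one long sequence."""

    def __init__(self, datasets: dict[str, Sequence[Any]]) -> None:
        if not datasets:
            raise ValueError("at least one dataset is required")
        self.names = list(datasets)
        self.datasets = [datasets[name] for name in self.names]
        # cumulative_ends[i] is the global index one past the end of dataset i.
        self.cumulative_ends: list[int] = []
        total = 0
        for ds in self.datasets:
            total += len(ds)
            self.cumulative_ends.append(total)

    def __len__(self) -> int:
        return self.cumulative_ends[-1]

    def __getitem__(self, index: int) -> tuple[str, Any]:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose end lies beyond the global index.
        ds_idx = bisect_right(self.cumulative_ends, index)
        start = self.cumulative_ends[ds_idx - 1] if ds_idx > 0 else 0
        # Return the dataset name so the caller can select the matching task.
        return self.names[ds_idx], self.datasets[ds_idx][index - start]
```

A caller would build one instance from all datasets and dispatch on the returned name, so each sample is still processed by exactly one task.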

New configs in data/helios/v3_multitask and data/helios/v3_perf_benchmark are for multi-dataset training. These make up the majority of the edits, along with a number of scripts from early evals I ran (they are somewhat messy, and I don't mind deleting them since they're not really useful for anyone else). Only make_multidataset_config.py is meant for regular use; it creates multi-dataset training configs.

I also made some improvements to launch_finetune. It now supports a profiler flag, a do_eval flag (runs on the validation set and saves metrics, useful for eval sweeps), and a local flag (useful for debugging finetuning in the current Beaker session). See data/helios/v3_multitask/README.md for docs on how to run a multi-dataset job.

BREAKING: When building docker images, please place rslearn and helios in ./docker_build/rslearn and ./docker_build/helios, instead of at the repository root. This is to avoid linter issues, where the linter thinks that rslearn and helios are local packages instead of standard pip installs.

There are several other changes:

- Upgrade beaker-py to 2.x (queues are not supported in older versions).
- Update helios.Dockerfile to use pytorch 2.7.0 to reduce build time.
- Update one_off_projects/convert_satlas_webmercator_to_rslearn/lib/__init__.py for the new rslearn VectorFormat.encode_vector API.
- Remove the manage_scratch_dir_on_data_disk option, which is no longer needed now that the Docker volumes are on the big /data disk across all Beaker nodes.

I also updated the launcher code to accept a list of configs. This reduces duplication between some of the config files, although it makes starting experiments more complicated, since you need to specify a list of configs to get the right combination (this is documented in the README files within each task dir in data/helios).
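
As a rough illustration of how a list of configs can reduce duplication, the sketch below merges dictionaries in order, with later configs overriding earlier ones key by key. This is an assumption about the combination semantics, not the launcher's actual code; `merge_configs` and `_merge_into` are hypothetical names, and the README files remain the reference for the real behavior.

```python
from copy import deepcopy
from typing import Any


def _merge_into(base: dict[str, Any], override: dict[str, Any]) -> None:
    """Recursively copy values from override into base (mutates base)."""
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are mappings: merge nested keys instead of replacing.
            _merge_into(base[key], value)
        else:
            # Scalars, lists, and new keys replace whatever was there before.
            base[key] = deepcopy(value)


def merge_configs(configs: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: combine configs, later entries taking precedence."""
    merged: dict[str, Any] = {}
    for cfg in configs:
        _merge_into(merged, cfg)
    return merged


# Illustrative values only: a shared base config plus a task-specific override.
base_cfg = {"model": {"lr": 1e-4, "name": "base"}, "data": {"batch_size": 8}}
task_cfg = {"model": {"lr": 1e-5}}
combined = merge_configs([base_cfg, task_cfg])
```

Under this assumed semantics, a shared base file carries the common settings and each task file only lists what differs, which is why the order of the config list matters.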

yawenzzzz and others added 30 commits May 8, 2025 15:49

add git ignore

80d3b25

ruff

cf541d3

ruff

4c41530

add argument description

213903f

add the reference points

592caa2

add mangrove configs

58078eb

add configs for mangrove

5eca1d0

add scripts for worldcereal

a183ccf

remove non-cropland classes

e8ebd56

add back the window save

8db2eb1

update task name

5907d75

Merge branch 'master' of github.com:allenai/rslearn_projects

057143e

update configs to remove wrong sampler

9a49755

update configs to remove the wrong samplers

183f3c0

Merge branch 'master' of github.com:allenai/rslearn_projects

89fb8c9

remove incorrect sampler

02b975d

modify create windows

96fe57b

Merge branch 'master' of github.com:allenai/rslearn_projects

24cf8da

add docstring

79088ef

fix

0349092

Support specifying retries on the Beaker job.

487f7e8

add pastis, sentinel1/2 vessels, and sentinel2 vessel attribute

79c9eb5

some config fixes

5db2307

make v2 folder

e81e6e6

add initial config

8e7fa21

merge conflicts

2c9cc68

add kenya/nandi crop type mapping model configs

d09f631

udpate readme

a8ba43d

ryspark added 23 commits July 7, 2025 14:02

Update config to allow multi file setups

12db8fb

Remove redundant multi config key

fec1024

Remove unused config

d8188a4

Update gitignore for tmp dir

f94567b

Delete beaker_launcher.py

6b0e274

Add perf benchmark configs

904e86d

Update cfgs, specify num_workers

2b3baf4

Merge branch 'master' into ryanp/singletask

a8cd137

Merge branch 'master' into ryanp/singletask

bd5a8eb

Testing new dataloader speed

1961d4e

Rename max_num_workers to num_workers

df28469

Update configs

abe9a7a

Merge branch 'master' into ryanp/singletask

0b43e77

Add docs on multidataset

00f3fd0

[Breaking] change rslearn/helios paths in dockerfile

217e479

Fix linter issues

64bd9cd

Merge branch 'master' into ryanp/singletask

d178117

Remove lightning auto sampler

c5e689c

Add configs, update str sub

a2b92e2

Apply pr suggestions

dfac90a

Config changes

3a1020b

Fix linter issues

7cb1421

Update helios README

c2010ed

ryspark requested a review from yawenzzzz July 14, 2025 23:50

ryspark marked this pull request as ready for review July 15, 2025 00:03

Update scripts, configs

d1fb967

ryspark mentioned this pull request Jul 15, 2025

Implement attentive pooling for finetuning #168

Draft

ryspark added 3 commits July 17, 2025 15:21

Config changes

be4f4ad

Update all configs for new lr

4da4aae

Fix f string

0af13c8
