Open
Description
Running `from promptsource.seqio_tasks import tasks` takes a huge amount of time. One of the main reasons is that the import queries all dataset infos:
dataset_splits = utils.get_dataset_splits(dataset_name, subset_name)
- One has to load ALL dataset infos as soon as one uses one task.
- Even when cached, it still queries URLs to check that the dataset didn't change. One can bypass this by setting HF_DATASETS_OFFLINE=1, as described in "Transfer promptsource.seqio_tasks to https://github.com/bigscience-workshop/t-zero" #703 (comment).
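For reference, a minimal sketch of the offline workaround done from Python rather than from the shell. It assumes, as far as I understand, that `datasets` reads the variable when it is first imported, so the flag has to be set before any import that pulls it in:

```python
import os

# Set the flag before `datasets` (or anything importing it) is loaded;
# the assumption here is that the library reads it at import time.
os.environ["HF_DATASETS_OFFLINE"] = "1"

# Only import the tasks module after the flag is in place, e.g.:
# from promptsource.seqio_tasks import tasks
```

This only skips the remote freshness checks; it does not avoid loading every dataset info.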
IMO both are unnecessary and should be fixed. Is there a reason why one cannot load seqio tasks dynamically, in the sense of fetching only what is necessary? Something along the lines of:
def add_seqio_task(task_name):
    seqio.TaskRegistry.add(...)
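To make the idea concrete, here is a rough, self-contained sketch of lazy registration. It does not use seqio or promptsource: `_REGISTRY`, `_BUILDERS`, `declare_task`, and `fake_dataset_info` are hypothetical stand-ins (for `seqio.TaskRegistry` and `utils.get_dataset_splits` respectively), used only to show that the expensive lookup can be deferred until a task is actually requested:

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins so the sketch runs without seqio installed.
_REGISTRY: Dict[str, dict] = {}                    # plays the role of seqio.TaskRegistry
_BUILDERS: Dict[str, Callable[[], dict]] = {}      # cheap recipes, one per task name
FETCH_LOG: List[str] = []                          # records which dataset infos were fetched


def fake_dataset_info(dataset_name: str) -> dict:
    """Hypothetical placeholder for the expensive dataset-info query."""
    FETCH_LOG.append(dataset_name)
    return {"dataset": dataset_name, "splits": ["train", "validation"]}


def declare_task(task_name: str, builder: Callable[[], dict]) -> None:
    """Record how to build a task without doing any expensive work yet."""
    _BUILDERS[task_name] = builder


def add_seqio_task(task_name: str) -> dict:
    """Build and register a single task on first use; later calls reuse it."""
    if task_name not in _REGISTRY:
        _REGISTRY[task_name] = _BUILDERS[task_name]()
    return _REGISTRY[task_name]


# Declaring tasks is cheap: nothing is fetched here.
declare_task("dataset_a_task", lambda: fake_dataset_info("dataset_a"))
declare_task("dataset_b_task", lambda: fake_dataset_info("dataset_b"))

# Only the requested task triggers a fetch, and only once.
task = add_seqio_task("dataset_a_task")
add_seqio_task("dataset_a_task")
print(FETCH_LOG)  # ['dataset_a']
```

With this shape, importing the module would only declare the tasks, and the dataset info for a given task would be queried the first time that task is requested.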