Description
Great work! However, when using the write_selected_data.py script, a mismatch between the number of --train_file_names and --train_files arguments results in an assertion failure.
Steps to Reproduce
- Run the data selection script as shown in the README instructions:
python3 -m less.data_selection.write_selected_data \
    --target_task_names ${TARGET_TASK_NAMES} \
    --train_file_names flan_v2 cot dolly oasst1 \
    --train_files ../data/train/processed/dolly/dolly_data.jsonl ../data/train/processed/oasst1/oasst1_data.jsonl \
    --output_path $SELECTED_DATA_OUTPUT_PATH \
    --percentage 0.05
- Observe the assertion failure:
The script contains the following assertion:
assert len(args.train_file_names) == len(args.train_files)
In this example, --train_file_names has 4 names (flan_v2, cot, dolly, oasst1), while --train_files provides only 2 file paths (dolly_data.jsonl and oasst1_data.jsonl). This mismatch triggers the assertion, causing the script to terminate unexpectedly.
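One possible way to make this failure easier to diagnose is to replace the bare assertion with an explicit check that reports both counts. The sketch below is only an illustration, not the project's code: the helper name pair_names_with_files and the error wording are hypothetical, and it assumes both arguments arrive as plain lists of strings, as the assertion quoted above suggests.

```python
from typing import List, Tuple


def pair_names_with_files(
    train_file_names: List[str], train_files: List[str]
) -> List[Tuple[str, str]]:
    """Pair each dataset name with its file path (hypothetical helper).

    Raises ValueError with both counts when the lists differ in length,
    instead of failing on a bare assert with no message.
    """
    if len(train_file_names) != len(train_files):
        raise ValueError(
            "--train_file_names and --train_files must have the same number "
            f"of entries, got {len(train_file_names)} names "
            f"({', '.join(train_file_names)}) and {len(train_files)} files."
        )
    # zip is safe here because the lengths were checked above.
    return list(zip(train_file_names, train_files))


if __name__ == "__main__":
    # Matching call: two names for the two paths given in the report.
    names = ["dolly", "oasst1"]
    files = [
        "../data/train/processed/dolly/dolly_data.jsonl",
        "../data/train/processed/oasst1/oasst1_data.jsonl",
    ]
    for name, path in pair_names_with_files(names, files):
        print(name, path)
```

With the command from the README, a check like this would report 4 names against 2 files. Passing only dolly and oasst1 to --train_file_names, or supplying the two missing file paths, would make the counts match.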