Commit ceba38f

minor fix to nemo1 sequence packing script (#12486)
Signed-off-by: ashors1 <[email protected]>
1 parent a903e1a commit ceba38f

1 file changed

+1
-1
lines changed


scripts/nlp_language_modeling/prepare_packed_ft_dataset.py

Lines changed: 1 addition & 1 deletion
@@ -205,7 +205,7 @@ def main(cfg: 'DictConfig') -> None:
     dataset, tokenizer = tokenize_dataset(cfg)
     sequences, histogram = create_hist(dataset, cfg.model.data.train_ds.max_seq_length)
     for pack_size in args.pack_sizes:
-        assignments = create_packing_strategy(histogram, pack_size, args.packing_algorithm)
+        assignments, _ = create_packing_strategy(histogram, pack_size, args.packing_algorithm)
         output_data = fill_packing_strategy(assignments, sequences, pack_size, tokenizer.eos_id)
 
         # save output data
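
The one-line change implies that create_packing_strategy returns two values and that this script needs only the first. The sketch below illustrates that pattern with a toy stand-in. The name toy_create_packing_strategy, its first-fit logic, and the contents of the metadata dict are assumptions made for illustration, not NeMo's implementation; the diff does not show what the second return value contains.

```python
from typing import Dict, List, Tuple


def toy_create_packing_strategy(
    histogram: List[int], pack_size: int
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Hypothetical stand-in that returns (assignments, metadata).

    histogram[i] is the number of sequences of length i. Sequences are
    placed longest first into the first bin that still has room.
    """
    bins: List[List[int]] = []
    for length in range(len(histogram) - 1, 0, -1):
        for _ in range(histogram[length]):
            for current in bins:
                if sum(current) + length <= pack_size:
                    current.append(length)
                    break
            else:
                # No existing bin had room, so open a new one.
                bins.append([length])
    metadata = {"num_bins": len(bins), "pack_size": pack_size}
    return bins, metadata


# One sequence of length 3, one of length 2, two of length 1.
histogram = [0, 2, 1, 1]

# Before the fix: the whole tuple is bound to a single name.
result = toy_create_packing_strategy(histogram, 4)

# After the fix: unpack and discard the second element.
assignments, _ = toy_create_packing_strategy(histogram, 4)
```

Without the unpacking, a consumer such as fill_packing_strategy would receive the whole tuple rather than the list of bins, which is presumably the failure this commit addresses.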
