Support masking of different and repeating roles DataCollatorForCompletionOnlyLM #3223

Open
Kirili4ik opened this issue Apr 3, 2025 · 1 comment · May be fixed by #3224
Labels
✨ enhancement New feature or request 🏋 SFT Related to SFT

Comments

@Kirili4ik

Feature request

Currently, DataCollatorForCompletionOnlyLM assumes that questions and answers are equal in number and strictly alternate. I would like this class to support variants such as QAAQA and QAQQA, and also to support multiple instruction_template values (e.g. "user" and "tool").
There are two problems with the current implementation: the first arises when using tools or anything beyond a single Question template and a single Answer template; the second arises when the sequence is not strictly QAQAQA (e.g. QAAQA or QAQQA).
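The second problem can be reproduced with a toy example. The sketch below is a simplified reimplementation of the pairing assumption described above, not TRL's actual code: `pairwise_mask` and the one-token markers "Q" and "A" are hypothetical stand-ins for the instruction and response templates.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def pairwise_mask(tokens, instruction="Q", response="A"):
    """Mask from each instruction marker through the response marker paired with it by position."""
    labels = list(tokens)
    instr_idxs = [i for i, t in enumerate(tokens) if t == instruction]
    resp_idxs = [i for i, t in enumerate(tokens) if t == response]
    # Assumption under test: the n-th instruction belongs to the n-th response.
    for start, resp in zip(instr_idxs, resp_idxs):
        for i in range(start, resp + 1):
            labels[i] = IGNORE_INDEX
    return labels


# Strict QAQA: only the answer contents a1 and a2 keep their labels.
strict = ["Q", "q1", "A", "a1", "Q", "q2", "A", "a2"]
strict_labels = pairwise_mask(strict)

# QAAQA: the second instruction gets paired with the second response, which
# precedes it, so nothing is masked and the user tokens "Q", "q2" leak into the loss.
repeated = ["Q", "q1", "A", "a1", "A", "a2", "Q", "q2", "A", "a3"]
repeated_labels = pairwise_mask(repeated)
```

In the QAAQA case the positional pairing goes out of sync after the repeated answer, which is why a fix has to change how spans are found rather than adjust the pairing.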

Motivation

Fine-tuning models on agent trajectories is becoming more and more popular, yet there are no simple tokenization utilities even for well-known, industry-standard formats such as OpenAI's chat format (see here for Tool Calling).
Currently, simple patterns like QAAQA or QAQQA are not supported, nor is any use of tools, function calls, etc.
This has also been raised a few times already, in #1994 and #2545, with a few people asking for updates.

Your contribution

I have already changed the masking algorithm for local use, so I will try to open a PR for it today or later this week. I am not sure about the speed of the new algorithm, though, so it will need to be checked.
The first problem can only be fixed if instruction_template accepts a list of different Qs (question-starting templates), e.g. ["<|im_start|>tool", "<|im_start|>user"].
The second problem can only be fixed by changing the masking algorithm (e.g. mask everything and then unmask between A and Q, instead of masking everything between Q and A).

Kirili4ik added a commit to Kirili4ik/trl that referenced this issue Apr 3, 2025
…ForCompletionOnlyLM (huggingface#3223)

Refactors the masking logic in `DataCollatorForCompletionOnlyLM` to correctly handle conversations with multiple instruction roles (e.g., user, tool) and consecutive assistant turns, enabling its use for more complex dialogue formats like agent trajectories.

Previously, the collator assumed a strict alternation of a single instruction template and a response template (e.g., User -> Assistant). This failed for:
1.  Datasets with multiple instruction roles (e.g., user prompts and tool calls).
2.  Sequences with consecutive assistant messages (e.g., Assistant -> Assistant).

This commit addresses these limitations:
- Updates `__init__` to accept a list of strings or pre-tokenized IDs for `instruction_template`, allowing multiple distinct instruction roles.
- Rewrites the core masking logic in `torch_call`:
    - It now identifies all occurrences of response and all specified instruction templates.
    - For each assistant response, it unmasks tokens from the end of its template up to the beginning of the *next* instruction template or the sequence end.
    - Correctly handles consecutive assistant turns by masking the template tokens of subsequent responses while unmasking their content.
- Adds comprehensive unit tests (`test_masking_*`) covering multi-role scenarios, consecutive assistant messages, left-padding, and initialization with tokenized templates.
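
The masking rule described above can be sketched on plain Python lists of token IDs. This is a minimal illustration under stated assumptions, not the code from the commit: `find_all` and `mask_non_assistant` are hypothetical names, the token IDs are invented, padding is ignored, and treating the next response template as a span boundary is just one way to keep later assistant templates masked.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def find_all(seq, pattern):
    """Return every start index at which `pattern` occurs in `seq`."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_non_assistant(input_ids, response_template, instruction_templates):
    """Mask everything, then unmask only the content that follows a response template."""
    labels = [IGNORE_INDEX] * len(input_ids)
    resp_starts = find_all(input_ids, response_template)
    # Any template start (instruction or response) closes the current assistant span.
    boundaries = sorted(
        set(resp_starts)
        | {i for tpl in instruction_templates for i in find_all(input_ids, tpl)}
    )
    for start in resp_starts:
        content_start = start + len(response_template)
        content_end = next(
            (b for b in boundaries if b >= content_start), len(input_ids)
        )
        labels[content_start:content_end] = input_ids[content_start:content_end]
    return labels


# Invented IDs: 1 stands in for a turn-start token, 10/11/12 for role tokens.
USER, TOOL, ASSISTANT = [1, 10], [1, 11], [1, 12]

# user -> assistant -> assistant -> tool -> assistant
input_ids = (
    USER + [101, 102] + ASSISTANT + [201] + ASSISTANT + [202, 203]
    + TOOL + [301] + ASSISTANT + [204]
)
labels = mask_non_assistant(input_ids, ASSISTANT, [USER, TOOL])
```

Only the assistant content IDs (201, 202, 203, 204) keep their labels; the user and tool turns and every template token, including the template of the second consecutive assistant turn, stay at `IGNORE_INDEX`.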

This allows `DataCollatorForCompletionOnlyLM` to process conversational data commonly found in ChatML formats and agent fine-tuning datasets.

Related: huggingface#1994, huggingface#2545
@Kirili4ik
Author

See pull request #3224
