Description
Hi,
I've noticed that the tokenizer is configured with tokenizer.pad_token = tokenizer.eos_token.
This poses a significant problem for models like Llama 3, which use the eos_token (e.g., <|eot_id|>) as a semantic delimiter separating turns in chat templates. During batch processing of multi-turn dialogues, the real eos_token marking the end of a turn gets incorrectly masked out in the attention_mask.
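To make this concrete, here is a minimal sketch (not this repo's code; the model name is illustrative and assumes the Hugging Face tokenizer for a Llama 3 instruct checkpoint) showing that <|eot_id|> occurs in the middle of a multi-turn prompt, well before any padding:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # illustrative
tokenizer.pad_token = tokenizer.eos_token  # the configuration in question

messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
]
ids = tokenizer.apply_chat_template(messages)  # returns a list of token ids

eot = tokenizer.convert_tokens_to_ids("<|eot_id|>")
# One <|eot_id|> per completed turn (plus any default system turn) -- none of these are padding.
print(ids.count(eot))
```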
Problem:
When padding shorter sequences in a batch, the attention_mask is set to False at every position equal to pad_token_id. Because pad_token_id is the same as eos_token_id, this also masks eos_tokens that are part of the actual input, not just the padding. The result can be an attention_mask like [...True, True, False, True...], where the False corresponds to a meaningful eos_token.
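A simplified illustration of the failure mode, under the assumption that the mask is (re)built by comparing input_ids against pad_token_id (the ids other than <|eot_id|> are made up):

```python
import torch

eot_id = 128009            # <|eot_id|>; with pad_token = eos_token it is also the pad id
pad_id = eot_id

# One padded sequence: the first <|eot_id|> ends a turn, the dialogue continues,
# and the last two positions are genuine padding.
input_ids = torch.tensor([[1, 42, eot_id, 44, 45, eot_id, pad_id, pad_id]])

# Deriving the mask from pad_token_id cannot tell padding apart from real turn delimiters.
attention_mask = input_ids != pad_id
print(attention_mask[0].tolist())
# [True, True, False, True, True, False, False, False]
#               ^ a real, turn-ending <|eot_id|> gets masked out
```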
Suggested Solution:
Would you consider either adding a new, distinct padding token (and resizing the model's token embeddings accordingly), or using the pad_token that each model already defines (such as <|finetune_right_pad_id|> in Llama 3)? This would resolve the ambiguity and ensure correct attention masking during training.
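A sketch of what I have in mind (the model name and the new token string are illustrative, and <|finetune_right_pad_id|> is only available in checkpoints that reserve it):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Option A: reuse a pad token the checkpoint already reserves (no resize needed),
# provided the vocabulary actually contains it.
tokenizer.pad_token = "<|finetune_right_pad_id|>"

# Option B: add a brand-new pad token and resize the embeddings to match.
# tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
# model.resize_token_embeddings(len(tokenizer))

assert tokenizer.pad_token_id != tokenizer.eos_token_id
```

With either option, positions masked out by pad_token_id can no longer collide with real <|eot_id|> tokens in the input.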
Thanks for your great work on this project!