
eos_token as pad_token Causes Incorrect Attention Masking in Multi-Turn Dialogue #128

Open
@Chennz1

Description

Hi,

I've noticed that the tokenizer is configured with `tokenizer.pad_token = tokenizer.eos_token`.

This poses a significant problem for models like Llama 3, which use the eos_token (e.g., `<|eot_id|>`) as a semantic delimiter between turns in their chat templates. During batch processing of multi-turn dialogues, the real eos_token marking the end of a turn gets incorrectly masked out in the `attention_mask`.
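For illustration, here is a minimal sketch of how the eos token shows up mid-sequence (the checkpoint name is an assumption; the repo on the Hub is gated, so substitute whichever Llama 3 variant this project uses):

```python
from transformers import AutoTokenizer

# Assumed checkpoint, for illustration only.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
    {"role": "user", "content": "How are you?"},
]

# Render the chat template as text: every turn is terminated by <|eot_id|>,
# so the eos token appears in the middle of the sequence, not only at the end.
print(tok.apply_chat_template(messages, tokenize=False))
```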

Problem:
When padding shorter sequences in a batch, the masking logic sets the `attention_mask` to False for every position whose id equals `pad_token_id`. Because `pad_token_id` is identical to `eos_token_id`, this also masks eos_tokens that are part of the actual input, not just the padding. The result can be an `attention_mask` like `[...True, True, False, True...]`, where the False corresponds to a meaningful eos_token.
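A minimal sketch of this failure mode (the token ids are hypothetical, and it assumes the mask is derived by comparing input ids against `pad_token_id`, as described above):

```python
import torch

EOS_ID = 128009      # e.g. <|eot_id|> in Llama 3
PAD_ID = EOS_ID      # consequence of tokenizer.pad_token = tokenizer.eos_token

# A padded two-turn dialogue: the first EOS_ID closes turn 1,
# the second closes turn 2, and the trailing PAD_IDs are padding.
input_ids = torch.tensor([[10, 11, 12, EOS_ID, 20, 21, EOS_ID, PAD_ID, PAD_ID]])

# Masking logic that infers padding from pad_token_id:
attention_mask = input_ids.ne(PAD_ID).long()
print(attention_mask)
# tensor([[1, 1, 1, 0, 1, 1, 0, 0, 0]])
# The zeros at positions 3 and 6 hide *real* end-of-turn tokens,
# not just the trailing padding.
```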

Suggested Solution:
Would you consider adding a new, distinct padding token and resizing the model's token embeddings, or using the dedicated pad_token that some models already define (such as `<|finetune_right_pad_id|>` in Llama 3)? Either option would resolve the ambiguity and ensure correct attention masking during training.
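A rough sketch of both options with the Hugging Face Transformers API (the checkpoint name and the `<pad>` string are assumptions, and `<|finetune_right_pad_id|>` is only available in tokenizers that ship it):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Option A: add a brand-new pad token and grow the embedding matrix to match.
tokenizer.add_special_tokens({"pad_token": "<pad>"})
model.resize_token_embeddings(len(tokenizer))

# Option B: reuse a dedicated padding token the tokenizer already defines,
# so no resize is needed.
# tokenizer.pad_token = "<|finetune_right_pad_id|>"

# Either way, keep the model config in sync so padding and generation agree.
model.config.pad_token_id = tokenizer.pad_token_id
```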

Thanks for your great work on this project!
