RNNT predictor uses blank_id as padding_idx in nn.Embedding, but inputs are padded with pad_id = 0 #14323

@arishov1

Description

In the RNN-T model implementation (e.g., RNNTDecoder), the predictor’s embedding layer is initialized as:

nn.Embedding(vocab_size + 1, pred_n_hidden, padding_idx=self.blank_idx) (as implemented here)

Where:

self.blank_idx = vocab_size

However, in the label_collate() function and tokenizer pipeline, the padding ID used for batched label sequences is hardcoded to 0.

You can see this here, where padded targets are generated with pad_id = 0.
This creates a mismatch:

The embedding layer is configured to ignore blank_id (vocab_size), which is never actually used in predictor inputs.
Meanwhile, pad_id = 0 is used for padding but is not masked in the embedding layer, so it receives gradients and is treated as a normal token during training.
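
A minimal sketch of the mismatch on a toy setup, not the actual model code. The sizes are made up, pad_labels() is a hypothetical stand-in for label_collate(), and a plain summed loss replaces the RNN-T loss (which may mask padded positions by target length), so this only shows how nn.Embedding itself treats the two indices:

```python
import torch
import torch.nn as nn

vocab_size = 5            # assumption: toy vocabulary with token ids 0..4
pred_n_hidden = 3         # assumption: toy embedding width
blank_idx = vocab_size    # blank is the extra last row, as described above
pad_id = 0                # padding value used when batching labels, as described above


def pad_labels(seqs, pad_id):
    """Hypothetical stand-in for label_collate(): right-pad to a common length."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    return torch.tensor(padded, dtype=torch.long)


embed = nn.Embedding(vocab_size + 1, pred_n_hidden, padding_idx=blank_idx)

# None of these sequences contains token 0, so every 0 in the batch is padding.
batch = pad_labels([[1, 2, 3, 4], [2, 3], [4]], pad_id)

# Simplified loss: sum of all embedding outputs (no length masking).
embed(batch).sum().backward()
grad = embed.weight.grad

num_pad_positions = int((batch == pad_id).sum())
blank_in_inputs = bool((batch == blank_idx).any())
pad_row_grad = grad[pad_id]        # row that padding actually looks up
blank_row_grad = grad[blank_idx]   # row protected by padding_idx

print("padding positions in batch:", num_pad_positions)
print("blank_idx present in inputs:", blank_in_inputs)
print("grad on row pad_id:", pad_row_grad)
print("grad on row blank_idx:", blank_row_grad)
```

With this unmasked loss, the row at pad_id accumulates one unit of gradient per padding position, while the row at blank_idx stays at zero because padding_idx excludes it from gradient updates. Whether the same happens in real training depends on how the loss handles target lengths, which this sketch does not model.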

Labels: bug
