However, in the label_collate() function and tokenizer pipeline, the padding ID used for batched label sequences is hardcoded to 0.
You can see this here, where the padded targets are generated with pad_id = 0.
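For reference, a minimal sketch of that padding behavior (the function name and shapes here are illustrative, not the repo's exact code): variable-length label sequences are padded to the batch maximum with a hardcoded value of 0.

```python
import torch

def label_collate_sketch(labels):
    """Rough approximation of a label_collate()-style helper (name and
    signature are illustrative): pad variable-length label sequences to
    the batch maximum using a hardcoded pad value of 0."""
    max_len = max(len(seq) for seq in labels)
    padded = torch.zeros(len(labels), max_len, dtype=torch.long)  # pad_id = 0
    for i, seq in enumerate(labels):
        padded[i, : len(seq)] = torch.as_tensor(seq, dtype=torch.long)
    return padded

print(label_collate_sketch([[5, 17, 42], [7]]))
# tensor([[ 5, 17, 42],
#         [ 7,  0,  0]])
```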
This creates a mismatch:
- The embedding layer is configured to ignore blank_id (vocab_size), an ID that never actually appears in the predictor inputs.
- Meanwhile, pad_id = 0 is the value actually used for padding, but it is not masked in the embedding layer, so the padding positions receive gradients and are treated as a normal token during training.
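A small PyTorch sketch of the mismatch, assuming the "ignore" is implemented via nn.Embedding's padding_idx (the usual mechanism) and using made-up sizes and label values: the masked index (blank_id) never occurs in the batch, while index 0, which is actually used for padding, is trained like any other token.

```python
import torch
import torch.nn as nn

vocab_size = 1024                      # illustrative size
blank_id = vocab_size                  # blank appended after the regular vocab

# Embedding sized to include the blank and configured to ignore it,
# even though blank_id never shows up in the predictor inputs.
embed = nn.Embedding(vocab_size + 1, 320, padding_idx=blank_id)

# A batch padded the way the collate code does it: with 0, not blank_id.
labels = torch.tensor([[5, 17, 42],
                       [7,  0,  0]])   # second row padded with pad_id = 0

embed(labels).sum().backward()

# Row 0 of the embedding table gets gradient from the padding positions,
# while the masked row (blank_id) is never touched at all.
print(embed.weight.grad[0].abs().sum())         # non-zero: pad token is trained
print(embed.weight.grad[blank_id].abs().sum())  # zero: mask on an unused index
```

In other words, the embedding row for token 0 ends up doing double duty as both a real token and the pad filler, while the masking on blank_id has no effect.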