Open
Description
When running transcription using the Whisper model, several deprecation warnings are displayed:
- The input name inputs is deprecated; input_features should be used instead.
- Because of a bug fix in huggingface/transformers#28687 ("[Whisper] Refactor forced_decoder_ids & prompt ids"), multilingual Whisper now defaults to language detection followed by transcription instead of translation to English.
- Passing a tuple of past_key_values is deprecated and will be removed in Transformers v4.43.0. The recommendation is to use an instance of EncoderDecoderCache (e.g., past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)).
- The attention mask is not set and cannot be inferred from the input because the pad token is the same as the eos token. Explicitly passing an attention_mask is recommended to avoid unexpected behavior.
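
For reference, here is a minimal sketch of how a call could address the first, second, and fourth warnings. It is not our current code: the helper name build_generate_kwargs, the transcribe wrapper, and the openai/whisper-small checkpoint are assumptions made for illustration, and whether passing the feature extractor's attention mask silences the warning may depend on the installed Transformers version.

```python
from typing import Any, Optional


def build_generate_kwargs(
    input_features: Any,
    attention_mask: Optional[Any] = None,
    language: Optional[str] = None,
    task: str = "transcribe",
) -> dict:
    """Collect keyword arguments for Whisper's generate() (hypothetical helper).

    Uses the non-deprecated name input_features, forwards an explicit
    attention mask when one is available, and states the task explicitly
    instead of relying on the changed default.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    kwargs = {"input_features": input_features, "task": task}
    if attention_mask is not None:
        kwargs["attention_mask"] = attention_mask
    if language is not None:
        # Leaving language unset keeps automatic language detection.
        kwargs["language"] = language
    return kwargs


def transcribe(audio, sampling_rate=16000,
               model_name="openai/whisper-small", language=None):
    """Transcribe one waveform (assumed multilingual checkpoint, PyTorch)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)

    # Ask the feature extractor for the attention mask along with the features.
    features = processor(
        audio,
        sampling_rate=sampling_rate,
        return_tensors="pt",
        return_attention_mask=True,
    )
    kwargs = build_generate_kwargs(
        features.input_features,
        attention_mask=features.get("attention_mask"),
        language=language,
    )
    predicted_ids = model.generate(**kwargs)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

English-only checkpoints (the .en variants) reject the task and language arguments, so this sketch assumes a multilingual model.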

Next Steps:
- Should we update our implementation (e.g., switching to input_features, setting an explicit attention mask, and handling past_key_values accordingly) to future-proof the code and avoid these warnings?
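
For the past_key_values warning, one possible approach is to convert any legacy tuple at the point where we pass it to the model. The sketch below assumes a Transformers version that ships EncoderDecoderCache in transformers.cache_utils; the helper name to_encoder_decoder_cache is hypothetical.

```python
from typing import Any


def to_encoder_decoder_cache(past_key_values: Any) -> Any:
    """Wrap a legacy tuple cache in EncoderDecoderCache (hypothetical helper).

    None and anything that is not a tuple (for example an existing cache
    object) are returned unchanged.
    """
    if not isinstance(past_key_values, tuple):
        return past_key_values
    # Imported lazily; assumes a Transformers release that provides this class.
    from transformers.cache_utils import EncoderDecoderCache

    # The conversion recommended by the deprecation warning itself.
    return EncoderDecoderCache.from_legacy_cache(past_key_values)
```

Calling this helper right before the model call would keep the change local to one place, so the rest of the code would not need to know which cache format it holds.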