transcription vs translation mode for multilingual models? #75

@pguridi

Hello,

I'm using WhisperKitAndroid with the OPENAI_BASE multilingual model and it's working great for language detection. However, I'm running into an issue where it always translates non-English audio to English instead of transcribing it in the original language.

For example, when I speak in Spanish, I get:

  • Language detected: es ✅
  • Task mode: <|translate|> ❌
  • Output: English translation instead of Spanish transcription

I'd like to get Spanish text when I speak Spanish, French text when I speak French, etc.

Looking at the Builder API, I don't see any options for configuring the task mode (transcribe vs. translate) or the language. The WhisperAX sample app has selectedTask and selectedLanguage state, but neither appears to be passed into the actual WhisperKit configuration.

Is there a way to:

  1. Force transcription mode instead of translation mode?
  2. Set language to null for auto-detection with transcription?
  3. Configure this through the Builder API?
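For context, my understanding is that standard Whisper models choose the task via the special tokens in the decoder prompt (`<|transcribe|>` vs. `<|translate|>`). I don't know where WhisperKitAndroid builds this prompt internally, but conceptually what I'm after is the ability to force the transcribe token, something like this sketch (token names follow the standard Whisper scheme; the function is hypothetical, not an existing API):

```kotlin
// Hypothetical sketch of the Whisper decoder prompt sequence.
// Standard Whisper prompt: <|startoftranscript|><|LANG|><|TASK|>
// Forcing <|transcribe|> should yield output in the spoken language.
fun buildPrompt(language: String, translate: Boolean): List<String> {
    val task = if (translate) "<|translate|>" else "<|transcribe|>"
    return listOf("<|startoftranscript|>", "<|$language|>", task)
}

fun main() {
    // Desired behavior for my case: Spanish audio, transcription task.
    println(buildPrompt("es", translate = false))
    // Current (unwanted) behavior appears to correspond to:
    println(buildPrompt("es", translate = true))
}
```

If the Builder could expose a flag that controls this choice (and accept a null language for auto-detection), that would cover my use case.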

Thanks for the great work on this library!

Environment:

  • WhisperKitAndroid: 0.3.2
  • Model: OPENAI_BASE (multilingual)
  • Android API: 34
