[FEAT]: Specify whisper transcription language #2928

Closed
khalilxg opened this issue Jan 2, 2025 · 7 comments
Labels
enhancement New feature or request feature request

Comments

@khalilxg

khalilxg commented Jan 2, 2025

How are you running AnythingLLM?

Docker (local)

What happened?

I'm encountering an issue with the Whisper integration in AnythingLLM. Despite setting the language parameter to "ar" in the OpenAI Whisper API, the transcription often returns transliterated Arabic (Arabic words in Latin script) instead of Arabic script. I've tried various methods to address this, but none have worked so far.

Expected Behavior: The transcription should return Arabic text in Arabic script (e.g., "مرحبا" for "hello").

Actual Behavior: The transcription returns transliterated Arabic in Latin script (e.g., "marhaban" for "hello").

Environment:

AnythingLLM Version: docker latest
Operating System: Debian
Additional Context: I've followed the Whisper documentation and confirmed that the language parameter is set correctly. This issue might be related to how the API processes Arabic audio or interprets the transcription language.

Request for Resolution: Please provide guidance or a workaround to force Whisper to transcribe Arabic speech into Arabic script. If this is a limitation of the current implementation, a feature to enforce script-based output would be appreciated.

Are there known steps to reproduce?

Steps to Reproduce:

1. Provide an Arabic audio file.
2. Configure the Whisper transcription with the following parameters:
   - model: "whisper-1"
   - language: "ar"
   - temperature: 0
3. Check the transcription output.
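
For the last step, the check can be automated instead of done by eye. The following is a minimal sketch, assuming it is enough to measure what share of the letters in the output belong to the Arabic script. The helper names and the 0.5 threshold are hypothetical and are not part of AnythingLLM or the OpenAI SDK.

```javascript
// Hypothetical helper: fraction of letters in `text` that are Arabic script.
// Returns null when the text contains no letters at all.
function arabicScriptRatio(text) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return null;
  const arabic = letters.filter((ch) => /\p{Script=Arabic}/u.test(ch));
  return arabic.length / letters.length;
}

// Flags output as transliterated when most letters are not Arabic script.
// The 0.5 threshold is an arbitrary assumption; tune it for mixed-language audio.
function looksTransliterated(text, threshold = 0.5) {
  const ratio = arabicScriptRatio(text);
  return ratio !== null && ratio < threshold;
}

console.log(looksTransliterated("مرحبا")); // false
console.log(looksTransliterated("marhaban")); // true
```

Counting only letters keeps digits, punctuation, and whitespace from skewing the ratio.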

@khalilxg khalilxg added the possible bug Bug was reported but is not confirmed or is unable to be replicated. label Jan 2, 2025
@timothycarambat timothycarambat added enhancement New feature or request feature request and removed possible bug Bug was reported but is not confirmed or is unable to be replicated. labels Jan 2, 2025
@timothycarambat timothycarambat changed the title [BUG]: whisper transcription return only in latin [FEAT]: Specify whisper transcription language Jan 2, 2025
@timothycarambat
Member

timothycarambat commented Jan 2, 2025

This needs an appropriate supported language set here:

const { text } = await transcriber(audioData, {

language: "en", // ISO-code
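
One way to read this suggestion is to make the language code configurable rather than hard-coded. A minimal sketch, assuming the code comes from an environment variable: the name `WHISPER_LANGUAGE` and the function `resolveWhisperLanguage` are hypothetical and do not exist in AnythingLLM.

```javascript
// Hypothetical: WHISPER_LANGUAGE is an illustrative variable name, not an
// existing AnythingLLM setting.
function resolveWhisperLanguage(env = process.env, fallback = "en") {
  const raw = (env.WHISPER_LANGUAGE || "").trim().toLowerCase();
  // Accept only two-letter codes such as "ar" or "en" (ISO 639-1 shape).
  // This checks the shape only, not whether Whisper supports the language.
  return /^[a-z]{2}$/.test(raw) ? raw : fallback;
}

console.log(resolveWhisperLanguage({ WHISPER_LANGUAGE: "AR" })); // "ar"
console.log(resolveWhisperLanguage({})); // "en"
```

The resolved value could then be passed as the `language` option shown above.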

@khalilxg
Author

khalilxg commented Jan 2, 2025

I'm using OpenAI's Whisper API.

The target language is Arabic. At first I only added the language variable, but I still got the same issue, so I updated some lines in
anything-llm-master/collector/utils/WhisperProviders/OpenAiWhisper.js
to:

```js
const fs = require("fs");

class OpenAiWhisper {
  constructor({ options }) {
    const { OpenAI: OpenAIApi } = require("openai");
    if (!options.openAiKey) throw new Error("No OpenAI API key was set.");

    this.openai = new OpenAIApi({
      apiKey: options.openAiKey,
    });
    this.model = "whisper-1";
    this.temperature = 0;
    this.#log("Initialized.");
    this.language = "ar"; // added: ISO code for Arabic
    this.task = "transcribe"; // added
  }

  // Prefixes log lines with a green [OpenAiWhisper] tag.
  #log(text, ...args) {
    console.log(`\x1b[32m[OpenAiWhisper]\x1b[0m ${text}`, ...args);
  }

  // Sends the audio file to the transcription endpoint and returns
  // { content, error } instead of throwing.
  async processFile(fullFilePath) {
    return await this.openai.audio.transcriptions
      .create({
        file: fs.createReadStream(fullFilePath),
        model: this.model,
        // added: a prompt written in Arabic script
        prompt: "مرحبًا، اسمي جو، متحدث أصلي للغة العربية، وسأجري اليوم محادثة باللغة العربية حول موضوع قد تجده مثيرًا للاهتمام للغاية.",
        temperature: this.temperature,
        language: this.language, // added
        task: this.task, // added
      })
      .then((response) => {
        if (!response) {
          return {
            content: "",
            error: "No content was able to be transcribed.",
          };
        }

        return { content: response.text, error: null };
      })
      .catch((error) => {
        this.#log(
          `Could not get any response from openai whisper`,
          error.message
        );
        return { content: "", error: error.message };
      });
  }
}

module.exports = {
  OpenAiWhisper,
};
```
After adding prompt, language, and task, I still get the same issue: the input is spoken in a pure Arabic accent, and the output comes back straight away in Latin letters.

@timothycarambat
Member

In that case: https://platform.openai.com/docs/guides/speech-to-text#prompting

Some languages can be written in different ways, such as simplified or traditional Chinese. The model might not always use the writing style that you want for your transcript by default. You can improve this by using a prompt in your preferred writing style.

The language parameter helps the model determine the input language, not the output. Adding a prompt that is written in Arabic, and that asks for the output in Arabic, may help, but it is not foolproof.

Googling this shows the issue is pretty common among Whisper model users. Most wind up post-processing the output with an LLM for translation. So that is the current state of Whisper 🤷
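
The post-processing workaround could be sketched roughly as follows. This assumes the transcription is only rewritten when it is mostly non-Arabic script. `ensureArabicScript`, `convertToScript`, and the 0.5 threshold are hypothetical names and values; the converter is passed in as a function so the sketch makes no claim about any particular LLM API.

```javascript
// Hypothetical post-processing step. `convertToScript` is any async function
// (for example, a wrapper around an LLM call) that rewrites transliterated
// text in the target script.
async function ensureArabicScript(text, convertToScript) {
  const letters = text.match(/\p{L}/gu) || [];
  const arabic = letters.filter((ch) => /\p{Script=Arabic}/u.test(ch));
  // Leave the text alone when it is empty or already mostly Arabic script.
  // The 0.5 threshold is an arbitrary assumption.
  if (letters.length === 0 || arabic.length / letters.length >= 0.5) {
    return { content: text, converted: false };
  }
  const content = await convertToScript(text);
  return { content, converted: true };
}

// Usage with a stub standing in for a real LLM call.
const stubConverter = async (input) => (input === "marhaban" ? "مرحبا" : input);
ensureArabicScript("marhaban", stubConverter).then((result) => {
  console.log(result.converted, result.content);
});
```

Skipping the conversion for output that is already in Arabic script avoids an extra LLM round trip on transcriptions that came back correctly.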

@khalilxg
Author

@timothycarambat The LibreChat project is implementing a voice transcription feature where the user selects a language and country, and the speech is transcribed into that target language. I think voice interfaces in RAG are awesome for B2C apps. I'll try to implement this feature; hopefully it won't take too long 🚀

@khalilxg
Author

Also, after extensive testing across various open-source projects, AnythingLLM stands out with the best RAG performance. This is primarily due to its integration with Cohere, which offers the most advanced embedding models. Other projects lack this seamless integration, and attempts to incorporate Cohere often make those projects unstable.

@timothycarambat
Member

@khalilxg Have you tried Cohere + Reranking in the workspace? Might get even better results.

@khalilxg
Author

Good job!
