Real-Time Speech-to-Text Translation Support #58

hu-ke · 2025-01-24T03:39:41Z

Description of the feature request:

Instead of waiting for a turn of speech to complete (VAD mode), would it be possible to stream the generated results in real-time?

What problem are you trying to solve with this feature?

Suppose I am in a Japanese interview, but my Japanese is not very strong. I would like to build an app with the Gemini Multimodal API to assist me with real-time speech-to-text translation.

Any other information you'd like to share?

No response

ViaAnthroposBenevolentia · 2025-02-03T03:08:04Z

Currently, the easiest (and free) way to do this:

Get a free API key from Deepgram (it comes with $200 in credit);
Open a WebSocket connection to wss://api.deepgram.com/v1/listen;
Send the base64 audio from Gemini to the WebSocket;
Receive the real-time transcript from Deepgram.
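
The steps above can be sketched roughly as follows. This is a minimal sketch, not a confirmed implementation. It assumes that the Gemini audio arrives as base64-encoded 16-bit PCM at 24 kHz and that it is decoded to raw bytes before being sent to Deepgram as binary frames; check both against your own setup. The helper names (`build_listen_url`, `auth_headers`, `decode_audio_chunk`, `extract_transcript`) are hypothetical, and the WebSocket client itself is left out so the block stays standard-library only.

```python
import base64
import json
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(sample_rate=24000, language="ja", interim_results=True):
    """Build the streaming URL with query parameters describing the audio.

    Assumption: raw 16-bit PCM ("linear16") at 24 kHz; adjust to your audio.
    """
    params = {
        "encoding": "linear16",
        "sample_rate": sample_rate,
        "language": language,
        "interim_results": str(interim_results).lower(),
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def auth_headers(api_key):
    """Header to pass when opening the WebSocket connection."""
    return {"Authorization": f"Token {api_key}"}


def decode_audio_chunk(b64_chunk):
    """Turn a base64 audio chunk into raw bytes for a binary WebSocket frame."""
    return base64.b64decode(b64_chunk)


def extract_transcript(message):
    """Return (transcript, is_final) from a transcript message, else None."""
    data = json.loads(message)
    if data.get("type") != "Results":
        return None  # metadata or other non-transcript message
    alternatives = data.get("channel", {}).get("alternatives", [])
    if not alternatives or not alternatives[0].get("transcript"):
        return None  # silence or empty interim result
    return alternatives[0]["transcript"], bool(data.get("is_final"))
```

With any WebSocket client, you would connect to `build_listen_url()` using `auth_headers(key)`, send each `decode_audio_chunk(...)` result as a binary frame, and pass every text message you receive through `extract_transcript`. Interim results arrive with `is_final` set to false and are later replaced by a final version of the same segment.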

OptionIA · 2025-04-21T23:36:28Z

Or, for free but at the cost of latency: define a function call with two parameters, input and output, and add to the model's system instruction that it must use the function call in all of its responses. Then add a handler that receives the call and prints it.
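
A minimal sketch of that function-calling idea, under stated assumptions: the function name `report_transcript`, the wording of the system instruction, and the handler are all hypothetical, and the declaration uses the general JSON-schema-style shape for function declarations, which may need adjusting for the SDK you use. How the declaration is registered with the session and how function calls are delivered to your code is not shown.

```python
# Hypothetical system instruction forcing every reply through the function.
SYSTEM_INSTRUCTION = (
    "For every response, call report_transcript with the user's speech as "
    "'input' and your reply (for example, the translation) as 'output'."
)

# Hypothetical function declaration with the two parameters from the comment.
REPORT_TRANSCRIPT_DECLARATION = {
    "name": "report_transcript",
    "description": "Report what the user said and what the model replied.",
    "parameters": {
        "type": "object",
        "properties": {
            "input": {"type": "string", "description": "Transcript of the user's speech."},
            "output": {"type": "string", "description": "The model's reply or translation."},
        },
        "required": ["input", "output"],
    },
}


def handle_function_call(name, args):
    """Print and return one transcript line, or None if the call is not ours."""
    if name != REPORT_TRANSCRIPT_DECLARATION["name"]:
        return None
    required = REPORT_TRANSCRIPT_DECLARATION["parameters"]["required"]
    if any(key not in args for key in required):
        return None  # the model omitted a parameter; skip this call
    line = f"{args['input']} -> {args['output']}"
    print(line)
    return line
```

Because the text only appears once the model emits the function call, it shows up at the end of a turn rather than word by word, which is the latency trade-off mentioned above.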

OptionIA · 2025-04-22T00:29:29Z

Well, they have since added this as a native feature, but there is no documentation available (or it is not easy to find).