Scaling and Cost Question #510

### Your question

Hi there everyone. I'm trying to work out how much using this will actually cost.

I see OpenAI is $80 for 1 million output tokens.

Gemini seems a bit better at $12. Now I'm trying to work out how much actual voice output that is. And is this billed for the whole session (so the clock ticks even if it's not outputting anything)?

Also got this from Gemini and want to know if this looks correct and what else there is to consider:

You've already got the key information to calculate this! Let's re-state it and then figure out how many minutes $1 will get you.

Key Information:

- 1 second of audio = 32 tokens (this is a standard conversion for Gemini models)
- 1 minute of audio = 1,920 tokens (32 tokens/second * 60 seconds/minute)

Given your pricing:

- Input audio cost: $3 per 1M input audio tokens
- Output audio cost: $12 per 1M output audio tokens

How many minutes will $1 last for Input Audio?

Tokens you get for $1 (Input):

- You pay $3 for 1,000,000 tokens.
- So, for $1, you get (1,000,000 tokens / $3) = 333,333.33 tokens (approximately)

Minutes of input audio for those tokens:

- Since 1 minute = 1,920 tokens, then:
- 333,333.33 tokens / 1,920 tokens/minute = 173.61 minutes (approximately)

So, $1 will get you about 173.61 minutes of input audio with Gemini 2.5 Flash Live API.

How many minutes will $1 last for Output Audio?

Tokens you get for $1 (Output):

- You pay $12 for 1,000,000 tokens.
- So, for $1, you get (1,000,000 tokens / $12) = 83,333.33 tokens (approximately)

Minutes of output audio for those tokens:

- Since 1 minute = 1,920 tokens, then:
- 83,333.33 tokens / 1,920 tokens/minute = 43.40 minutes (approximately)

So, $1 will get you about 43.40 minutes of output audio with Gemini 2.5 Flash Live API.

Summary for $1:

- Input Audio: Approximately 173.61 minutes
- Output Audio: Approximately 43.40 minutes

This clearly shows that generating audio output is significantly more expensive per minute than processing audio input, which is a common characteristic of these types of AI services due to the computational resources required for high-quality audio synthesis.
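To double-check the arithmetic above, here is a minimal Python sketch of the same calculation. It assumes the figures quoted in the Gemini answer (32 tokens per second, $3 and $12 per 1M tokens), which I have not verified against current pricing. The function names are just my own for illustration. It also does not answer the session question: it only counts seconds of audio that are actually tokenized.

```python
# Assumptions taken from the quoted answer above (not verified against
# the official pricing page):
TOKENS_PER_SECOND = 32          # audio tokens per second of audio
INPUT_USD_PER_MILLION = 3.0     # $ per 1M input audio tokens
OUTPUT_USD_PER_MILLION = 12.0   # $ per 1M output audio tokens


def minutes_per_dollar(usd_per_million, tokens_per_second=TOKENS_PER_SECOND):
    """Minutes of audio that $1 buys at the given per-million-token price."""
    tokens_per_dollar = 1_000_000 / usd_per_million
    tokens_per_minute = tokens_per_second * 60
    return tokens_per_dollar / tokens_per_minute


def cost_per_minute(usd_per_million, tokens_per_second=TOKENS_PER_SECOND):
    """Dollar cost of one minute of audio at the given price."""
    tokens_per_minute = tokens_per_second * 60
    return tokens_per_minute * usd_per_million / 1_000_000


if __name__ == "__main__":
    for label, price in [("input", INPUT_USD_PER_MILLION),
                         ("output", OUTPUT_USD_PER_MILLION)]:
        print(f"{label}: {minutes_per_dollar(price):.2f} min per $1, "
              f"${cost_per_minute(price):.5f} per minute")
```

Under those assumptions this reproduces the 173.61 and 43.40 minute figures, and puts a minute of output audio at a little over 2 cents. Swapping in a different tokens-per-second rate or price only needs the constants changed.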
