Scaling and Cost Question #510

@blackqubit-com

Your question

Hi there, everyone. I'm trying to work out how much using this will actually cost.

I see OpenAI is $80 per 1 million output tokens.

Gemini seems a bit better at $12. Now I'm trying to work out how much actual voice output that buys. And is this billed for the whole session (so the clock ticks even if it's not outputting anything)?

I also got this from Gemini and want to know whether it looks correct and what else there is to consider:

You've already got the key information to calculate this! Let's re-state it and then figure out how many minutes $1 will get you.

Key Information:

1 second of audio = 32 tokens (this is a standard conversion for Gemini models)

1 minute of audio = 1,920 tokens (32 tokens/second * 60 seconds/minute)

Given your pricing:

Input audio cost: $3 per 1M input audio tokens

Output audio cost: $12 per 1M output audio tokens

How many minutes will $1 last for Input Audio?
Tokens you get for $1 (Input):
You pay $3 for 1,000,000 tokens.
So, for $1, you get (1,000,000 tokens / $3) = 333,333.33 tokens (approximately)

Minutes of input audio for those tokens:
Since 1 minute = 1,920 tokens, then:
333,333.33 tokens / 1,920 tokens/minute = 173.61 minutes (approximately)

So, $1 will get you about 173.61 minutes of input audio with Gemini 2.5 Flash Live API.

How many minutes will $1 last for Output Audio?
Tokens you get for $1 (Output):
You pay $12 for 1,000,000 tokens.
So, for $1, you get (1,000,000 tokens / $12) = 83,333.33 tokens (approximately)

Minutes of output audio for those tokens:
Since 1 minute = 1,920 tokens, then:
83,333.33 tokens / 1,920 tokens/minute = 43.40 minutes (approximately)

So, $1 will get you about 43.40 minutes of output audio with Gemini 2.5 Flash Live API.

Summary for $1:

Input Audio: Approximately 173.61 minutes

Output Audio: Approximately 43.40 minutes

This clearly shows that generating audio output is significantly more expensive per minute than processing audio input, which is a common characteristic of these types of AI services due to the computational resources required for high-quality audio synthesis.
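The arithmetic above can be double-checked with a short script. This is only a sketch: the 32 tokens/second rate and the $3 / $12 per 1M token prices are copied from the quoted answer and have not been verified against current pricing, and the function names `minutes_per_dollar` and `conversation_cost` are made up for illustration.

```python
# Assumed values, copied from the quoted answer above (not verified).
TOKENS_PER_SECOND = 32          # audio tokens per second of audio
INPUT_PRICE_PER_M = 3.0         # USD per 1M input audio tokens
OUTPUT_PRICE_PER_M = 12.0       # USD per 1M output audio tokens


def minutes_per_dollar(price_per_million: float,
                       tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Minutes of audio that $1 buys at the given per-1M-token price."""
    tokens_per_dollar = 1_000_000 / price_per_million
    tokens_per_minute = tokens_per_second * 60
    return tokens_per_dollar / tokens_per_minute


def conversation_cost(input_minutes: float, output_minutes: float,
                      input_price: float = INPUT_PRICE_PER_M,
                      output_price: float = OUTPUT_PRICE_PER_M,
                      tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """USD cost of a conversation, counting only audio that is actually sent or generated."""
    tokens_per_minute = tokens_per_second * 60
    input_tokens = input_minutes * tokens_per_minute
    output_tokens = output_minutes * tokens_per_minute
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    print(f"Input minutes per $1:  {minutes_per_dollar(INPUT_PRICE_PER_M):.2f}")
    print(f"Output minutes per $1: {minutes_per_dollar(OUTPUT_PRICE_PER_M):.2f}")
    print(f"10 min in + 5 min out: ${conversation_cost(10, 5):.4f}")
```

Note that `conversation_cost` only counts minutes of audio actually streamed in or generated. Whether idle session time, text tokens, or accumulated context are billed as well is exactly the open question here, so treat the result as a lower bound.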
