Tuning: Overall speed of Onju Voice assist pipeline executions #103

rmeissn · 2025-01-03T14:48:56Z

I've assembled an Onju Voice for a child as an Alexa replacement. Overall, it works quite well, but the child seems a bit disappointed that it answers so slowly compared to an Alexa.

My current pipeline is:

STT: OpenAI Whisper - https://github.com/fabio-garavini/ha-openai-whisper-stt-api
Conversation Agent: OpenAI gpt-4o-mini (with local processing preferred) - official plugin
TTS: ElevenLabs Turbo v2.5 - official plugin, free quota

A random run with some command looks like:

Do you have any tips on speeding up these pipeline steps, in particular STT and natural language processing, which seem to cause the delay before an answer is played?
Maybe you're even using other upstream providers that are faster.
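To see which step dominates, it can help to turn the pipeline's debug events into per-stage durations. The following is a minimal sketch, assuming the events are available as a list of dicts with a `type` such as `stt-start` or `stt-end` and an ISO 8601 `timestamp`. The `stage_durations` helper is hypothetical, and the sample timestamps are invented for illustration, not taken from any real run.

```python
from datetime import datetime

# Hypothetical events; the timestamps are invented for illustration only.
SAMPLE_EVENTS = [
    {"type": "stt-start", "timestamp": "2025-01-03T14:00:00.000000+00:00"},
    {"type": "stt-end", "timestamp": "2025-01-03T14:00:01.200000+00:00"},
    {"type": "intent-start", "timestamp": "2025-01-03T14:00:01.200000+00:00"},
    {"type": "intent-end", "timestamp": "2025-01-03T14:00:02.700000+00:00"},
    {"type": "tts-start", "timestamp": "2025-01-03T14:00:02.700000+00:00"},
    {"type": "tts-end", "timestamp": "2025-01-03T14:00:03.300000+00:00"},
]


def stage_durations(events):
    """Return {stage: seconds} for every stage that has a start and an end event."""
    starts, durations = {}, {}
    for event in events:
        # "stt-start" -> stage "stt", phase "start"
        stage, _, phase = event["type"].rpartition("-")
        moment = datetime.fromisoformat(event["timestamp"])
        if phase == "start":
            starts[stage] = moment
        elif phase == "end" and stage in starts:
            durations[stage] = (moment - starts[stage]).total_seconds()
    return durations


if __name__ == "__main__":
    durations = stage_durations(SAMPLE_EVENTS)
    total = sum(durations.values())
    # Slowest stage first, with its share of the total time.
    for stage, seconds in sorted(durations.items(), key=lambda item: -item[1]):
        print(f"{stage:>6}: {seconds:.2f} s ({seconds / total:.0%} of total)")
```

Sorting by duration puts the slowest stage first, which makes it easier to decide whether swapping the STT, conversation, or TTS provider is likely to pay off.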

tetele · 2025-01-03T15:39:41Z

Maintainer

Those results sound pretty reasonable, tbh. Not sure you can get much better performance using a cloud-based LLM conversation agent.

At some point, streaming will probably be supported over the Wyoming protocol. Then the perceived performance may visibly increase, but I can't think of any other major developments until then.
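To illustrate why streaming could improve perceived speed, here is a toy model. It is not a description of how Wyoming or Home Assistant implement streaming. It assumes that without streaming each stage waits for the complete output of the previous one, while with streaming playback can begin once the first chunk of the LLM response has been synthesised. The `StageTimings` class and all numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StageTimings:
    """Hypothetical per-stage timings in seconds (illustrative values only)."""
    stt: float        # full transcription
    llm_first: float  # time until the first chunk of the LLM response
    llm_rest: float   # time for the remainder of the response
    tts_first: float  # time to synthesise the first audio chunk
    tts_rest: float   # time to synthesise the remaining audio


def time_to_first_audio(t: StageTimings, streaming: bool) -> float:
    """Seconds from end of speech until playback can start under the toy model."""
    if streaming:
        # Only the first chunk of each downstream stage is on the critical path.
        return t.stt + t.llm_first + t.tts_first
    # Every stage must finish completely before the next one starts.
    return t.stt + t.llm_first + t.llm_rest + t.tts_first + t.tts_rest


if __name__ == "__main__":
    example = StageTimings(stt=1.0, llm_first=0.3, llm_rest=0.9, tts_first=0.2, tts_rest=0.6)
    print(f"without streaming: {time_to_first_audio(example, streaming=False):.1f} s")
    print(f"with streaming:    {time_to_first_audio(example, streaming=True):.1f} s")
```

In this model the total work is unchanged; only the wait before the first audio gets shorter, which is why the gain would be in perceived rather than actual runtime.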

1 reply

rmeissn Jan 5, 2025
Author

Thank you for putting the runtimes into perspective.
Let's see what the Home Assistant community and Nabu Casa come up with in the future =)

TheStigh · 2025-01-03T15:40:06Z

First - if you have a good GPU in your household, you could move Whisper there - incredible results even with a GTX 1650 and up. If you do not, you can use Home Assistant Cloud for STT; it is actually pretty fast. ElevenLabs is of course slower than running a local Piper TTS.

If speed is the priority:

STT: Whisper on GPU - OR - Home Assistant Cloud
Conversation Agent: for proper answers, use OpenAI
TTS: Piper, get the voice from Thorsten-Voice (I see you're in Germany)

If quality is the priority (TTS), use ElevenLabs Turbo 2.5 (as you are already doing).

1 reply

rmeissn Jan 5, 2025
Author

Many thanks for your suggestions!
I'm currently unwilling to buy high-end hardware (a full tower PC with a decent GPU) because of the purchase price and the energy cost of running it 24/7. But an RPi5 with the recently released NPU add-on might be something I look into in the future.

I've tested a bit, and Whisper on GroqCloud seems to be a little faster than Whisper by OpenAI (~0.93 s versus 1.23 s for my test case). But I didn't run these tests scientifically.
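A more repeatable comparison could time the same request several times per provider and look at the median and spread rather than a single run. Below is a minimal sketch using only the standard library. `benchmark` and `fake_provider` are hypothetical names, and the stand-in provider just sleeps, so it would have to be replaced with a function that sends the same audio clip to the STT service being measured.

```python
import statistics
import time


def benchmark(transcribe, runs=10):
    """Call `transcribe()` `runs` times and summarise wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        transcribe()
        samples.append(time.perf_counter() - started)
    return {
        "runs": runs,
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        # stdev needs at least two samples
        "stdev": statistics.stdev(samples) if runs > 1 else 0.0,
    }


def fake_provider():
    """Stand-in for a real STT request; sleeps instead of calling any API."""
    time.sleep(0.01)


if __name__ == "__main__":
    summary = benchmark(fake_provider, runs=5)
    print(
        f"median {summary['median']:.3f} s, "
        f"stdev {summary['stdev']:.3f} s over {summary['runs']} runs"
    )
```

Running each provider with the same clip and comparing medians reduces the influence of one-off network hiccups on the result.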

I guess gpt-4o-mini is already the fastest model by OpenAI. I didn't test Anthropic or Mistral so far. Maybe someone else did?

I've switched to ElevenLabs Eleven Flash v2.5, as Turbo v2.5 is already deprecated. I also looked at Piper and Thorsten's voice, but it seems like a downgrade in quality compared to the ElevenLabs voices. Also, it's the only decent German voice, and it is far from the voice the child chose from ElevenLabs.
