[UI] Add voice dialog #2285
Conversation
RelativeCI bundle analysis #3915: Bundle Size — 12.33 MiB (+0.2%), 56567a8 (current) vs 967c616 on main (#3911, baseline). Warning: the bundle contains 2 duplicate packages. Branch: GiviMAD:feature/voice_dialog.
Hello @florian-h05. After the core PR was merged I've updated the branch to the recent web-ui changes. I have added some descriptions to the main comment about the current state of the PR. Do you think the PR is OK for a first version? Let me know what you think when you have a moment. In case you want to test it:
@GiviMAD excited to try this out as well, I'll give it a go today.
Nice to hear that! It will be good to have it tested on different devices. For testing, once the connection is done, the openHAB CLI can be used to record an audio clip and play it back:

```
openhab> openhab:audio sinks
* PCM Audio WebSocket (ui-77-80) (pcm::ui-77-80::sink)
System Speaker (enhancedjavasound)
Web Audio (webaudio)
openhab> openhab:audio sources
* PCM Audio WebSocket (ui-77-80) (pcm::ui-77-80::source)
System Microphone (javasound)
openhab> openhab:audio record pcm::ui-77-80::source 5 test_audio.wav
Recording completed
openhab> openhab:audio play pcm::ui-77-80::sink test_audio.wav
```
Also, for using it over http on domains other than localhost, you need to go to chrome://flags/#unsafely-treat-insecure-origin-as-secure and add the openHAB URL there.
Ok, I have upgraded my build of openHAB with the latest nightly Docker image, which I think has the right parts: I have Whisper and Piper configured. I also have this turned on in the Main UI (which I updated with this PR).
I'm connecting to OH using SSL and a real Let's Encrypt cert, but the mic is disabled and I don't think I see anything in the logs. Not sure if I'm missing a step? I'll try debugging a bit myself tonight and tomorrow when I have a free moment, but maybe there's something obvious @GiviMAD you can suggest?
Yes, I tested it last night over https and it seems to be a problem with the content security policy, I think because of the way the worklet code is packaged. I'll try to solve it in the afternoon.
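(For context, a minimal sketch — not the actual fix in this PR — of how worklet packaging can collide with a strict Content-Security-Policy such as `script-src 'self'`; the file path below is hypothetical:)

```typescript
// Sketch only: a worklet module packaged into a blob: URL is typically rejected
// by a strict script-src, because it no longer originates from the page itself.
async function loadRecorderWorklet (ctx: AudioContext): Promise<void> {
  // Rejected under "script-src 'self'":
  // const blobUrl = URL.createObjectURL(new Blob([workletSource], { type: 'text/javascript' }))
  // await ctx.audioWorklet.addModule(blobUrl)

  // Allowed: serve the worklet as a same-origin static asset (hypothetical path).
  await ctx.audioWorklet.addModule('/res/pcm-recorder.worklet.js')
}
```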
The problem with the content security policy seems to be solved by the last commit. @digitaldan, let me know if it works for you now.
Excellent, I just deployed and did a real quick test and now audio is working 👍 I'll spend some time this afternoon with it. Thanks!
@GiviMAD excellent work, it's been working great all morning, can't wait to play with this more. I think it's about time to start looking at building a "real" AI/LLM human language interpreter for our back end that can act as an Alexa/Siri/Google replacement. We have a ChatGPT binding, but its HLI functionality is very limited. I was planning on writing this a year or more ago, but got wrapped up in Matter and decided that was a better use of my time for openHAB. The other, maybe larger deterrent was that all the end-to-end voice plumbing was very overwhelming to tackle at the time... which is exactly what you have done, and that was really the hardest part, bravo!
Thank you for giving it a try, nice to know it's working correctly. One question: do you think it would be better to only load the worker and connect to the WebSocket when clicking on the disabled mic icon, instead of doing it on the first user event anywhere on the screen as is done right now? I did it this way, but now I think it is wasteful to set all of that up if you are not going to use it. Maybe I should change the behavior or allow switching it from the options.
There is an issue created for adding a chat to the WebUI. I hope I have explained it correctly. I think it could be a good starting point.
Thank you!
I think at a minimum, if the first thing the user does is click on the mic, it should connect and then actually activate the mic. Right now I think you have to click twice if it's the first thing interacted with (once to activate, then again to start the stream)?
So I think you have definitely touched on part of this, and to be fair I have not looked at the interpreter framework to really understand what's built in right now, but I was thinking this LLM HLI would support:
I need to spend some time to really think about it, but again, you solved the hardest part in my mind. While chat is neat and a great way to test (and I could see hooking it up to SMS or Slack for chatting with your home while away), voice is really the killer application for this.
I have a similar vision, but I think not exactly the same. What I think is that most of those capabilities should be encapsulated in the core (history, tools, memory...), and the HLI interface keeps being a black box that allows connecting to whatever implementation you want (OpenAI, Ollama, Gemini... or the current built-in interpreter). I don't have much experience with this, but I was looking around and that seems to be a good way to go, and I think it's not hard to get from the current state to that.
Right now, once you enable it, a click anywhere on the page loads the WebWorker and connects to the WebSocket. I think I will add those two things as options and move the PR to ready, thank you for the feedback.
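(A rough sketch of the click-to-connect behaviour discussed above; `VoiceDialog` and its methods are hypothetical names, not the actual classes in this PR:)

```typescript
// Hypothetical stand-in for the voice library's entry point.
declare class VoiceDialog {
  connect (): Promise<void>       // loads the WebWorker and opens the WebSocket
  startListening (): Promise<void>
}

let dialog: VoiceDialog | null = null

// Set everything up lazily on the first mic click and start streaming in the
// same gesture, so the user doesn't have to click twice.
async function onMicButtonClick (): Promise<void> {
  if (!dialog) {
    dialog = new VoiceDialog()
    await dialog.connect()
  }
  await dialog.startListening()
}
```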
Can't wait to try this out, but unfortunately I am quite busy at the moment ... Just wanted to share a few issues/PRs with you wrt HLI:
Thank you for the links @florian-h05. I have taken a quick look and I don't think the best choice is changing the interpreter's response. I think that if you want to allow the LLM model to display cards in the chat in the UI, you should expose that as a tool to the model instead of it being part of the model response. For example, if you want to implement the capability of generating images from the chat, you would make a new tool available to the model, and when it calls that tool the image would be sent to the chat. It doesn't need to be integrated into the interpreter response, and changing the interface would also make it unusable over audio.
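(To illustrate the idea — all names here are hypothetical, loosely following common LLM function-calling APIs rather than any existing openHAB interface:)

```typescript
// A capability like image generation is exposed to the model as a tool; the
// interpreter's textual response stays unchanged and remains usable over audio.
interface ToolDefinition {
  name: string
  description: string
  parameters: Record<string, { type: string, description: string }>
  run (args: Record<string, unknown>): Promise<string>
}

const generateImageTool: ToolDefinition = {
  name: 'generate_image',
  description: 'Generate an image from a text prompt and post it to the chat',
  parameters: { prompt: { type: 'string', description: 'What to draw' } },
  run: async (args) => {
    // A real implementation would create the image and push it to the chat UI.
    return `image posted for prompt: ${String(args.prompt)}`
  }
}
```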
My proposal would be to extend the response with tool calls, so a model can return both text and tool calls. That's what you suggest as well.
I think that adding a "Conversation" that the interpreter can interact with in real time, without waiting for the interpretation to end, makes more sense. There you can have a conversation between different roles: the basic ones like the user and openHAB, but also the thinking and the tool calling. The chat can get the changes of the conversation in real time, make the user aware of the tool usage and thinking, and also implement the cool text streaming that is common in AI chats. I think it solves most of the problems without requiring any breaking change. I'll try to write a draft PR to see if it makes sense or whether I'm missing something.
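(A rough data model for that idea — all names hypothetical, just to make the shape of the proposal concrete:)

```typescript
// Messages carry a role so the chat can distinguish user input, openHAB
// responses, intermediate "thinking" and tool calls.
type Role = 'user' | 'openhab' | 'thinking' | 'tool'

interface Message {
  role: Role
  text: string
  done: boolean                 // false while the text is still streaming
}

interface Conversation {
  messages: Message[]
  append (message: Message): void
  // Partial updates let the UI render token-by-token streaming.
  update (index: number, partialText: string, done?: boolean): void
  // The chat subscribes and re-renders whenever the interpreter adds or
  // updates a message, without waiting for the interpretation to end.
  onChange (listener: (messages: Message[]) => void): void
}
```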
Yes, that sounds like a solid plan, I like the idea of breaking out the functionality into separate features or even bundles.
Well, it's always strange when you start reading a thread and stumble upon your own posts which you totally don't remember writing ;-)... so thanks for the reminder. Like Groundhog Day.
That would be great, I would love to see a high-level design of how this would all fit together end-to-end, I think that would really speed things up... right now it's a bit ambiguous to me (hoping to dive into the PRs mentioned as well as the HLI code this weekend as I am totally not up to speed).


This is a WIP PR for issue #2275.
The PR works with the latest snapshot (Nov 30 2025).
How it works
Under `bundles/org.openhab.ui/web/src/js/voice` I've added a library that handles the WebWorker and the Audio APIs used to connect to openHAB over WebSocket and stream the audio. These are the main classes inside:
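(The actual classes aren't listed here, but as a hedged sketch of the browser plumbing such a library sits on — mic capture via getUserMedia, an AudioWorklet, and a WebSocket; the worklet name and asset path are illustrative, not the real ones:)

```typescript
// Capture the microphone, route it through an AudioWorklet and forward the
// PCM chunks to openHAB over a WebSocket.
async function startVoiceStream (wsUrl: string): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const ctx = new AudioContext({ sampleRate: 16000 })
  await ctx.audioWorklet.addModule('/res/pcm-forwarder.worklet.js') // hypothetical asset

  const socket = new WebSocket(wsUrl)
  socket.binaryType = 'arraybuffer'

  const source = ctx.createMediaStreamSource(stream)
  const forwarder = new AudioWorkletNode(ctx, 'pcm-forwarder') // registered in the worklet file
  // The worklet posts raw PCM chunks back to the main thread; relay them.
  forwarder.port.onmessage = (e: MessageEvent<ArrayBuffer>) => {
    if (socket.readyState === WebSocket.OPEN) socket.send(e.data)
  }
  source.connect(forwarder)

  // Audio sent back by the server (TTS) would be decoded and played here.
  socket.onmessage = () => { /* decode and play the received audio */ }
}
```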
Requirements
It requires accessing the WebUI over https (the localhost domain is excluded).
To make it work over http on other domains with Chrome, you can go to chrome://flags/#unsafely-treat-insecure-origin-as-secure and add the openHAB URL there.
This is because the browser requires a secure context to allow access to media devices (mic, webcam...).
The options in the about page are hidden if access to media devices is not possible.
It requires the default speech-to-text, text-to-speech and interpreter services to be configured in the openHAB server's voice settings.
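(A minimal sketch of the availability check this implies; `voiceDialogSupported` is a hypothetical name:)

```typescript
// Media device access needs a secure context (https, or localhost), so the
// voice options are only worth showing when getUserMedia is actually reachable.
function voiceDialogSupported (): boolean {
  return window.isSecureContext &&
    !!navigator.mediaDevices &&
    typeof navigator.mediaDevices.getUserMedia === 'function'
}
```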
Current state:
Options added in the about section (they will not be shown if the getUserMedia API is not available in the browser):
If you enable the voice dialog, a button will be shown at the top right of the overview page which shows a different icon indicating the state of the dialog: