Support force activated pipelines #144

chatziko · 2024-03-27T09:02:23Z

This PR contains the client-side of a "push-to-talk"-like feature that allows the server to "force activate" the satellite, going directly to ASR. The change is simple: when the server sends run-satellite with start_stage = asr then we send back a run-satellite (as usual) with that stage and start streaming immediately. More details will be given in the corresponding PR in home-assistant/core.
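A minimal sketch of the client-side behavior described above, for illustration only. The `Event` dataclass, the `ForceActivationSatellite` class, and its attribute names are hypothetical stand-ins, not the actual wyoming-satellite classes; only the `run-satellite` event type and the `start_stage` / `asr` values come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for a protocol event (type plus data)."""

    type: str
    data: Dict[str, Any] = field(default_factory=dict)


class ForceActivationSatellite:
    """Illustrative satellite that supports server-side force activation."""

    def __init__(self) -> None:
        self.is_streaming = False
        self.sent: List[Event] = []  # events sent back to the server

    def handle_server_event(self, event: Event) -> None:
        if event.type != "run-satellite":
            return

        if event.data.get("start_stage") == "asr":
            # Force activation: skip wake word detection, reply with the
            # requested stage, and start streaming audio immediately.
            self.sent.append(Event("run-satellite", {"start_stage": "asr"}))
            self.is_streaming = True
        else:
            # Usual flow: nothing is streamed until the wake word is detected.
            self.is_streaming = False
```

In this sketch a plain `run-satellite` leaves the satellite idle, while one carrying `start_stage = asr` flips it straight into streaming.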

#143 is recommended for this to work properly; otherwise the awake sound and debug recording will not be triggered for "force activated" pipelines. I made separate PRs for easier review; if you merge #143 first, I can take care of the conflicts.

Note also that this wyoming change is needed by this PR.

Finally note that this is about a server-side "push-to-talk", not client side as in #82.

founderio · 2024-03-30T10:54:05Z

I feel like a remote-activated function like this needs a config option to enable/disable it, ideally defaulting to OFF. This feels like a privacy issue waiting to happen.

chatziko · 2024-03-30T11:13:55Z

It could be easily made optional. But I'm not sure what scenario you have in mind, who is the adversary?
If your HA server is compromised you should have no privacy expectation at all, the server controls the satellites and pretty much everything else.

founderio · 2024-03-30T18:29:38Z

In general I'm leaning towards the "prank" level of privacy, so someone with access to the network.

However, from my understanding so far, there is no authentication/authorization in the wyoming protocol? (Please correct me if I'm wrong, but I did not find any reference that shows any kind of authentication happening)

This means, by extension, that the satellite could be controlled by someone who is NOT the HA server. Which means anyone in the network could trigger audio streaming - bypassing voice/hotword activation - to basically anywhere in the network.

chatziko · 2024-03-30T20:04:55Z

This is an interesting security discussion in general, maybe encryption/authentication could be added to wyoming (similarly to esphome).

But it's completely orthogonal to this PR. Even without this PR a malicious user on the local network can connect to the satellite and stream audio. If wake-word detection happens locally on the satellite the malicious user would simply need to wait until the first detection, and then he could stream audio forever (pretending the pipeline never ends).

Or, to avoid waiting, he could even do a more exotic attack, like sending an audio-chunk to the satellite containing a recording of the wake word (a wav file saying "ok nabu"), which the satellite will happily play back (thinking it's a TTS response) and wake itself up 😄

founderio · 2024-03-30T20:39:24Z

Thank you for this insight! I was not aware the protocol allowed that much freedom :D

I've filed an issue in the protocol repo here, as this looks like a more foundational topic: OHF-Voice/wyoming#11

llluis · 2024-04-03T15:47:27Z

Hey @chatziko, very interesting approach. Thanks for the contribution.
We did the same feature, but differently. :)

Don't know if you saw this one, it's a very long thread, but you can go bottom-up: #81

I did that and a few other "exploration" changes, however, PRs are not going through.
I don't want to create a parallel repo, so I'm waiting on those to be merged or rejected to continue adding new features. :(

chatziko · 2024-04-04T12:39:06Z

Hey @llluis , wow, that's a long discussion in #81, I certainly missed it otherwise I would have participated. I hadn't followed wyoming-satellite for a while, I only posted a couple of issues back when it was first released then kept using the old homeassistant-satellite until a couple of weeks ago. Then I checked the PRs for recent developments, but not the issues (apart from #4 in which I saw no action). But I'm very happy to find out that people are pushing interesting ideas forward!

I had a quick look at the code in llluis/wyoming-satellite, if I understand correctly the general design for the "remote trigger" is that the server sends a "fake" detect event. It's certainly a simple and natural design. The reason I didn't follow this approach is that I thought it has a few drawbacks:

- Just handling detect in the usual way doesn't work with VadStreamingSatellite, because streaming is triggered by VAD, not detect. So it needs a different implementation there, complicating the code.
- Conceptually, "faking" an event is a bit messy. detect has specific semantics and can happen at specific moments in the pipeline. Adding new semantics and allowing it to happen at any moment will make the code more complicated. (And then adding the question_id param, which has nothing to do with a real detection, makes things even more "hacky".)
- Sending detect will not work if the satellite is paused, because the server is not connected (in my use case I want to pause the satellite, e.g. when the TV is on, and then be able to trigger it while paused). So you'd need to unpause first and coordinate the detect afterwards, which again would complicate the code.

So in the end I found the approach of adding start_stage to run-satellite cleaner, but of course it's not a huge difference, and any approach that works is fine. Hopefully @synesthesiam will find some time soon to have a look and let us know what he thinks and whether he plans to merge such a feature.

PS. Concerning question_id, if I understood correctly your goal was to only do ASR? This sounds like an interesting goal (although I find the question_id solution messy). In my design one could easily achieve this by also adding end_stage to run-satellite (fits well with start_stage).
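To illustrate the `start_stage` / `end_stage` idea, here is a rough sketch. The `Stage` enum and the `stages_to_run` helper are hypothetical, and the stage names and their order (wake, asr, handle, tts) are an assumption rather than something stated in this thread.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    """Hypothetical pipeline stages, in assumed execution order."""

    WAKE = "wake"
    ASR = "asr"
    HANDLE = "handle"
    TTS = "tts"


def stages_to_run(start_stage: Stage, end_stage: Stage) -> List[Stage]:
    """Return the stages from start_stage to end_stage, inclusive."""
    order = list(Stage)
    start, end = order.index(start_stage), order.index(end_stage)
    if start > end:
        raise ValueError("start_stage must not come after end_stage")
    return order[start : end + 1]
```

With this shape, `stages_to_run(Stage.ASR, Stage.ASR)` would cover the "only do ASR" case, while `stages_to_run(Stage.ASR, Stage.TTS)` corresponds to a force-activated pipeline that skips only the wake word.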

Mincka · 2024-05-15T21:12:23Z

Very nice PRs. I've tested them, including the button in HA.
Everything works fine, including the awake sound which is kept when the button.press service is called.

Thank you very much. I sincerely hope that this work can be reviewed and merged in the future.

I wrote a small guide there for anyone interested to test and use it in their setup.

chatziko · 2024-05-16T07:29:44Z

Awesome, thanks for testing and writing a guide.

Mincka · 2024-05-18T07:08:25Z

Not sure at this point if that's related to one of the PRs, but the button entity was named button.my_satellite_none, and now, after removing the integration and adding the satellite again, I see that none of the entities has a name related to its setting.

select.my_satellite_none
select.my_satellite_none_2
switch.my_satellite_none
number.my_satellite_none
number.my_satellite_none_2

Mincka · 2024-05-28T08:18:05Z

Do you know if this activation can be associated with one of the current events like --detection-command?
I tried to use it but it does not seem to be called when using the button press.
My goal is to keep a chime or send a tts to let the user know that the satellite is ready to listen.

chatziko · 2024-05-28T08:29:56Z

It should work, but of course only with the events that are actually happening. No wake word was detected, so it makes sense that --detection-command does not fire. But --stt-start-command should work (haven't tested it though).

Mincka · 2024-05-28T08:43:49Z

I think I've tested with all the events around the detection without success, but I'll try again with that one and report the results here. It could also be an issue in the original project, so I'll test with the standard workflow with the wake word. Thank you for the quick answer.

Mincka · 2024-06-03T18:50:24Z

Quite sad that OHF-Voice/wyoming#10 did not make it to 1.5.4 since it was ready for a long time.
However, it's good to see a little bit of progress around this. Now the requirements for this PR are fulfilled. :)

qJake · 2024-07-06T00:37:58Z

Can't wait for this! I'm really hoping that after this is implemented, we can tweak the LLM call to include a "Follow up?" parameter which can just trigger an immediate detection event on the satellite so it can start listening right away.

Basically, this gets us one step closer to true two-way, multi-step communication with LLMs. Very exciting.

slyticoon · 2024-11-17T20:28:58Z

Excited for this to be implemented. This would allow me to completely replace my commercial voice assistants. Actionable notifications and complex conversations will be great to have.

jjdenhertog · 2024-12-25T21:08:15Z

@chatziko I noticed that changes to the core were made referencing this PR, but it's not fully integrated. Is the current code still usable? Would love to see this working.

qJake · 2025-02-07T23:48:05Z

Home Assistant's native assist_satellite entity type recently got support for announcing a prompt and listening for a response, initiated from Home Assistant. I think that's a comparable replacement to this...

jjdenhertog · 2025-02-08T13:03:17Z

@qJake Are those features not used to initiate a conversation only? This PR could help with keeping a conversation running. An example of this would be to activate the satellite when the response from the voice assistant contains a question. Then we wouldn't want to re-announce the question to start the conversation.

qJake · 2025-02-10T14:40:24Z

@jjdenhertog Yes, that is true - and I think the issue there is, how do you get an LLM to respond with metadata that flags "I am asking the user a question?" reliably? There isn't a way to do that with LLM APIs currently as far as I know.

My thoughts for this PR (as well as the new upcoming assist_satellite action) are more for automations - you could make a TTS announcement, ask a question, then listen for a response. Even if the action requires a prompt, you could probably just put some inaudible text into the prompt that the TTS engine ignores or produces little to no output for.

jjdenhertog · 2025-02-10T15:16:09Z

@qJake The most basic implementation could be an automation that lets an LLM determine whether the response is a question and, if so, activates the satellite. For example, when the LLM is not able to determine the context, it now sometimes responds with "Which lights should I turn on?". It would be nice if that actually worked. I think an LLM should be able to determine that.

I could also see fail-safe scenarios. For example, when the user says "Can you turn the volume to fifteen" but the LLM understands "turn to fifty" 😂. It would be nice to be able to insert an "are you sure about that?" question when some values are being set, to which you can reply "yes" or "no".
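As a rough illustration of the follow-up idea, the sketch below uses a much cruder stand-in than asking an LLM: it simply treats any response that ends with a question mark as a question. The function names and the `activate` callback are hypothetical and not part of Home Assistant or wyoming-satellite.

```python
from typing import Callable


def needs_follow_up(response_text: str) -> bool:
    """Naive heuristic: a response ending in '?' is treated as a question."""
    return response_text.strip().endswith("?")


def maybe_force_activate(response_text: str, activate: Callable[[], None]) -> bool:
    """Call the (hypothetical) activate callback when a follow-up is expected."""
    if needs_follow_up(response_text):
        activate()
        return True
    return False
```

A punctuation check like this would miss questions phrased without a question mark, which is one reason to prefer a classifier or an LLM for the real decision.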

Support force activated pipelines

86e766d

This was referenced Mar 27, 2024

Add force activate button to wyoming satellite home-assistant/core#114291

Closed

Implement push-to-talk functionality #5

Open

founderio mentioned this pull request Mar 30, 2024

Authentication/Authorization/Encryption OHF-Voice/wyoming#11

Open

Mincka mentioned this pull request May 15, 2024

Add a way to initiate an interaction #45

Open

Mincka mentioned this pull request Jun 3, 2024

non-robust handling of HA server disconnection #26

Open

harisris mentioned this pull request Jan 28, 2025

Remote Pipelines Status #271

Open

semyonchetvertnyh mentioned this pull request Jun 24, 2025

Support force activated pipelines semyonchetvertnyh/wyoming-satellite#5

Open
