LLM user frame processor with tests #703

Draft · aconchillo wants to merge 5 commits into main from aleix/edgar/llm-user-frame-processor
Conversation

@aconchillo (Contributor) commented Nov 9, 2024

This PR is based on #450 and fixes the LLM context aggregators to improve VAD and user transcription ordering issues.

@skandere

Hey @aconchillo how is it going? any updates on when you think this PR will be merged?

@skandere

Hello team! @aconchillo @marcus-daily , any ETA on when you think this PR could be merged? I've been using a forked repo with this fix, works super well. Would love to go back to the main repo!

@aconchillo (Contributor, Author)

There are a couple of use cases (the ones where we interrupt) that I'm not sure about and have actually never run into. One of the cases is:

S E T1 T2

In this case we get 2 transcriptions after UserStoppedSpeaking. In this PR implementation (which is still the original one from @EdgarMMProsper), when we get T2 we interrupt the bot. It's actually weird, because if T1 was a final transcription we shouldn't really get another final transcription without another UserStartedSpeaking-UserStoppedSpeaking, but I guess it can happen? Interrupting the bot from the context aggregator is something I'd like to avoid since I don't think it's the right place. In general we don't want to send random interruptions from processors to fix some other issue (i.e. we want separation of concerns for each processor as much as possible).
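
To make the scenario concrete, here is a minimal sketch of the interrupt-on-T2 behavior described above (not the actual PR code; the frame and class names are simplified stand-ins):

```python
# Stand-ins for the real Pipecat frame types.
class UserStartedSpeakingFrame: pass
class UserStoppedSpeakingFrame: pass
class BotInterruptionFrame: pass

class TranscriptionFrame:
    def __init__(self, text: str):
        self.text = text


class UserAggregatorSketch:
    """S E T1 T2: a final transcription (T2) arriving after
    UserStoppedSpeaking, once an aggregation (T1) has already been
    pushed downstream, triggers a bot interruption."""

    def __init__(self):
        self._user_speaking = False
        self._aggregation_pushed = False

    async def process_frame(self, frame, push):
        if isinstance(frame, UserStartedSpeakingFrame):
            self._user_speaking = True
            self._aggregation_pushed = False
        elif isinstance(frame, UserStoppedSpeakingFrame):
            self._user_speaking = False
        elif isinstance(frame, TranscriptionFrame):
            if not self._user_speaking and self._aggregation_pushed:
                # Late transcription (T2) after T1 was already sent
                # downstream: interrupt the bot before adding the new text.
                await push(BotInterruptionFrame())
            await push(frame)
            self._aggregation_pushed = True
```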

@EdgarMMProsper

> There are a couple of use cases (the ones where we interrupt) that I'm not sure about and have actually never run into. One of the cases is:
>
> S E T1 T2
>
> In this case we get 2 transcriptions after UserStoppedSpeaking. In this PR implementation (which is still the original one from @EdgarMMProsper), when we get T2 we interrupt the bot. It's actually weird, because if T1 was a final transcription we shouldn't really get another final transcription without another UserStartedSpeaking-UserStoppedSpeaking, but I guess it can happen? Interrupting the bot from the context aggregator is something I'd like to avoid since I don't think it's the right place. In general we don't want to send random interruptions from processors to fix some other issue (i.e. we want separation of concerns for each processor as much as possible).

In our experience this does happen quite a lot. If you do not want to send interruptions from this processor (I agree it is not very clean), you can implement similar logic in the input processor directly.

@aconchillo (Contributor, Author)

> There are a couple of use cases (the ones where we interrupt) that I'm not sure about and have actually never run into. One of the cases is:
>
> S E T1 T2
>
> In this case we get 2 transcriptions after UserStoppedSpeaking. In this PR implementation (which is still the original one from @EdgarMMProsper), when we get T2 we interrupt the bot. It's actually weird, because if T1 was a final transcription we shouldn't really get another final transcription without another UserStartedSpeaking-UserStoppedSpeaking, but I guess it can happen? Interrupting the bot from the context aggregator is something I'd like to avoid since I don't think it's the right place. In general we don't want to send random interruptions from processors to fix some other issue (i.e. we want separation of concerns for each processor as much as possible).
>
> In our experience this does happen quite a lot. If you do not want to send interruptions from this processor (I agree it is not very clean), you can implement similar logic in the input processor directly.

Interesting. Well, I guess we can give this a try 😃. I have some local changes (e.g. pushing BotInterruptionFrame upstream instead) and some unit test fixes, because many internal things have changed since this PR was originally written. I should be able to finish it today/tomorrow.

@aconchillo (Contributor, Author)

@EdgarMMProsper @skandere In the case discussed above:

S E T1 T2

what is an example of the contents of T1 and T2? Why do we need to interrupt instead of simply queuing T2?

@aconchillo (Contributor, Author)

> @EdgarMMProsper @skandere In the case discussed above:
>
> S E T1 T2
>
> what is an example of the contents of T1 and T2? Why do we need to interrupt instead of simply queuing T2?

I was actually reading the original commit and `sent_aggregation_after_last_interruption` is actually never set to True, which means interruptions will never happen. So, I guess my question above is still valid.
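
Schematically, the pattern being described is something like this (a rough reconstruction for illustration, not the actual commit code):

```python
class BotInterruptionFrame: pass  # stand-in, as in the sketch above

class AggregatorSketch:
    def __init__(self):
        # In the original commit the flag starts False and, as noted above,
        # nothing ever sets it to True.
        self._sent_aggregation_after_last_interruption = False

    async def _maybe_interrupt(self, push):
        # The interruption path is gated on this flag, so if the flag is
        # never set to True this branch can never run and no interruption
        # is ever pushed.
        if self._sent_aggregation_after_last_interruption:
            await push(BotInterruptionFrame())
            self._sent_aggregation_after_last_interruption = False
```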

@EdgarMMProsper

> @EdgarMMProsper @skandere In the case discussed above:
>
> S E T1 T2
>
> what is an example of the contents of T1 and T2? Why do we need to interrupt instead of simply queuing T2?

T1: start of the sentence. T2: end of the sentence. Both things said between S and E but received a bit late because of STT latency. If you do not interrupt, the bot will generate two very similar outputs, "repeating" itself.

@aconchillo (Contributor, Author)

> @EdgarMMProsper @skandere In the case discussed above:
>
> S E T1 T2
>
> what is an example of the contents of T1 and T2? Why do we need to interrupt instead of simply queuing T2?
>
> T1: start of the sentence. T2: end of the sentence. Both things said between S and E but received a bit late because of STT latency. If you do not interrupt, the bot will generate two very similar outputs, "repeating" itself.

It seems the series 22 foundational examples would solve this issue without needing to interrupt. I was assuming T1 and T2 would be two different sentences, in which case the interruptions didn't make sense. But if it's the same sentence, then those examples might work better than interrupting. What if T1 has more context than T2?
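
For what it's worth, the non-interrupting alternative would just fold a late transcription into the previous user message, something like this (purely illustrative, assuming an OpenAI-style messages list):

```python
def append_late_transcription(messages: list[dict], text: str) -> None:
    """Merge a transcription that arrives late (T2) into the last user
    message (T1) instead of interrupting the bot."""
    if messages and messages[-1]["role"] == "user":
        messages[-1]["content"] = (messages[-1]["content"] + " " + text).strip()
    else:
        messages.append({"role": "user", "content": text})


# Example: T1 and T2 are two halves of the same sentence.
context = [{"role": "user", "content": "Can you book me a flight"}]
append_late_transcription(context, "to Barcelona next Friday?")
# context[-1]["content"] == "Can you book me a flight to Barcelona next Friday?"
```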

@aconchillo (Contributor, Author) commented Dec 22, 2024

> @EdgarMMProsper @skandere In the case discussed above:
>
> S E T1 T2
>
> what is an example of the contents of T1 and T2? Why do we need to interrupt instead of simply queuing T2?
>
> T1: start of the sentence. T2: end of the sentence. Both things said between S and E but received a bit late because of STT latency. If you do not interrupt, the bot will generate two very similar outputs, "repeating" itself.
>
> It seems the series 22 foundational examples would solve this issue without needing to interrupt. I was assuming T1 and T2 would be two different sentences, in which case the interruptions didn't make sense. But if it's the same sentence, then those examples might work better than interrupting. What if T1 has more context than T2?

I've decided I will make this a constructor argument. So you will have the option to interrupt the bot if you want to. But I have the feeling the series 22 examples (or any other approach) would work better.
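
A rough sketch of what that constructor argument could look like (the parameter name and default are illustrative, not the final API):

```python
class LLMUserContextAggregator:  # class name as in the discussion; details illustrative
    def __init__(self, context, *, interrupt_on_late_transcription: bool = False):
        # Hypothetical flag: when True, a final transcription arriving after
        # UserStoppedSpeaking pushes a BotInterruptionFrame before the new
        # aggregation; when False, the late text is simply queued.
        self._context = context
        self._interrupt_on_late_transcription = interrupt_on_late_transcription
```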

@aconchillo force-pushed the aleix/edgar/llm-user-frame-processor branch 3 times, most recently from bf78cee to 569d10a on December 24, 2024 02:04
This seems necessary because of how pytest works. If a task is cancelled, pytest will know the task has been cancelled even if `asyncio.CancelledError` is handled internally in the task.
@aconchillo force-pushed the aleix/edgar/llm-user-frame-processor branch from 569d10a to eef3f32 on December 24, 2024 02:15
@kwindla (Contributor) left a comment


lgtm

@aconchillo marked this pull request as draft on December 24, 2024 03:10
@aconchillo (Contributor, Author)

Alright, I have been playing with this a bit. I wanted to add it to 0.0.52, but it's not going to happen. We will rethink LLM aggregators in January, also considering how multi-modal models work. For now, I'll park this.
