From Detection to Narration and Explanation #13817
arcyleung started this conversation in Show and tell
Hello friends, Frigate has been working amazingly well for my needs, and I recently built a small integration on top of it. Essentially, it combines the detection capability of the TensorRT detector with a vision-text model such as LLaVA to narrate and explain events as they are being recorded, so there is a text-searchable transcript.

I am gauging whether there is community interest in this type of integration for multi-modal (image + video) workflows, much like how TensorRT YOLO is currently integrated. I'm willing to put in the effort to polish it further and contribute it upstream here.

You can find a video demo here and the code in my repo. Thanks!
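To make the idea concrete, here is a minimal sketch of how this kind of pipeline could be wired up, assuming Frigate's HTTP events API and a LLaVA model served through Ollama. The endpoint URLs, model name, query parameters, and the polling approach are illustrative assumptions, not the implementation from the linked repo.

```python
# Rough sketch only: poll Frigate's HTTP API for recent events, fetch each
# event's snapshot, and ask a locally served LLaVA model (via Ollama here)
# to narrate it. URLs, model name, and polling interval are assumptions
# about a typical setup, not the author's actual integration.
import base64
import time

import requests

FRIGATE_URL = "http://frigate.local:5000"           # assumed Frigate base URL
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed Ollama endpoint
MODEL = "llava"                                     # assumed vision-text model
PROMPT = "Describe what is happening in this security camera snapshot."


def narrate(event_id: str) -> str:
    """Fetch the event snapshot from Frigate and caption it with LLaVA."""
    snap = requests.get(
        f"{FRIGATE_URL}/api/events/{event_id}/snapshot.jpg", timeout=10
    )
    snap.raise_for_status()
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": PROMPT,
        "images": [base64.b64encode(snap.content).decode()],
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]


def main() -> None:
    seen: set[str] = set()
    while True:
        events = requests.get(
            f"{FRIGATE_URL}/api/events",
            params={"limit": 10, "has_snapshot": 1},
            timeout=10,
        ).json()
        for event in events:
            if event["id"] in seen:
                continue
            seen.add(event["id"])
            text = narrate(event["id"])
            # A real integration would write this into a searchable store
            # (e.g. SQLite with FTS) instead of printing it.
            print(f"{event['camera']} / {event['label']}: {text}")
        time.sleep(15)


if __name__ == "__main__":
    main()
```

Polling keeps the sketch dependency-free beyond `requests`; subscribing to Frigate's MQTT event topic instead would be the lower-latency way to trigger narration as events end.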
Replies: 1 comment, 2 replies

I'm super interested in this.