From Detection to Narration and Explanation #13817
arcyleung started this conversation in Show and tell
Hello friends, Frigate has been working amazingly well for my needs, and I recently built a small integration on top of it. Essentially, it combines the detection capability of the TensorRT detector with a vision-text model such as LLaVA to narrate and explain events as they are being recorded, so there is a text-searchable transcript.

I am gauging whether there is community interest in this type of integration for multi-modal (image + video) workflows, much like how TensorRT YOLO is currently integrated. I'm willing to put in the effort to polish it further and contribute it upstream here.

You can find a video demo here and the code in my repo. Thanks!
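To make the idea concrete, here is a minimal sketch of how this kind of pipeline could be wired up, assuming Frigate's HTTP events API and a LLaVA model served through Ollama. The endpoint URLs, model name, query parameters, and the polling approach are illustrative assumptions, not the implementation from the linked repo.

```python
# Rough sketch only: poll Frigate's HTTP API for recent events, fetch each
# event's snapshot, and ask a locally served LLaVA model (via Ollama here)
# to narrate it. URLs, model name, and polling interval are assumptions
# about a typical setup, not the author's actual integration.
import base64
import time

import requests

FRIGATE_URL = "http://frigate.local:5000"           # assumed Frigate base URL
OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed Ollama endpoint
MODEL = "llava"                                     # assumed vision-text model
PROMPT = "Describe what is happening in this security camera snapshot."


def narrate(event_id: str) -> str:
    """Fetch the event snapshot from Frigate and caption it with LLaVA."""
    snap = requests.get(
        f"{FRIGATE_URL}/api/events/{event_id}/snapshot.jpg", timeout=10
    )
    snap.raise_for_status()
    resp = requests.post(OLLAMA_URL, json={
        "model": MODEL,
        "prompt": PROMPT,
        "images": [base64.b64encode(snap.content).decode()],
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]


def main() -> None:
    seen: set[str] = set()
    while True:
        events = requests.get(
            f"{FRIGATE_URL}/api/events",
            params={"limit": 10, "has_snapshot": 1},
            timeout=10,
        ).json()
        for event in events:
            if event["id"] in seen:
                continue
            seen.add(event["id"])
            text = narrate(event["id"])
            # A real integration would write this into a searchable store
            # (e.g. SQLite with FTS) instead of printing it.
            print(f"{event['camera']} / {event['label']}: {text}")
        time.sleep(15)


if __name__ == "__main__":
    main()
```

Polling keeps the sketch dependency-free beyond `requests`; subscribing to Frigate's MQTT event topic instead would be the lower-latency way to trigger narration as events end.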
Replies: 1 comment, 2 replies

I'm super interested in this.