Gemma 3 and Wan2.1 #357

DCVirtualCosmos · 2025-03-23T08:03:16Z

With the release of these powerful models, one could start dreaming of a tool to caption videos for training Wan LoRAs, and Gemma 3 seems like the perfect fit. It's pretty smart, follows instructions quite well, already has uncensored versions on Hugging Face, and can analyze a sequence of images to describe what is happening in a short video.
So, it would be nice if TagGUI could:

Open .mp4 videos just like images, playing them in a loop when selected in the selector on the left.
Caption them the same way as images.
Use a .safetensors or .gguf file placed directly in the /models folder for captioning.
Offer a small menu to pull a selected number of frames from the video to be analyzed by the LLM.
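
To make the frame selection idea a bit more concrete, here is a minimal sketch of how a fixed number of frames could be picked evenly across a clip. The function name and parameters are hypothetical and not part of TagGUI; it only computes frame indices and assumes the total frame count is already known from whatever video reader ends up being used.

```python
def evenly_spaced_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Pick up to num_frames frame indices spread evenly across a clip.

    Hypothetical helper for illustration only; not an existing TagGUI function.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    # Never ask for more frames than the clip actually contains.
    num_frames = min(num_frames, total_frames)
    # Split the clip into num_frames equal segments and take the midpoint of
    # each one, using integer arithmetic to avoid floating-point rounding.
    return [
        ((2 * i + 1) * total_frames) // (2 * num_frames)
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    # Example: choose 4 frames from a 100-frame clip.
    print(evenly_spaced_frame_indices(100, 4))
```

Taking segment midpoints rather than starting at frame 0 avoids always sending the very first and very last frames, which are often fades or black frames. The resulting indices would then be handed to the video reader to extract the actual images for the model.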

I will try to do this myself when I have time, but perhaps you are faster!
