Is text-gen-webui able to load the new meta-llama_Llama-3.2-11B-Vision? & Cannot load multimodal ext #6412
Unanswered · CallMeRive asked this question in Q&A
Replies: 2 comments 3 replies
-
Hey? Does anybody read this?
-
Even if you loaded it, wouldn't oobabooga also need to add support for importing images for it to do anything? As I understand it, the Llama 3.2 "vision" models are about image-to-text, basically the opposite of Stable Diffusion. So you'd drag a photo into the (hypothetical) web UI, and then you could ask the text engine questions about it.
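Outside the web UI, that "image in, questions out" flow is roughly what the plain Transformers API looks like. Here is a minimal sketch, assuming a Transformers build that already ships the mllama architecture and using the Instruct variant of the model; the model id, image path, and prompt are placeholders:

```python
# Minimal sketch of the image-to-text flow with plain Transformers.
# Assumes a Transformers build with mllama support and the Instruct variant;
# the model id, image path, and prompt are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# The "dragged-in photo": one image plus a text question about it.
image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this picture?"},
    ]},
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```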
-
I have the model (downloaded manually), but I cannot load it because Transformers doesn't recognize its architecture, "mllama". I understand that the 'm' in "mllama" stands for multimodal, so I'd probably need the multimodal extension, but the multimodal extension won't load either; it fails with these errors:
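In case it helps narrow things down, here is a minimal sketch that checks whether the installed Transformers build recognizes the "mllama" model type at all, independent of the web UI or the multimodal extension. The local model path (text-generation-webui's models/ naming) and the 4.45 version threshold are assumptions:

```python
# Quick check: does the installed Transformers recognize the "mllama" model type?
# The local path follows text-generation-webui's models/ naming and is an assumption.
import transformers
from transformers import AutoConfig

print("transformers", transformers.__version__)  # mllama support landed around 4.45

try:
    cfg = AutoConfig.from_pretrained("models/meta-llama_Llama-3.2-11B-Vision")
    print("Recognized model type:", cfg.model_type)  # expected: "mllama"
except (KeyError, ValueError) as err:
    # Older builds fail here with an "unrecognized model type: mllama" style error.
    print("This Transformers build does not know mllama:", err)
```

If that check fails, upgrading the transformers package inside the web UI's environment would be the first thing to try; if it passes, the remaining problem is likely on the extension/loader side rather than with the model files.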