Is text-gen-webui able to load the new meta-llama_Llama-3.2-11B-Vision? & Cannot load multimodal ext #6412
Unanswered · CallMeRive asked this question in Q&A
Replies: 2 comments 3 replies
-
Hey? Does anybody read this?
-
Even if you loaded it, wouldn't oobabooga also need to add support for importing images for it to do anything? As I understand it, the Llama 3.2 "vision" models are about image-to-text, basically the opposite of Stable Diffusion. So you'd drag a photo into the (hypothetical) web UI, and then you could ask the text engine questions about it.
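Outside the web UI, that "image in, questions out" flow is roughly what the plain Transformers API looks like. Here is a minimal sketch, assuming a Transformers build that already ships the mllama architecture and using the Instruct variant of the model; the model id, image path, and prompt are placeholders:

```python
# Minimal sketch of the image-to-text flow with plain Transformers.
# Assumes a Transformers build with mllama support and the Instruct variant;
# the model id, image path, and prompt are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# The "dragged-in photo": one image plus a text question about it.
image = Image.open("photo.jpg")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is in this picture?"},
    ]},
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```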
-
I have the model (downloaded manually), but I cannot load it because Transformers doesn't recognize its architecture, "mllama". I understand that the 'm' in "mllama" stands for multimodal, so I'd probably need the multimodal extension, but the multimodal extension won't load either; it fails with these errors:
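In case it helps narrow things down, here is a minimal sketch that checks whether the installed Transformers build recognizes the "mllama" model type at all, independent of the web UI or the multimodal extension. The local model path (text-generation-webui's models/ naming) and the 4.45 version threshold are assumptions:

```python
# Quick check: does the installed Transformers recognize the "mllama" model type?
# The local path follows text-generation-webui's models/ naming and is an assumption.
import transformers
from transformers import AutoConfig

print("transformers", transformers.__version__)  # mllama support landed around 4.45

try:
    cfg = AutoConfig.from_pretrained("models/meta-llama_Llama-3.2-11B-Vision")
    print("Recognized model type:", cfg.model_type)  # expected: "mllama"
except (KeyError, ValueError) as err:
    # Older builds fail here with an "unrecognized model type: mllama" style error.
    print("This Transformers build does not know mllama:", err)
```

If that check fails, upgrading the transformers package inside the web UI's environment would be the first thing to try; if it passes, the remaining problem is likely on the extension/loader side rather than with the model files.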