-
Thank you for your great work, this is really cool! I just turned on the Discussions tab, which should be a better home for valuable discussions like this one. So, it seems that with 4-/8-bit quantization, LLaVA can fit into a GPU with 24GB of memory, or possibly less. I am working on a LLaVA version based on the latest Vicuna code base, and I am planning to release it next week. I am not very familiar with 4-/8-bit quantization of LLMs yet. Is there anything I should specifically pay attention to? It seems that Vicuna already supports 8-bit quantization? Thank you again!
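(For reference, 8-bit loading of the language-model half through Hugging Face transformers looks roughly like the sketch below. This is a minimal sketch, not LLaVA's actual loading code: the checkpoint path is a placeholder, and the CLIP tower and projector would still need to be handled separately.)

```python
# Minimal sketch of 8-bit loading via bitsandbytes, assuming a standard
# Hugging Face causal-LM checkpoint. The model path is a placeholder, and
# LLaVA's CLIP tower and projector are not covered here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/llava-13b"  # placeholder checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # needs `pip install bitsandbytes accelerate`
    torch_dtype=torch.float16,
    device_map="auto",        # let accelerate place the layers on the GPU
)
```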
-
I just did a 4-bit quantization with GPTQ and loaded the model in text-generation-webui (currently taking up 10.2GB of VRAM, running on a single RTX 3060). EDIT: I just needed to open the webui in chat mode; now it's working!
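(For anyone wanting to reproduce this, a hedged sketch of 4-bit GPTQ quantization using the AutoGPTQ library follows. The comment above likely used different GPTQ tooling; the paths and the calibration text here are placeholders, and real runs use a few hundred calibration samples.)

```python
# Hedged sketch: 4-bit GPTQ quantization with AutoGPTQ. Paths and the
# calibration text are placeholders, not values from the comment above.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_name = "path/to/llava-13b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # common GPTQ group size
    desc_act=False,
)

model = AutoGPTQForCausalLM.from_pretrained(model_name, quantize_config)

# Calibration data: tokenized examples with input_ids/attention_mask.
examples = [tokenizer("A photo of a cat sitting on a laptop keyboard.",
                      return_tensors="pt")]
model.quantize(examples)
model.save_quantized("llava-13b-4bit-128g")  # placeholder output dir
```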
-
Can confirm, it works on a 12GB GPU with 4-bit quantization. It's taking up around 10.2GB; I might go OOM if I give it too large a context, but it runs.
-
Hi @Wojtab, thank you for your great contribution. I got a chance to try it out today, and it runs with amazingly low VRAM usage! Two things I noticed:
Thank you!
-
@Wojtab Great, thank you so much for your contribution! We have just released our 7B checkpoint. I noticed that the `<im_start>` and `<im_end>` token indices change due to the different base checkpoint. Do you think it is easy to integrate that into the model? If not, we could hack the checkpoint a bit by adding dummy tokens, as in the sketch below.
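(A hedged sketch of that dummy-token idea, assuming standard transformers APIs; the checkpoint path and the filler count are placeholders, not values from either release.)

```python
# Hedged sketch: pad the vocabulary with filler tokens so that <im_start>
# and <im_end> land at the indices the released projector weights expect.
# NUM_FILLER is a placeholder; it depends on the two base checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/llava-7b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

NUM_FILLER = 2  # placeholder: however many slots the index gap requires
tokenizer.add_tokens([f"<dummy_{i}>" for i in range(NUM_FILLER)])
tokenizer.add_tokens(["<im_patch>", "<im_start>", "<im_end>"],
                     special_tokens=True)

# Grow the embedding matrix to cover the new vocabulary entries.
model.resize_token_embeddings(len(tokenizer))
```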
-
Hi, I am trying to train/finetune the base 7B model for LLaVA. The README says it's possible (see [7/19] under Release), but I can't seem to find an example.
-
There is no discussions tab, so I'm opening this as an issue.
I made it work on a single 3090 in ooba's webui; see this PR for more info: oobabooga/text-generation-webui#1487.
There is even a small chance it will run on 12GB GPUs, since 4-bit Vicuna-13B fits; the question is whether it still fits with CLIP and the projector loaded.