Tutorial: How to convert HuggingFace model to GGUF format #2948
Replies: 44 comments 59 replies
-
You might want to add a small note that requantizing to other formats from …
-
I have a model trained using QLoRA, and the lowest GGUF quantization I can convert it to is 8-bit. What about q4_K_S quantization? Why isn't it available?
-
Can anyone help me debug this?
-
Is there a way to do this directly on Colab?
-
This way I can only get a single file, such as a .gguf. Is it possible to convert a model into a reproducible format like TheBloke's on HuggingFace?
-
Hi @samos123, I'm only used to working with .gguf files for LLMs and have no idea what to do with models structured like this, so I did a search and found your post. Am I right to assume that all models structured this way are HF models? Is there anywhere I can read more about this? It seems all the YouTube videos go straight to the quantized .gguf version. Are HF models considered the raw models that can be further tuned into something else? I have lots of assumptions, but they're hard to verify.
-
Please tell me the difference between the roles of the following files.
My predictions are as follows.
Why aren't … Also, only …
-
I improved the download.py script so that you can just pass the HuggingFace model name on the command line. When creating the directory, it replaces the slash with a dash.
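The improved script itself is not preserved here, so this is only a minimal sketch of the idea, assuming huggingface_hub's snapshot_download; the helper names local_dir_for, download and main are hypothetical:

```python
"""download.py: download a HuggingFace repo into a directory named after it (sketch)."""
import sys
from typing import List


def local_dir_for(repo_id: str) -> str:
    """'org/model' -> 'org-model': the slash is replaced with a dash."""
    return repo_id.replace("/", "-")


def download(repo_id: str) -> str:
    # Imported here so local_dir_for() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Fetches every file in the repo and returns the local directory path.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))


def main(argv: List[str]) -> int:
    if len(argv) != 1:
        print("usage: python download.py <org/model>", file=sys.stderr)
        return 2
    print(download(argv[0]))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python download.py microsoft/phi-2` would save the files into a directory called microsoft-phi-2.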
-
I'm having a …
-
Hi, I ran into an odd error and have been really struggling to find any relevant information online, so I'm hoping someone here can help. I know almost nothing about the technical side of things; I'm just an average AI text-gen user. I'm trying to convert models to GGUF and checked out the instructions both here and in this guide on Reddit. I managed to get convert.py working and can do FP16 and Q8 conversions without issue, but I repeatedly run into the same mysterious error when trying to use quantize.exe to convert pretty much anything. I've tried both this model, Mixtral Erotic, and this model, CatPPT. The error message is always the same:
The processing always gets stuck on "line: 1 char:19"; I'm not sure why, and I can't really tell which character it is. By the way, I'm running this in PowerShell: I just right-clicked quantize.exe in Explorer and chose the option to navigate to that location automatically. I'm not sure if that makes a difference. I'm wondering whether the error is because I don't have llama.cpp installed correctly. Running quantize.exe through CMD gives an error about cudart64_12.dll being missing, but downloading the cudart files and putting them into the same folder doesn't stop the error. If I'm only using convert.py and quantize.exe, do I still need to follow the CMake instructions on the llama.cpp main page to build it from source? I've already installed requirements.txt through Python, which is why convert.py works for me, I think. For some reason, quantize.exe just doesn't work. Edit (Update):
-
As the errors state, you are mixing files from multiple models. Please download the files properly from HF microsoft/phi-2. Note: you can download GGUF-quantized Microsoft Phi-2 models directly from HF with hf.sh; for example, for Q4_K_M:
./scripts/hf.sh --repo TheBloke/phi-2-GGUF --file phi-2.Q4_K_M.gguf
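If you would rather stay in Python than use hf.sh, a roughly equivalent sketch uses huggingface_hub's hf_hub_download. This swaps in a different tool rather than reproducing what hf.sh does internally; download_gguf is a hypothetical helper, and its fetch parameter exists only so it can be exercised offline:

```python
from typing import Callable, Optional


def download_gguf(repo_id: str, filename: str, local_dir: str = ".",
                  fetch: Optional[Callable[..., str]] = None) -> str:
    """Download one GGUF file from a HuggingFace repo and return its local path."""
    if fetch is None:
        # Imported lazily: only needed for a real download.
        from huggingface_hub import hf_hub_download
        fetch = hf_hub_download
    return fetch(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Calling `download_gguf("TheBloke/phi-2-GGUF", "phi-2.Q4_K_M.gguf")` fetches the same file as the hf.sh command, without pulling the other quantizations in that repo.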
-
This might be useful. If anyone wants to help improve it, contributions are always welcome. https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script
-
I'm new to this and am trying to convert an embedding model (https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) to GGUF format. When I tried using … It seems to stem from here: llama.cpp/convert-hf-to-gguf.py, line 3118 in commit 1c5eba6. Any idea what the issue is and how to fix it?
-
Hi, it's not only me; someone else had this problem too. Is it just me, or does it seem like BERT models are not supported and so can't be converted to GGUF format?
-
I have this error message. How do I fix it?
-
I have deleted the code, but I remember that I failed with that method. In the end I just used the online GGUF converter on HuggingFace.
On Tue, 6 Aug 2024 13:43, gavin-edward ***@***.***> wrote:
… Thank you very much for your help. After building I ran quantize with:
quantize models/susnato_phi-1_5.gguf models/susnato_phi-1_5_q8_0.gguf Q8_0
And it works nicely. Cheers!
Hello, could you please share your build method with me?
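The quantize step quoted above can also be scripted. This is a minimal sketch that just assembles and runs the same command line; the binary name is an assumption (the quoted message uses quantize, while newer llama.cpp builds name it llama-quantize). The type argument can also be a K-quant such as Q4_K_S, which the converter's --outtype does not offer:

```python
import shlex
import subprocess
from typing import List


def quantize_cmd(src_gguf: str, dst_gguf: str, qtype: str = "Q8_0",
                 binary: str = "llama-quantize") -> List[str]:
    """Build the argv <binary> <input.gguf> <output.gguf> <type>, as in the quoted command."""
    return [binary, src_gguf, dst_gguf, qtype]


def run_quantize(src_gguf: str, dst_gguf: str, qtype: str = "Q8_0",
                 binary: str = "llama-quantize") -> None:
    cmd = quantize_cmd(src_gguf, dst_gguf, qtype, binary)
    print("running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

`run_quantize("models/susnato_phi-1_5.gguf", "models/susnato_phi-1_5_q8_0.gguf", binary="quantize")` reproduces the quoted command on an older build.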
-
I can't convert any models that classify tokens. What's wrong?
-
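Doesn't work. convert.py isn't there, and other versions of that script throw different kinds of errors. Checking out older branches brings back convert.py, but it throws all sorts of errors (like AttributeError: module 'gguf' has no attribute 'MODEL_TENSOR_NAMES'. Did you mean: 'MODEL_TENSORS'? or KeyError: 'tok_embeddings.weight'). Is there a working set of instructions?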
-
If you go on HuggingFace, the folks there have set up a simple UI that you can use.
Cody Krecicki, Founder, Choice Internet Brands, Inc & ***@***.***
On Oct 17, 2024, at 10:31 AM, Mykola Makhin ***@***.***> wrote:
Doesn't work. convert.py isn't there, and other versions of that script throw different kinds of errors. Checking out older branches brings back convert.py, but it throws all sorts of errors (like AttributeError: module 'gguf' has no attribute 'MODEL_TENSOR_NAMES'. Did you mean: 'MODEL_TENSORS'? or KeyError: 'tok_embeddings.weight').
Is there a working set of instructions?
-
How'd you train the model? What did you use? What base model did you use with the LoRA? Or is it just a random model you found and want as a GGUF?
On Oct 17, 2024, at 11:00 AM, Mykola Makhin ***@***.***> wrote:
I know, but I want to run it locally with Jan, and for that I need to convert it to GGUF.
-
Can't convert a ViT image model?
python llama.cpp/convert_hf_to_gguf.py "C:\Users\user\models\vit-model-hf"
INFO:hf-to-gguf:Loading model: vit-model-hf
-
What you guys have to understand is that only certain templates work: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
On Nov 5, 2024, at 7:10 PM, Joseph Encila ***@***.***> wrote:
Can't convert a ViT image model?
python llama.cpp/convert_hf_to_gguf.py "C:\Users\user\models\vit-model-hf"
--outfile vit-model-hf.gguf --outfile f16 or f32 or q8_0
INFO:hf-to-gguf:Loading model: vit-model-hf
ERROR:hf-to-gguf:Model ViTForImageClassification is not supported
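As far as I can tell, that error comes from convert_hf_to_gguf.py looking up a converter class by the first entry of the architectures list in the model's config.json; when no converter is registered for it (here ViTForImageClassification), it stops. A small sketch to see what a local HF model declares before trying to convert it (declared_architectures is a hypothetical helper, not part of llama.cpp):

```python
import json
from pathlib import Path
from typing import List


def declared_architectures(model_dir: str) -> List[str]:
    """Return the 'architectures' list from <model_dir>/config.json (empty if absent)."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return list(config.get("architectures", []))
```

Judging by the error above, `declared_architectures(r"C:\Users\user\models\vit-model-hf")` would return `["ViTForImageClassification"]`.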
-
When you say custom LLM, do you mean you trained it from scratch without any base model, or did you make a LoRA? Did you follow an existing architecture, or did you just build everything from scratch?
On Nov 15, 2024, at 2:51 AM, Aravinda Kumar ***@***.***> wrote:
So, what should I do if I want to convert my custom LLM to GGUF format? I can run the model with HuggingFace transformers.
-
There should be a file somewhere in this repo that holds the chat templates for the different architectures. You might have to add a new one.
On Nov 15, 2024, at 8:13 AM, Aravinda Kumar ***@***.***> wrote:
It is based on the GPT-3 architecture, but it was pretrained from scratch.
-
Here is somewhere to start: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
It may not be what you're looking for, but your problem sounds similar to others'. If that isn't the case, try the GGUF maker on HuggingFace directly. If it still won't work, I'm not sure, my friend.
On Nov 15, 2024, at 8:56 AM, Aravinda Kumar ***@***.***> wrote:
Thank you for the response. Can you point me to the file?
-
Hi! I'm experiencing this problem:
How can I make it work?
-
If LLaMA reports an error about not supporting …, try this tool: it finally let me convert a HuggingFace multimodal (LLaVA) float16 safetensors model into a 4-bit GGUF model and an mmproj projector file. Why is this so poorly documented? Leaving it here in case it helps someone else!
-
Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/
I published this on our blog but thought others here might benefit as well, so I'm sharing the raw post here on GitHub too. I hope it's helpful to folks here, and feedback is welcome.
Downloading a HuggingFace model
There are various ways to download models, but in my experience the huggingface_hub library has been the most reliable. The git clone method occasionally results in OOM errors for large models.
Install the huggingface_hub library:
pip install huggingface_hub
Then create a Python script named download.py that downloads the model (for example with huggingface_hub's snapshot_download) and run it:
python download.py
You should now have the model downloaded to a directory called vicuna-hf. Verify by listing its contents, e.g. ls vicuna-hf.
Converting the model
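Now it's time to convert the downloaded HuggingFace model to a GGUF model. llama.cpp comes with a converter script to do this.
Get the script by cloning the llama.cpp repo:
git clone https://github.com/ggerganov/llama.cpp
Install the required Python libraries:
pip install -r llama.cpp/requirements.txt
Verify the script is there and understand the various options (older checkouts ship convert.py instead):
python llama.cpp/convert_hf_to_gguf.py --help
Convert the HF model to a GGUF model: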
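The conversion command itself did not survive in this copy of the post. Here is a sketch of an equivalent call from Python, under these assumptions: a current llama.cpp checkout where the script is convert_hf_to_gguf.py (older checkouts shipped convert.py), the --outfile and --outtype flags described in the text, and a made-up output name, vicuna-hf.q8_0.gguf:

```python
import subprocess
import sys
from typing import List

# Only the output types discussed in this post; the script may accept others.
OUTTYPES = {"f32", "f16", "q8_0"}


def convert_cmd(model_dir: str, outfile: str, outtype: str = "q8_0",
                script: str = "llama.cpp/convert_hf_to_gguf.py") -> List[str]:
    """Build the converter argv for a local HF model directory."""
    if outtype not in OUTTYPES:
        raise ValueError(f"outtype not covered by this sketch: {outtype!r}")
    # Run the converter with the same interpreter that runs this script
    # (ideally the one where llama.cpp/requirements.txt was installed).
    return [sys.executable, script, model_dir, "--outfile", outfile, "--outtype", outtype]


def convert(model_dir: str, outfile: str, outtype: str = "q8_0") -> None:
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(convert_cmd(model_dir, outfile, outtype), check=True)
```

`convert("vicuna-hf", "vicuna-hf.q8_0.gguf")` runs the q8_0 conversion; pass "f16" or "f32" as the third argument to skip quantization.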
In this case we're also quantizing the model to 8 bits by setting --outtype q8_0. Quantizing helps improve inference speed, but it can negatively impact quality. You can use --outtype f16 (16 bit) or --outtype f32 (32 bit) to preserve the original quality.
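To make the q8_0 trade-off concrete, here is a pure-Python sketch of the Q8_0 block format as I understand ggml's reference implementation. Weights are grouped into blocks of 32; each block stores one fp16 scale d = max|x| / 127 plus 32 int8 values q = round(x / d). That costs (32 * 8 + 16) / 32 = 8.5 bits per weight versus 16 for f16, and each weight can be off by up to d / 2 after dequantization. It is illustrative only: it keeps the scale as a Python float instead of fp16 and uses Python's round-half-to-even, so it is not bit-exact with llama.cpp.

```python
from typing import List, Tuple

QK8_0 = 32  # weights per Q8_0 block


def quantize_block_q8_0(xs: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block of 32 floats to (scale, int8 values)."""
    assert len(xs) == QK8_0
    amax = max(abs(x) for x in xs)
    d = amax / 127.0  # the largest magnitude maps to +/-127
    inv_d = 1.0 / d if d else 0.0  # an all-zero block quantizes to zeros
    # Python rounds halves to even; ggml's reference code rounds them away from zero.
    qs = [round(x * inv_d) for x in xs]
    return d, qs


def dequantize_block_q8_0(d: float, qs: List[int]) -> List[float]:
    return [q * d for q in qs]


def bits_per_weight_q8_0() -> float:
    # 32 int8 values plus one 16-bit scale per block
    return (QK8_0 * 8 + 16) / QK8_0
```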
Verify the GGUF model was created:
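Besides checking that the file exists (e.g. with ls), a quick sanity check is to read the GGUF header. Per the GGUF format description, a file starts with the ASCII magic GGUF, followed by a little-endian uint32 version and then uint64 tensor and metadata key/value counts (version 2 and later; the short-lived version 1 used 32-bit counts). A stdlib-only sketch, where read_gguf_header is a hypothetical helper:

```python
import struct
from typing import Tuple


def read_gguf_header(path: str) -> Tuple[int, int, int]:
    """Return (version, tensor_count, metadata_kv_count), or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 version + 2 x uint64 counts
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # '<' = little-endian with no padding; I = uint32, Q = uint64
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    return version, n_tensors, n_kv
```

Running `read_gguf_header("your-model.gguf")` on the converted file should return a small version number and non-zero counts.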
Pushing the GGUF model to HuggingFace
You can optionally push back the GGUF model to HuggingFace.
Create a Python script with the filename upload.py that has the following content:
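The original upload.py is not preserved in this copy, so this is only a minimal sketch, assuming huggingface_hub's HfApi.create_repo and HfApi.upload_file. It assumes the token is exported as HF_TOKEN (or the older HUGGING_FACE_HUB_TOKEN); the file and repo names are supplied on the command line:

```python
"""upload.py: push a local GGUF file to a HuggingFace model repo (sketch)."""
import os
import sys
from pathlib import Path
from typing import Optional


def upload_gguf(gguf_path: str, repo_id: str, token: Optional[str] = None) -> str:
    """Upload gguf_path to the root of repo_id; return the file's path inside the repo."""
    # Assumption: the token was exported as HF_TOKEN (or the older HUGGING_FACE_HUB_TOKEN).
    token = token or os.environ.get("HF_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
    if not token:
        raise RuntimeError("no HuggingFace token found; set HF_TOKEN to a token with write permission")

    from huggingface_hub import HfApi  # imported here so the token check works without it

    api = HfApi(token=token)
    # Create the model repo if needed; exist_ok=True makes this a no-op when it already exists.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    path_in_repo = Path(gguf_path).name
    api.upload_file(path_or_fileobj=gguf_path, path_in_repo=path_in_repo,
                    repo_id=repo_id, repo_type="model")
    return path_in_repo


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python upload.py <file.gguf> <user/repo>", file=sys.stderr)
    else:
        print("uploaded", upload_gguf(sys.argv[1], sys.argv[2]))
```

For example: `python upload.py your-model.gguf your-username/your-model-GGUF` (both names are placeholders).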
Get a HuggingFace token that has write permission from here:
https://huggingface.co/settings/tokens
Set your HuggingFace token:
Run the upload.py script.