Replies: 1 comment
-
I am keeping only the ggml f16 models around, because there is still a lot of work and research being done on quantization, and from f16 it is easy to generate new quantized files. Also note that if you clone an HF repo with git-lfs, it will duplicate the data. Otherwise, storage is very cheap nowadays. I'm actually uploading some files to a CDN so I can access them over FTP and HTTP.
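Some back-of-the-envelope math on the keep-only-f16 approach. The bytes-per-parameter figures here are rough assumptions (~2 bytes/param for f16, roughly 0.56 bytes/param for a 4-bit format like q4_0 including block overhead), just to show the trade-off:

```python
# Rough storage math: keep only the f16 ggml file vs. keeping every quantized variant.
# Bytes-per-parameter values are assumptions, not exact format sizes.
PARAMS_7B = 7e9  # parameter count of a 7B model

def size_gb(bytes_per_param: float, n_params: float = PARAMS_7B) -> float:
    """Approximate file size in GB for a given encoding density."""
    return bytes_per_param * n_params / 1e9

f16 = size_gb(2.0)    # ~14 GB for the f16 master copy
q4 = size_gb(0.56)    # ~3.9 GB per 4-bit quantized variant

keep_f16_only = f16               # regenerate quants on demand
keep_everything = f16 + 3 * q4    # f16 plus three quantized variants on disk

print(f"f16 only: {keep_f16_only:.1f} GB; f16 + 3 quants: {keep_everything:.1f} GB")
```

So per 7B model you pay roughly one f16 copy either way; the quantized variants are the part that is cheap to throw away and regenerate.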
-
I love this project; my one issue is disk space and the sheer number of conversions.
For example, the workflow from start to finish with a LLaMA-derived model, when done "properly", is painful, with the StableLM or OpenAssistant XORs and more model variants appearing:
a) download model weights in the original format
b) convert to hugging face / transformer format
c) apply xor/delta
d) convert to ggml
And I did not even mention quantizing.
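The steps above can be sketched as a command list. All script names and paths here are assumptions (adjust to wherever your transformers and llama.cpp checkouts live); by default nothing is executed, the function just returns the planned commands:

```python
# Sketch of the pipeline (a)-(d) plus quantization, as a dry-run command list.
# Script names, paths, and model sizes are hypothetical -- adapt to your setup.
import subprocess

PIPELINE = [
    # a) download the original weights (not shown; source depends on the model)
    # b) convert the original checkpoint to HuggingFace/transformers format
    ["python", "convert_llama_weights_to_hf.py",
     "--input_dir", "llama-original", "--model_size", "7B",
     "--output_dir", "llama-hf"],
    # c) apply the release's XOR/delta against the HF base (model-specific script)
    ["python", "apply_delta.py", "--base", "llama-hf", "--target", "model-hf"],
    # d) convert the HF checkpoint to ggml f16
    ["python", "convert.py", "model-hf", "--outtype", "f16",
     "--outfile", "ggml-model-f16.bin"],
    # and finally quantize from the f16 master copy
    ["./quantize", "ggml-model-f16.bin", "ggml-model-q4_0.bin", "q4_0"],
]

def run(dry_run: bool = True) -> list[str]:
    """Return the planned commands; actually execute them when dry_run=False."""
    cmds = [" ".join(step) for step in PIPELINE]
    if not dry_run:
        for step in PIPELINE:
            subprocess.run(step, check=True)
    return cmds
```

Note that each intermediate (original, HF, f16 ggml, quantized) is a full copy of the weights on disk, which is exactly where the space goes.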
If HF file format(s) are missing important things, could what's missing be contributed there?
Or can we hope that everybody will switch to GGML?
Just this weekend I filled up 900 GB of disk space just playing around with new model versions and file formats.
Any creative ideas other than deleting and redownloading/regenerating?