Feature Request: Automatic Hugging Face safetensors -> .gguf Converter for Whisper Models
Currently, converting Whisper models from Hugging Face (especially fine-tuned ones in `safetensors` format) to `gguf` format for use with `whisper.cpp` is a multi-step, error-prone process requiring manual script patching, header editing, or loss of metadata (e.g. the `"dims"` key).
I'm proposing a first-party utility script (or an enhancement to `convert-pt-to-ggml.py`) that:
- Accepts Hugging Face models (`.safetensors`, `.bin`, `.pt`)
- Uses `transformers.WhisperForConditionalGeneration` to parse the model
- Outputs a `.gguf` file compatible with `whisper.cpp`
- Automatically extracts config (e.g. `n_ctx`, `n_audio_ctx`, vocab) from the model/tokenizer
This would simplify adoption for many users and reduce GitHub issues related to conversion errors such as `KeyError: 'dims'`.
Technical Proposal
A script like:
```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer

model_id = "openai/whisper-large-v3"
model = WhisperForConditionalGeneration.from_pretrained(model_id)
tokenizer = WhisperTokenizer.from_pretrained(model_id)

# Optional: convert model.state_dict() -> ggml/gguf tensor layout
# Save vocab, audio hyperparameters, etc.
# Feed into the whisper.cpp conversion chain
```
Coupled with the gguf header builder already present in `whisper.cpp`, the final `.gguf` model could be produced automatically.
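For the "feed into the conversion chain" step, the HF `state_dict` keys would first need to be renamed to the OpenAI checkpoint layout that `whisper.cpp`'s converter walks. A rough sketch of that remapping; the substitution table below is my assumption, inferred from how `transformers` converts OpenAI Whisper checkpoints, and would need to be verified against the actual scripts:

```python
import re

# Hypothetical helper: map Hugging Face Whisper tensor names to the
# OpenAI-style names that convert-pt-to-ggml.py expects. The pairs
# below are an assumption based on transformers' Whisper conversion
# code and are not guaranteed to be complete.
HF_TO_OPENAI = [
    (r"^model\.", ""),                                  # drop "model." prefix
    (r"\.layers\.", ".blocks."),
    (r"\.self_attn_layer_norm\.", ".attn_ln."),
    (r"\.encoder_attn_layer_norm\.", ".cross_attn_ln."),
    (r"\.final_layer_norm\.", ".mlp_ln."),
    (r"\.self_attn\.q_proj\.", ".attn.query."),
    (r"\.self_attn\.k_proj\.", ".attn.key."),
    (r"\.self_attn\.v_proj\.", ".attn.value."),
    (r"\.self_attn\.out_proj\.", ".attn.out."),
    (r"\.encoder_attn\.q_proj\.", ".cross_attn.query."),
    (r"\.encoder_attn\.k_proj\.", ".cross_attn.key."),
    (r"\.encoder_attn\.v_proj\.", ".cross_attn.value."),
    (r"\.encoder_attn\.out_proj\.", ".cross_attn.out."),
    (r"\.fc1\.", ".mlp.0."),
    (r"\.fc2\.", ".mlp.2."),
    (r"\.embed_tokens\.", ".token_embedding."),
]

def rename_tensor(hf_name: str) -> str:
    """Translate one HF state_dict key to the OpenAI checkpoint layout."""
    name = hf_name
    for pattern, repl in HF_TO_OPENAI:
        name = re.sub(pattern, repl, name)
    return name

# Example:
# rename_tensor("model.encoder.layers.0.self_attn.k_proj.weight")
#   -> "encoder.blocks.0.attn.key.weight"
```

With a table like this, the renamed `state_dict` could be handed to the existing conversion logic unchanged.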
Motivation
- Most Whisper checkpoints are now shared in `safetensors` format
- Current conversion scripts (`convert-pt-to-ggml.py`) expect `.pt` files with a `"dims"` key
- Users face `KeyError`s, header errors, or broken audio configs
- Hugging Face models contain all required metadata; it's just not exposed by the current scripts
Reference issue: #3315