Feature Request: Automatic Hugging Face safetensors -> .gguf Converter for Whisper Models
Currently, converting Whisper models from Hugging Face (especially fine-tuned ones in safetensors format) to gguf format for use with whisper.cpp is a multi-step, error-prone process requiring manual script patching, header editing, or loss of metadata (e.g. "dims").
I'm proposing a first-party utility script (or an enhancement to convert-pt-to-ggml.py) that:
- Accepts Hugging Face models (.safetensors, .bin, .pt)
- Uses transformers.WhisperForConditionalGeneration to parse the model
- Outputs a .gguf file compatible with whisper.cpp
- Automatically extracts config (e.g. n_ctx, n_audio_ctx, vocab) from the model/tokenizer
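As a sketch of the metadata-extraction step: the hyperparameters whisper.cpp reads from the "dims" dict can be derived directly from the fields of transformers' WhisperConfig. The mapping below is an assumption based on comparing the two formats, not part of any existing script; verify it against your checkpoint.

```python
# Sketch (assumed mapping): derive the OpenAI-style "dims" dict that
# convert-pt-to-ggml.py expects from a Hugging Face WhisperConfig,
# passed here as a plain dict (e.g. config.to_dict()).
def hf_config_to_dims(cfg: dict) -> dict:
    return {
        "n_vocab": cfg["vocab_size"],
        "n_audio_ctx": cfg["max_source_positions"],
        "n_audio_state": cfg["d_model"],
        "n_audio_head": cfg["encoder_attention_heads"],
        "n_audio_layer": cfg["encoder_layers"],
        "n_text_ctx": cfg["max_target_positions"],
        "n_text_state": cfg["d_model"],
        "n_text_head": cfg["decoder_attention_heads"],
        "n_text_layer": cfg["decoder_layers"],
        "n_mels": cfg["num_mel_bins"],
    }
```

With such a helper, the KeyError: 'dims' failure mode disappears, because the dict is synthesized from metadata the Hugging Face model already carries.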
This would simplify adoption for many users and reduce GitHub issues related to conversion errors like KeyError: 'dims'.
Technical Proposal
A script like:
from transformers import WhisperForConditionalGeneration, WhisperTokenizer
import torch

model_id = "openai/whisper-large-v3"
model = WhisperForConditionalGeneration.from_pretrained(model_id)
tokenizer = WhisperTokenizer.from_pretrained(model_id)

# Optional: convert model state_dict → ggml or gguf format
# Save vocab, audio hyperparameters, etc.
# Feed into the whisper.cpp conversion chain

Coupled with the gguf header builder (already present), the final .gguf model can be produced automatically.
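One concrete piece of that conversion chain is renaming tensors: transformers and the original OpenAI checkpoints use different parameter names, and convert-pt-to-ggml.py expects the OpenAI ones. The rename table below is a partial assumption derived from inspecting both layouts (layer norms and embeddings are omitted); it is a sketch, not a complete or confirmed mapping.

```python
import re

# Partial, assumed rename rules from transformers Whisper state_dict keys
# to the OpenAI-style names convert-pt-to-ggml.py understands.
# Applied in order; dots in the patterns keep e.g. "self_attn_layer_norm"
# from being matched by the ".self_attn." rule.
_RULES = [
    (r"^model\.", ""),                    # strip the HF "model." prefix
    (r"\.layers\.", ".blocks."),          # transformer layers -> blocks
    (r"\.self_attn\.", ".attn."),
    (r"\.encoder_attn\.", ".cross_attn."),
    (r"\.q_proj\.", ".query."),
    (r"\.k_proj\.", ".key."),
    (r"\.v_proj\.", ".value."),
    (r"\.out_proj\.", ".out."),
    (r"\.fc1\.", ".mlp.0."),              # feed-forward as nn.Sequential index
    (r"\.fc2\.", ".mlp.2."),
]

def rename_hf_key(key: str) -> str:
    """Map one Hugging Face parameter name to its OpenAI-style equivalent."""
    for pattern, repl in _RULES:
        key = re.sub(pattern, repl, key)
    return key
```

For example, rename_hf_key("model.encoder.layers.0.self_attn.q_proj.weight") yields "encoder.blocks.0.attn.query.weight", which the existing converter already knows how to place.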
Motivation
- Most Whisper checkpoints are now shared in safetensors format
- The current conversion script (convert-pt-to-ggml.py) expects .pt with a "dims" key
- Users face KeyError, header errors, or broken audio configs
- Hugging Face models contain all required metadata; it is just not exposed by the current scripts
Reference issue: #3315