
Feature Request: convert_hf_to_gguf.py fails on hybrid Phi-3/T5 model #18440

@kalle07

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

All the usual models (Llama, Qwen, etc.) convert fine!

Model:
https://huggingface.co/aari1995/German_Semantic_V3b

LOG (updated):

WARNING:hf-to-gguf:

WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:** WARNING: The BPE pre-tokenizer was not recognized!
WARNING:hf-to-gguf:**          There are 2 possible reasons for this:
WARNING:hf-to-gguf:**          - the model has not been added to convert_hf_to_gguf_update.py yet
WARNING:hf-to-gguf:**          - the pre-tokenization config has changed upstream
WARNING:hf-to-gguf:**          Check your model files and convert_hf_to_gguf_update.py and update them accordingly.
WARNING:hf-to-gguf:** ref:     https://github.com/ggml-org/llama.cpp/pull/6920
WARNING:hf-to-gguf:**
WARNING:hf-to-gguf:** chkhsh:  b3d1dd861f1d4c5c0d2569ce36baf3f90fe8a102db3de50dd71ff860d91be3df
WARNING:hf-to-gguf:**************************************************************************************
WARNING:hf-to-gguf:

Traceback (most recent call last):
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 10353, in <module>
    main()
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 10347, in main
    model_instance.write()
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 660, in write
    self.prepare_metadata(vocab_only=False)
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 781, in prepare_metadata
    self.set_vocab()
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 6696, in set_vocab
    super().set_vocab()
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 5099, in set_vocab
    tokens, toktypes, tokpre = self.get_vocab_base()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 877, in get_vocab_base
    tokpre = self.get_vocab_base_pre(tokenizer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\123\Documents\python\autoround\llama.cpp\convert_hf_to_gguf.py", line 1150, in get_vocab_base_pre
    raise NotImplementedError("BPE pre-tokenizer was not recognized - update get_vocab_base_pre()")
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
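
For context on the chkhsh above: convert_hf_to_gguf.py identifies the BPE pre-tokenizer by encoding a fixed probe string with the model's tokenizer and hashing the resulting token IDs. A minimal sketch of that check, assuming the mechanism from the script (the probe string is abbreviated here, so the printed hash will differ from the real one):

```python
from hashlib import sha256
from transformers import AutoTokenizer

# Tokenizer of the model that fails to convert.
tokenizer = AutoTokenizer.from_pretrained("aari1995/German_Semantic_V3b")

# convert_hf_to_gguf.py encodes a fixed probe string (chktxt in the
# script; abbreviated here) and hashes the token IDs it produces.
chktxt = "..."  # placeholder; the real probe string lives in the script
chktok = tokenizer.encode(chktxt)
chkhsh = sha256(str(chktok).encode()).hexdigest()

# If this hash has no entry in get_vocab_base_pre(), conversion aborts
# with the NotImplementedError shown in the traceback.
print(chkhsh)
```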

When I run convert_hf_to_gguf_update.py, it downloads two tokenizer folders: t5 and phi-3.
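
Supporting this model would presumably mean registering its hash in get_vocab_base_pre(). Below is a sketch of the kind of branch that convert_hf_to_gguf_update.py generates for new models; the pre-tokenizer name "german-semantic" is hypothetical and would have to correspond to a pre-tokenizer actually implemented in llama.cpp:

```python
# Hypothetical addition inside get_vocab_base_pre() in convert_hf_to_gguf.py,
# alongside the existing chkhsh checks:
if chkhsh == "b3d1dd861f1d4c5c0d2569ce36baf3f90fe8a102db3de50dd71ff860d91be3df":
    # ref: https://huggingface.co/aari1995/German_Semantic_V3b
    res = "german-semantic"  # hypothetical name
```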

Motivation

Support for converting more models to GGUF.
