Description
Name and Version
./build/bin/llama-cli --version
version: 7449 (6853bee)
built with GNU 11.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
CPU
Hardware
AMD EPYC 9K84 96-Core Processor
Models
jina-embeddings-v2-base-zh
Problem description & steps to reproduce
1. Convert the HF model to GGUF:
/app/jczhao/miniconda3/envs/llamacpp/bin/python convert_hf_to_gguf.py \
  /app/jczhao/jina-embeddings-v2-base-zh \
  --outfile /app/jczhao/jina-embeddings-v2-base-zh-f32.gguf \
  --outtype f32 \
  --metadata "general.architecture=encoder" \
  --metadata "encoder.context_length=8192" \
  --metadata "general.name=jina-embeddings-v2-base-zh" \
  --verbose
2. Deploy llama-server:
sudo nohup ./build/bin/llama-server -m /app/jczhao/jina-embeddings-v2-base-zh-f32.gguf -t 24 -c 8192 -b 512 --host 0.0.0.0 --port 7868 --embeddings --cont-batching --no-mmap > llama-server-7.log 2>&1 &
Compared to the official HF model and the ONNX model, the embedding vectors produced by the jina-embeddings-v2-base-zh GGUF model give incorrect retrieval results.
Please help me solve this issue, thanks!
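A minimal sketch of how the comparison can be quantified, assuming the server is running as deployed above on port 7868 and exposes the OpenAI-compatible `/v1/embeddings` endpoint (the reference file `hf_reference_embedding.json` is a placeholder for a vector exported separately from the HF or ONNX model):

```python
import json
import math
import urllib.request

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def gguf_embedding(text, url="http://localhost:7868/v1/embeddings"):
    # Query llama-server's OpenAI-compatible embeddings endpoint
    # (available because the server was started with --embeddings).
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

if __name__ == "__main__":
    # Compare the GGUF embedding of a query against a reference vector
    # exported from the HF/ONNX model (path is a placeholder).
    v_gguf = gguf_embedding("你好，世界")
    with open("hf_reference_embedding.json") as f:
        v_ref = json.load(f)
    print("cosine(gguf, reference) =", cosine_similarity(v_gguf, v_ref))
```

If the conversion were correct, the cosine similarity between the GGUF vector and the HF/ONNX vector for the same input should be close to 1.0.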
First Bad Commit
No response