Description
Name and Version
./llama-cli --version
load_backend: loaded RPC backend from /media/veracrypt1/code/llama-b7574/libggml-rpc.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Unknown (RADV GFX1103_R1) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from /media/veracrypt1/code/llama-b7574/libggml-vulkan.so
load_backend: loaded CPU backend from /media/veracrypt1/code/llama-b7574/libggml-cpu-zen4.so
version: 7574 (5b1248c)
built with GNU 11.4.0 for Linux x86_64
Operating systems
Linux
GGML backends
Vulkan
Hardware
AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
Models
unsloth/gpt-oss-20b-GGUF:F16
Problem description & steps to reproduce
./llama-server -hf unsloth/gpt-oss-20b-GGUF:F16 --jinja -ngl 99 --threads -1 --parallel 4 --ctx-size 16384 --temp 1.0 --top-p 1.0 --top-k 0 --no-mmap --kv-unified --n_predict 4096 --chat-template-kwargs '{"reasoning_effort": "low"}'
Using b7574 (Ubuntu x64 Vulkan), model loading fails with:
load_tensors: Vulkan0 model buffer size = 12036.67 MiB
load_tensors: Vulkan_Host model buffer size = 1104.61 MiB
llama_model_load: error loading model: read error: Bad address
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/tb/.cache/llama.cpp/unsloth_gpt-oss-20b-GGUF_gpt-oss-20b-F16.gguf'
srv load_model: failed to load model, '/home/tb/.cache/llama.cpp/unsloth_gpt-oss-20b-GGUF_gpt-oss-20b-F16.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
Using b7472 (Ubuntu x64 Vulkan), the same command loads the model without error.
I'm running Ubuntu 22.04.5 LTS.
First Bad Commit
b7574 fails
b7502 fails
b7501 succeeds
b7472 succeeds
So the regression was introduced between b7501 and b7502.
Relevant log output
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading output layer to GPU
load_tensors: offloading 23 repeating layers to GPU
load_tensors: offloaded 25/25 layers to GPU
load_tensors: Vulkan0 model buffer size = 12036.67 MiB
load_tensors: Vulkan_Host model buffer size = 1104.61 MiB
llama_model_load: error loading model: read error: Bad address
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/tb/.cache/llama.cpp/unsloth_gpt-oss-20b-GGUF_gpt-oss-20b-F16.gguf'
srv load_model: failed to load model, '/home/tb/.cache/llama.cpp/unsloth_gpt-oss-20b-GGUF_gpt-oss-20b-F16.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error