Name and Version
build: 7556 (4fd59e8) with Clang 19.1.5 for Windows x86_64
Operating systems
Windows
GGML backends
Vulkan
Hardware
AMD 6800U(680m)
Models
No response
Problem description & steps to reproduce
When I write the models preset file as follows, an extra "default" preset appears when the server starts:
version = 1
[*]
ngl = 999
[Qwen3-0.6B-Instruct]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = 0
[Qwen3-0.6B-Thinking]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = -1

Output:
srv load_models: Loaded 0 cached model presets
srv load_models: Loaded 3 custom model presets from .\config_model.ini
srv load_models: Available models (3) (*: custom preset)
srv load_models: * Qwen3-0.6B-Instruct
srv load_models: * Qwen3-0.6B-Thinking
srv load_models: * default
However, when I delete the version = 1 line, the "srv load_models: * default" entry disappears.
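For contrast, this is the same file with only the version = 1 line removed; based on the behavior above, starting the server with it should list just the two named presets:

[*]
ngl = 999
[Qwen3-0.6B-Instruct]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = 0
[Qwen3-0.6B-Thinking]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = -1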
First Bad Commit
No response
Relevant log output
".\llama.cpp_release\llama-server.exe" --webui-config-file config_webui.json --host 0.0.0.0 --port 8090 --props --slots --metrics --models-preset .\config_model.ini -np 1 --no-mmap
load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
build: 7556 (4fd59e8) with Clang 19.1.5 for Windows x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 15 threads for HTTP server
load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
srv load_models: Loaded 0 cached model presets
srv load_models: Loaded 3 custom model presets from .\config_model.ini
srv load_models: Available models (3) (*: custom preset)
srv load_models: * Qwen3-0.6B-Instruct
srv load_models: * Qwen3-0.6B-Thinking
srv load_models: * default
main: starting router server, no model will be loaded in this process
start: binding port with default address family
main: router server is listening on http://0.0.0.0:8090
main: NOTE: router mode is experimental
main: it is not recommended to use this mode in untrusted environments
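My guess at the mechanism (I have not checked the actual llama.cpp preset parser, so everything below is an illustrative assumption, not the real code): version = 1 sits before the first [section] header, and an INI reader that files such top-level keys into an implicit section could end up surfacing that section as an extra "default" preset. A minimal self-contained sketch of that behavior, with hypothetical names throughout:

// Hypothetical sketch only -- NOT the actual llama.cpp preset parser.
// It illustrates how a key placed before the first [section] header can
// materialize an implicit "default" section that later shows up as a preset.
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main() {
    // The preset file from the report (model paths omitted for brevity).
    std::istringstream ini(
        "version = 1\n"
        "[*]\n"
        "ngl = 999\n"
        "[Qwen3-0.6B-Instruct]\n"
        "reasoning-budget = 0\n"
        "[Qwen3-0.6B-Thinking]\n"
        "reasoning-budget = -1\n");

    std::map<std::string, std::map<std::string, std::string>> sections;
    std::string cur = "default";  // assumption: pre-section keys land here
    std::string line;
    while (std::getline(ini, line)) {
        if (line.empty()) continue;
        if (line.front() == '[' && line.back() == ']') {
            cur = line.substr(1, line.size() - 2);   // enter a named section
        } else if (auto eq = line.find('='); eq != std::string::npos) {
            // no whitespace trimming, kept deliberately minimal
            sections[cur][line.substr(0, eq)] = line.substr(eq + 1);
        }
    }

    // Assumption: every section except the global "[*]" becomes a preset.
    for (const auto &s : sections)
        if (s.first != "*")
            std::cout << "preset: " << s.first << "\n";
}

Run against the file above, this prints Qwen3-0.6B-Instruct, Qwen3-0.6B-Thinking, and default, in the same order as the server output. Without the version = 1 line, no key is ever filed before the first real section, so the implicit "default" bucket is never created, which would match the behavior when the line is deleted.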