
Eval bug: when version = 1 is present in the models preset, an additional default model is created #18428

@ZUIcat


Name and Version

build: 7556 (4fd59e8) with Clang 19.1.5 for Windows x86_64

Operating systems

Windows

GGML backends

Vulkan

Hardware

AMD 6800U (680M)

Models

No response

Problem description & steps to reproduce

When the models preset file is written as follows, an extra "default" model is added when the server starts.

version = 1

[*]
ngl = 999

[Qwen3-0.6B-Instruct]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = 0

[Qwen3-0.6B-Thinking]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = -1

Output:

srv   load_models: Loaded 0 cached model presets
srv   load_models: Loaded 3 custom model presets from .\config_model.ini
srv   load_models: Available models (3) (*: custom preset)
srv   load_models:   * Qwen3-0.6B-Instruct
srv   load_models:   * Qwen3-0.6B-Thinking
srv   load_models:   * default

However, when I delete version = 1, the * default entry disappears from the model list.
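
For reference, this is the same preset file with the version line removed, which per the above no longer produces the extra default entry:

[*]
ngl = 999

[Qwen3-0.6B-Instruct]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = 0

[Qwen3-0.6B-Thinking]
model = .\_model\Qwen3-0.6B\Qwen3-0.6B-UD-Q8_K_XL__unsloth.gguf
reasoning-budget = -1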

First Bad Commit

No response

Relevant log output

".\llama.cpp_release\llama-server.exe" --webui-config-file config_webui.json --host 0.0.0.0 --port 8090 --props --slots --metrics --models-preset .\config_model.ini -np 1 --no-mmap

load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon(TM) Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
build: 7556 (4fd59e8) with Clang 19.1.5 for Windows x86_64
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16

system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |

init: using 15 threads for HTTP server
load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
load_backend: loaded RPC backend from AI\Llama\llama.cpp_release\ggml-rpc.dll
load_backend: loaded Vulkan backend from AI\Llama\llama.cpp_release\ggml-vulkan.dll
load_backend: loaded CPU backend from AI\Llama\llama.cpp_release\ggml-cpu-haswell.dll
srv load_models: Loaded 0 cached model presets
srv load_models: Loaded 3 custom model presets from .\config_model.ini
srv load_models: Available models (3) (*: custom preset)
srv load_models: * Qwen3-0.6B-Instruct
srv load_models: * Qwen3-0.6B-Thinking
srv load_models: * default
main: starting router server, no model will be loaded in this process
start: binding port with default address family
main: router server is listening on http://0.0.0.0:8090
main: NOTE: router mode is experimental
main: it is not recommended to use this mode in untrusted environments
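
A possible mechanism, stated as an assumption rather than something verified against the llama.cpp source: INI-style readers commonly attribute keys that appear before the first [section] header to an implicit default section, so a top-level version = 1 line would register a phantom preset named default. A minimal, self-contained C++ sketch of that parsing pattern (every name in it is illustrative, not the actual llama.cpp parser):

// Toy INI reader illustrating one way an implicit "default" preset can
// appear: any key seen before the first [section] header is attributed
// to a section named "default". Purely a hypothesis.
#include <iostream>
#include <map>
#include <sstream>
#include <string>

using Preset = std::map<std::string, std::string>;

static std::map<std::string, Preset> parse_presets(std::istream &in) {
    std::map<std::string, Preset> presets;
    std::string line;
    std::string section = "default"; // keys before any [header] land here
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#' || line[0] == ';') {
            continue; // skip blanks and comments
        }
        if (line.front() == '[' && line.back() == ']') {
            section = line.substr(1, line.size() - 2); // enter a new section
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) {
            continue; // not a key = value line
        }
        // A top-level "version = 1" ends up creating presets["default"].
        presets[section][line.substr(0, eq)] = line.substr(eq + 1);
    }
    return presets;
}

int main() {
    std::istringstream ini(
        "version = 1\n"
        "\n"
        "[*]\n"
        "ngl = 999\n"
        "\n"
        "[Qwen3-0.6B-Instruct]\n"
        "reasoning-budget = 0\n");
    // Prints: *, Qwen3-0.6B-Instruct, default
    for (const auto &entry : parse_presets(ini)) {
        std::cout << entry.first << "\n";
    }
    return 0;
}

With the sample input, the sketch lists *, Qwen3-0.6B-Instruct, and default. The implicit default entry is the point; judging from the log, the real server treats [*] as a wildcard section applied to every preset rather than as a model of its own, which would explain why * is not listed while default is.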
