
Eval bug: llama-simple-chat crashes with "failed to decode" after some requests #14487

Closed

@kayahr

Description

Name and Version

$ bin/llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4080 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
version: 5797 (de56944)
built with cc (Debian 14.2.0-19) 14.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

Vulkan

Hardware

NVIDIA RTX 4080

Models

ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf

Problem description & steps to reproduce

I want to write a small application using llama.cpp, and I am basing it on the simple-chat example. But after some conversation with the model, the program crashes with the error message "failed to decode".

I can reproduce this with the unmodified simple-chat example, just by chatting casually with the model:

$ bin/llama-simple-chat -m ggml-org_gemma-3-1b-it-GGUF_gemma-3-1b-it-Q4_K_M.gguf 
.............................................
> Tell me a joke.
[some unfunny joke]
> Tell me another one.
[some other even more unfunny joke]
...repeatedly ask for jokes or any other kind of conversation...

After some questions and answers, it crashes:

simple-chat.cpp:124: failed to decode

This does not happen with llama-cli using the same model, but that program is MUCH too complex to use as a base for an application. So it would be nice to find out what causes the problem in the simple-chat program.
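
For reference, the line mentioned in the error message is the decode call in the example, which aborts as soon as llama_decode reports a failure. I am quoting this roughly from memory, so treat the exact code as approximate:

if (llama_decode(ctx, batch)) {
    GGML_ABORT("failed to decode\n"); // simple-chat.cpp:124
}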

I guess this has something to do with the context size? When I select a smaller one, the crash happens much sooner. Is this supposed to happen, or is there something wrong with the context handling in the example?
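
If it helps, this is roughly how I would expect the decode call to be guarded. This is only a sketch based on my reading of the llama.h header in this build; llama_kv_self_used_cells in particular is my assumption and may be named differently in other versions:

#include "llama.h"
#include <cstdio>

// Sketch only: guard a decode call against running out of context
// instead of aborting. Assumes the llama.cpp API of this build;
// llama_kv_self_used_cells may not exist under this name elsewhere.
static bool decode_checked(llama_context * ctx, llama_batch batch) {
    const int n_ctx      = llama_n_ctx(ctx);
    const int n_ctx_used = llama_kv_self_used_cells(ctx);

    // refuse to decode once the batch would no longer fit into the context
    if (n_ctx_used + batch.n_tokens > n_ctx) {
        fprintf(stderr, "context size exceeded (%d used + %d new > %d)\n",
                n_ctx_used, batch.n_tokens, n_ctx);
        return false; // the caller could trim the chat history here instead
    }

    if (llama_decode(ctx, batch) != 0) {
        fprintf(stderr, "llama_decode failed\n");
        return false;
    }
    return true;
}

With something like this the program could end the conversation, or drop old messages, gracefully instead of aborting. But I do not know whether the used-cells count is the right measure for a model like Gemma 3.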

First Bad Commit

No response

Relevant log output

simple-chat.cpp:124: failed to decode

[New LWP 67199]
[New LWP 67196]
[New LWP 67195]
[New LWP 67194]
[New LWP 67192]
[New LWP 67191]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f142d0a49ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00007f142d0a49ee in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f142d099668 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f142d0996ad in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f142d104787 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f142d5f206b in ggml_print_backtrace () from llama.cpp/build/bin/libggml-base.so
#5  0x00007f142d5f2166 in ggml_abort () from llama.cpp/build/bin/libggml-base.so
#6  0x000055db7e3b2999 in main ()
[Inferior 1 (process 67190) detached]
Aborted
