
simple-chat : fix context-exceeded condition #14494


Merged

ggerganov merged 2 commits into master from gg/simple-chat-error-handle on Jul 2, 2025

Conversation

ggerganov

fix #14487

Fix an off-by-one error in the context-exceeded check. New behavior after the fix:

```
make -j && ./bin/llama-simple-chat -m ../models/gemma-3-1b-it/ggml-model-q8_0.gguf -c 128
```

```
> Tell me a long story
Okay, here’s a long story, aiming for a bit of depth and emotional resonance. It’s a bit sprawling, so buckle up! It’s titled “The Cartographer’s Echo.”
---
The salt spray stung Elias’s face as he adjusted the compass, the needle spinning wildly in the grey, relentless wind. He was perched on the crumbling cliffs of Aethelgard, a tiny, forgotten village clinging to the edge of the Whispering Sea, a place time seemed to have deliberately abandoned. He’d inherited the cartography shop
context size exceeded
```

```diff
@@ -114,14 +114,15 @@ int main(int argc, char ** argv) {
         // check if we have enough space in the context to evaluate this batch
         int n_ctx = llama_n_ctx(ctx);
         int n_ctx_used = llama_memory_seq_pos_max(llama_get_memory(ctx), 0);
-        if (n_ctx_used + batch.n_tokens > n_ctx) {
+        if (n_ctx_used + batch.n_tokens >= n_ctx) {
```
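
For intuition, a hypothetical walk-through (assuming `llama_memory_seq_pos_max` returns the highest 0-based position stored for the sequence): with `-c 128`, once position 127 is occupied all 128 cells are in use, yet `n_ctx_used` is still 127. A 1-token batch then gives `127 + 1 = 128`, and the old strict `>` comparison against `n_ctx = 128` did not flag it, so `llama_decode` was attempted on a full context and failed (the "failed to decode" crash in #14487). With `>=`, the same situation now takes the "context size exceeded" exit instead.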

A repository Member commented on the changed line:

To be more precise, I think it would be better to add 1 to the value returned by llama_memory_seq_pos_max.

int n_ctx_used = llama_memory_seq_pos_max(llama_get_memory(ctx), 0) + 1; 
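
A minimal sketch of the check with that suggestion applied (in the spirit of the follow-up commit "cont : fix n_ctx_used computation"); the surrounding context is the same as in the hunk above, and the body of the `if` is assumed from the "context size exceeded" output rather than copied verbatim from simple-chat.cpp:

```cpp
// check if we have enough space in the context to evaluate this batch;
// llama_memory_seq_pos_max returns the highest 0-based position in use
// (assumption stated above), so "+ 1" turns it into a count of occupied
// cells and the strict ">" comparison stays correct
int n_ctx      = llama_n_ctx(ctx);
int n_ctx_used = llama_memory_seq_pos_max(llama_get_memory(ctx), 0) + 1;
if (n_ctx_used + batch.n_tokens > n_ctx) {
    // assumed handling, matching the sample output above
    fprintf(stderr, "context size exceeded\n");
    exit(0);
}
```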

ggerganov merged commit d7f5f4e into master on Jul 2, 2025
48 of 53 checks passed
ggerganov deleted the gg/simple-chat-error-handle branch on Jul 2, 2025 at 11:12
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Jul 2, 2025
* origin/master:
llama : initial Mamba-2 support (ggml-org#9126)
sync : ggml
ggml : add version function to get lib version (ggml/1286)
Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dynamic libraries search for dependencies in their origin directory. (ggml-org#14309)
CUDA: add softmax broadcast (ggml-org#14475)
CUDA: broadcasting for FlashAttention mask (ggml-org#14500)
vulkan: support softmax/FA batch and broadcast (ggml-org#14449)
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (ggml-org#14435)
opencl : fix possible buffer overflow in dump_tensor (ggml-org#14490)
simple-chat : fix context-exceeded condition (ggml-org#14494)
opencl : skip empty nodes on cgraph compute (ggml-org#14491)
opencl : update upscale to support align corners (ggml-org#14488)
ci : add OpenCL to labeler workflow (ggml-org#14496)
github : add OpenCL backend to issue templates (ggml-org#14492)
ggml : Callback before abort (ggml-org#14481)
ci : disable fast-math for Metal GHA CI (ggml-org#14478)
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 5, 2025
* simple-chat : fix context-exceeded condition

ggml-ci

* cont : fix n_ctx_used computation

ggml-ci
qnixsynapse pushed a commit to menloresearch/llama.cpp that referenced this pull request Jul 6, 2025
* simple-chat : fix context-exceeded condition

ggml-ci

* cont : fix n_ctx_used computation

ggml-ci
Development

Successfully merging this pull request may close these issues.

Eval bug: llama-simple-chat crashes with "failed to decode" after some requests