llama : add thread safety test #14035
base: master
Conversation
llama : ignore main_gpu <= 0 if there are no GPUs ggml-ci
Maybe we can use an even smaller model for this test:
The SYCL ggml-ci does not seem to have libcurl installed yet.
Should be installed now.
Force-pushed from 2c5874e to a2a0289.
Force-pushed from a2a0289 to b046f0c.
There is some issue with this model (stories15M-q4_0.gguf) on CPU, but I don't think it is a threading issue. It only seems to happen on CPUs with AVX512.
I looked into it a bit and it does not seem to happen if OpenMP is disabled. I think it is something related to the repacking, but I didn't confirm. I'll take an extra look now.
Pretty sure this is a data race, because the chunk counter will be shared by all contexts: llama.cpp/ggml/src/ggml-cpu/llamafile/sgemm.cpp, lines 395 to 398 at 487a5e0.
If I disable […] @Djip007 could you take a look and propose a fix?
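For illustration, here is a minimal standalone sketch of the failure mode being described. This is hypothetical code, not the actual sgemm.cpp source: it only assumes the pattern of a single process-wide chunk counter handed out to worker threads, which is correct while exactly one matmul is in flight but breaks once two contexts run matmuls concurrently.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

// Shared by every context -- this is the problematic pattern.
static std::atomic<int> current_chunk{0};

// Each "matmul" expects to process chunks [0, nchunk) exactly once.
static int run_matmul(int nchunk) {
    current_chunk.store(0);              // both contexts reset the same counter
    int processed = 0;
    for (;;) {
        const int c = current_chunk.fetch_add(1);
        if (c >= nchunk) {
            break;
        }
        processed++;                     // pretend to compute chunk c
    }
    return processed;
}

int main() {
    int a = 0;
    int b = 0;
    // Two contexts running a matmul at the same time, as in the multi-context test.
    std::thread t1([&] { a = run_matmul(1000); });
    std::thread t2([&] { b = run_matmul(1000); });
    t1.join();
    t2.join();
    // With a per-operation counter both would report 1000; with the shared
    // counter, chunks get stolen or recomputed and the totals drift.
    printf("matmul 1 processed %d/1000 chunks, matmul 2 processed %d/1000\n", a, b);
    return 0;
}
```

One plausible direction for a fix would be to make the counter part of the per-operation compute state instead of a shared static, but that is only a suggestion, not what the PR does.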
ggml-ci
429 is "too many requests". @ngxson do you know if it is a temporary issue with huggingface, or are we being throttled?
The HF backend currently has a problem; the team is investigating, and it should be back very soon.
@0cc4m @jeffbolznv The Vulkan backend is crashing on this test. It happens even with a single context per model (…).
It is known that the Vulkan backend is not thread-safe yet, yes. |
A basic thread-safety test that loads a copy of the model on each GPU and on the CPU, and runs inference with multiple contexts in different threads.
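For reference, a hedged sketch of that structure: one model shared read-only by several threads, each thread owning its own llama_context. The llama.h calls used below (llama_load_model_from_file, llama_new_context_with_model, llama_token_bos, llama_batch_get_one) are assumptions based on the public API around the time of this PR; they are not the actual test-thread-safety code, and names or signatures may differ in other versions.

```cpp
#include <thread>
#include <vector>

#include "llama.h"

int main(int argc, char ** argv) {
    // Hypothetical default; the real test downloads its model via libcurl.
    const char * model_path = argc > 1 ? argv[1] : "stories15M-q4_0.gguf";

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(model_path, mparams);
    if (model == nullptr) {
        return 1;
    }

    const int n_threads = 4; // number of concurrent contexts (illustrative value)

    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([model] {
            // Every thread gets its own context; the model itself is shared read-only.
            llama_context_params cparams = llama_context_default_params();
            llama_context * ctx = llama_new_context_with_model(model, cparams);

            llama_token tok = llama_token_bos(model); // assumed accessor
            for (int step = 0; step < 32; ++step) {
                // Assumed 2-argument form; feeding the same token repeatedly is
                // enough to exercise the backend from multiple threads.
                llama_batch batch = llama_batch_get_one(&tok, 1);
                if (llama_decode(ctx, batch) != 0) {
                    break;
                }
            }
            llama_free(ctx);
        });
    }
    for (auto & w : workers) {
        w.join();
    }

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

The key design point the test relies on is that contexts are independent per thread, so any crash or corruption indicates shared mutable state inside the backend rather than in the test itself.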