
llama : add thread safety test #14035


Open: slaren wants to merge 10 commits into master from sl/thread-safety-test

Conversation

@slaren (Member) commented Jun 5, 2025

Basic thread safety test that loads a copy of the model on each GPU and on the CPU, and runs inference with multiple contexts in different threads.

llama : ignore main_gpu <= 0 if there are no GPUs

ggml-ci
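
To make the setup concrete, here is a minimal sketch of the pattern the test exercises: one shared model, several threads, each thread owning its own llama_context. This is not the actual test-thread-safety.cpp from this PR, and the API names (llama_model_load_from_file, llama_init_from_model, llama_batch_get_one) are assumed from the llama.h C API around the time of this PR; older releases spell these llama_load_model_from_file and llama_new_context_with_model.

```cpp
// Minimal sketch, NOT the actual test-thread-safety.cpp from this PR.
#include "llama.h"

#include <cstdio>
#include <thread>
#include <vector>

int main(int argc, char ** argv) {
    const char * model_path = argc > 1 ? argv[1] : "stories15M-q4_0.gguf";
    const int    n_parallel = 4; // number of concurrent contexts, like -np

    llama_backend_init();

    llama_model * model = llama_model_load_from_file(model_path, llama_model_default_params());
    if (model == nullptr) {
        fprintf(stderr, "failed to load %s\n", model_path);
        return 1;
    }

    const llama_vocab * vocab = llama_model_get_vocab(model);
    llama_token         bos   = llama_vocab_bos(vocab);

    std::vector<std::thread> workers;
    for (int i = 0; i < n_parallel; i++) {
        workers.emplace_back([&, i]() {
            // the model is shared; each thread owns its own context
            llama_context * ctx = llama_init_from_model(model, llama_context_default_params());
            llama_token tok = bos;
            if (llama_decode(ctx, llama_batch_get_one(&tok, 1)) != 0) {
                fprintf(stderr, "thread %d: decode failed\n", i);
            }
            llama_free(ctx);
        });
    }
    for (auto & w : workers) {
        w.join();
    }

    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

The property under test is that only the llama_model is shared; all mutable state is supposed to live in each thread's llama_context.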
@slaren requested a review from ggerganov as a code owner June 5, 2025 17:03
@github-actions bot added the testing and devops labels Jun 5, 2025
@ggerganov (Member)

Maybe we can use an even smaller model for this test:

https://huggingface.co/ggml-org/models/tree/main/tinyllamas

@slaren (Member, Author) commented Jun 6, 2025

The SYCL ggml-ci does not seem to have libcurl installed yet.

@ggerganov (Member)

Should be installed now.

@slaren force-pushed the sl/thread-safety-test branch from 2c5874e to a2a0289 June 6, 2025 11:18
ggml-ci
@slaren force-pushed the sl/thread-safety-test branch from a2a0289 to b046f0c June 6, 2025 12:13
@slaren (Member, Author) commented Jun 6, 2025

There is some issue with this model (stories15M-q4_0.gguf) on CPU, but I don't think it is a threading issue. It only seems to happen on CPUs with AVX512.

test-thread-safety: /home/ggml/work/llama.cpp/ggml/src/ggml-cpu/ops.cpp:2934: void ggml_compute_forward_silu_f32(const ggml_compute_params*, ggml_tensor*): Assertion `!isnan(x)' failed.

@ggerganov (Member)

I looked into it a bit and it does not seem to happen if OpenMP is disabled. I think it is something related to the repacking, but I didn't confirm. I'll take an extra look now.

@ggerganov (Member)

Pretty sure this is a data race, because the chunk counter will be shared by all contexts:

template <int RM, int RN, int BM>
NOINLINE void gemm(int64_t m, int64_t n, int64_t BN) {
    static std::atomic<int64_t> current_chunk;

If I disable GGML_LLAMAFILE on ggml-2 the test works correctly even with OpenMP enabled.

@Djip007 Could you take a look and propose a fix?
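
For readers following along, a hedged illustration of why the snippet above races: a function-local static is one process-wide object, so every context that reaches this gemm shares the same counter, and two graphs evaluated concurrently interfere with each other's chunk bookkeeping. The fix shape below is hypothetical scaffolding to show the idea (move the counter into per-invocation state), not the actual patch that ggml ended up with.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the problem: the static is shared across ALL contexts, so
// concurrent matmuls steal chunk indices from each other.
void gemm_racy(int64_t n_chunks) {
    static std::atomic<int64_t> current_chunk; // one variable for the whole process
    int64_t chunk;
    while ((chunk = current_chunk.fetch_add(1, std::memory_order_relaxed)) < n_chunks) {
        // ... process chunk ...
    }
}

// One possible shape of a fix (hypothetical, not the actual ggml patch):
// carry the counter in state owned by the caller, so that independent
// contexts never touch each other's counters.
struct gemm_state {
    std::atomic<int64_t> current_chunk{0};
};

void gemm_fixed(gemm_state & st, int64_t n_chunks) {
    int64_t chunk;
    while ((chunk = st.current_chunk.fetch_add(1, std::memory_order_relaxed)) < n_chunks) {
        // ... process chunk ...
    }
}
```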

@github-actions bot added the ggml label Jun 6, 2025
@slaren (Member, Author) commented Jun 6, 2025

19: 0.00.068.420 E common_download_file_single: invalid http status code received: 429

429 is "too many requests". @ngxson do you know if it is a temporary issue with huggingface, or are we being throttled?

@ngxson (Collaborator) commented Jun 6, 2025

The HF backend currently has a problem; the team is investigating. It should be back very soon.

@slaren (Member, Author) commented Jun 6, 2025

@0cc4m @jeffbolznv The Vulkan backend is crashing on this test. It happens even with a single context per model (-np 1), which is not great because it would prevent, for example, evaluating a draft model simultaneously with the main model. I can hold off on merging this if you think it could be fixed in the near future; otherwise it might be better to disable the Vulkan CI tests for now.

@0cc4m (Collaborator) commented Jun 6, 2025

Yes, it is known that the Vulkan backend is not thread-safe yet.
