sync : llama.cpp #1278

ggerganov · 2025-06-18T07:00:49Z

No description provided.

Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8) and move it to the vk_device. Move all the descriptor pool and set tracking to the context - none of it is specific to pipelines anymore. It has a single vector of pools and vector of sets, and a single counter to track requests and a single counter to track use.
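The bookkeeping described above (one vector of pools, one vector of sets, one request counter, one use counter) can be sketched roughly as follows. This is a minimal illustration, not the ggml-vulkan code: `DescriptorTracker`, `PoolHandle`, `SetHandle`, and `SETS_PER_POOL` are hypothetical names, and plain integers stand in for the Vulkan handles.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for VkDescriptorPool / VkDescriptorSet handles.
using PoolHandle = std::uint32_t;
using SetHandle  = std::uint32_t;

// Assumed pool capacity, kept small for illustration.
constexpr std::uint32_t SETS_PER_POOL = 4;

// Context-level tracking: no per-pipeline state, because every pipeline
// is assumed to share the same descriptor set layout.
struct DescriptorTracker {
    std::vector<PoolHandle> pools;   // single vector of pools
    std::vector<SetHandle>  sets;    // single vector of sets
    std::uint32_t requested = 0;     // sets requested for upcoming work
    std::uint32_t used      = 0;     // sets handed out so far

    // Record that upcoming work needs n more descriptor sets.
    void request(std::uint32_t n) { requested += n; }

    // Grow the pool list until every requested set exists.
    void allocate() {
        while (sets.size() < requested) {
            PoolHandle pool = static_cast<PoolHandle>(pools.size());
            pools.push_back(pool);
            for (std::uint32_t i = 0; i < SETS_PER_POOL; ++i) {
                sets.push_back(pool * SETS_PER_POOL + i);
            }
        }
    }

    // Hand out the next unused set (throws if allocate() was skipped).
    SetHandle next() { return sets.at(used++); }

    // Start a new round; already allocated pools and sets are kept for reuse.
    void reset() { requested = 0; used = 0; }
};
```

Because the layout is shared, any set from the vector can serve any pipeline, which is what makes a single pair of counters sufficient in this sketch.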

This change moves the command pool/buffer tracking into a vk_command_pool structure. There are two instances per context (for compute+transfer) and two instances per device for operations that don't go through a context. This should prevent separate contexts from stomping on each other.

* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

  This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT>, which needs to be set in cmake, instead of the GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

  Like x86; however, to pass arch flags around within cmake, we use GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...) and let the scoring function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

  The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch, which restores the previous behavior.
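The idea of building several variants and letting a scoring function choose among them can be sketched as below. This is only an illustration under stated assumptions: the `Feature` bits, the `Variant` struct, the scoring rule (zero if a required feature is missing, otherwise one plus the number of required features), and the feature sets attached to each variant name are all hypothetical and are not taken from ggml.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; the real detection code is not shown here.
enum Feature : std::uint32_t {
    DOTPROD = 1u << 0,
    FP16    = 1u << 1,
    SVE     = 1u << 2,
    I8MM    = 1u << 3,
};

// One compiled backend variant and the features it was built with.
struct Variant {
    std::string   name;
    std::uint32_t required;
};

// Score 0 means "cannot run on this host". Otherwise, variants that
// use more features score higher (assumed rule, for illustration).
int score(const Variant& v, std::uint32_t host_features) {
    if ((v.required & ~host_features) != 0) {
        return 0;  // host lacks at least one required feature
    }
    int s = 1;
    for (std::uint32_t bits = v.required; bits != 0; bits &= bits - 1) {
        ++s;  // one point per required feature
    }
    return s;
}

// Return the usable variant with the highest score, or nullptr if none.
const Variant* pick_best(const std::vector<Variant>& variants,
                         std::uint32_t host_features) {
    const Variant* best = nullptr;
    int best_score = 0;
    for (const Variant& v : variants) {
        int s = score(v, host_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

With variants such as `armv8.2_1` (assumed here to need only `DOTPROD`) and `armv8.2_2` (assumed to need `DOTPROD | FP16`), a host that reports both features would select `armv8.2_2`, while a host with only `DOTPROD` would fall back to `armv8.2_1`.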

Update oneMath commit to merged PR uxlfoundation/oneMath#669 which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

isaac-mcfadyen and others added 22 commits June 18, 2025 10:00

rpc : nicer error messages for RPC server crash (llama/14076)

37e5080

Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (llama/14099)

d01c9de

opencl: add mul_mv_id_q4_0_f32_8x_flat (llama/14003)

13c1686

cmake : handle whitespaces in path during metal build (llama/14126)

d218c64

* cmake : handle whitespaces in path during metal build

  ggml-ci

* cont : proper fix

  ggml-ci

---------

Co-authored-by: Daniel Bevenius <[email protected]>

sycl: Remove not needed copy f16->f32 for dnnl mul mat (llama/14125)

4518267

sycl: Adding additional cpy dbg print output (llama/14034)

fd833d9

HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (llama/14183)

702d68f

CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (llama/14196)

eb67171

ggml-cpu : rework weak alias on apple targets (llama/14146)

b66b563

* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin

vulkan: mutex around vkQueueSubmit (llama/14127)

9865b34

This fixes the remaining crash in test-thread-safety on my system.
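Vulkan requires the queue passed to `vkQueueSubmit` to be externally synchronized, so concurrent submissions to the same queue need a lock. The pattern can be sketched as below. It is a minimal illustration rather than the actual patch: `FakeQueue`, `submit`, and `run_submitters` are hypothetical names, and a counter increment stands in for the real `vkQueueSubmit` call.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a VkQueue shared by several threads.
struct FakeQueue {
    std::mutex mutex;     // guards every submission to this queue
    long submitted = 0;   // plain counter; unsafe without the lock
};

// Serialize access to the queue, as a mutex around vkQueueSubmit would.
void submit(FakeQueue& q) {
    std::lock_guard<std::mutex> guard(q.mutex);
    ++q.submitted;  // the real code would call vkQueueSubmit here
}

// Launch several threads that all submit to the same queue and
// return the total number of submissions that were recorded.
long run_submitters(int threads, int per_thread) {
    FakeQueue q;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&q, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                submit(q);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return q.submitted;
}
```

Keeping the mutex next to the queue it protects, rather than in a global, means separate queues do not contend with each other in this sketch.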

ggml: Add Android support for GGML_CPU_ALL_VARIANTS (llama/14206)

eb438fe

HIP: disable rocwmma on gfx12 by default until rocm 7.0 (llama/14202)

d2f4a67

cmake: clean up external project logic for vulkan-shaders-gen (llama/14179)

abd484b

* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time

llama : add thread safety test (llama/14035)

cbdc441

* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic

  when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <[email protected]>

musa: fix build warning (unused variable) (llama/14231)

615709a

Signed-off-by: Xiaodong Ye <[email protected]>

ggml-cpu : remove the weak alias trick (llama/14221)

dd60b3c

cmake: remove shader-gen step-targets from ggml-vulkan (llama/14226)

86721f7

* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen

sync : llama.cpp

b6ae3c2

ggml-ci

danbev approved these changes Jun 18, 2025

ggerganov merged commit c486ab3 into master Jun 18, 2025
12 checks passed

ggerganov deleted the sync-llama.cpp-25-06-18 branch June 18, 2025 07:21
