sync : llama.cpp #1192

ggerganov · 2025-04-14T06:27:12Z

No description provided.

ggml-ci

* ggml: fixes #12846 compilation error
  Signed-off-by: Aaron Teo <[email protected]>
  Co-authored-by: Aleksei Nikiforov <[email protected]>
* ggml: add documentation for code change
  Signed-off-by: Aaron Teo <[email protected]>
  Co-authored-by: Aleksei Nikiforov <[email protected]>
* ggml: refactor to type-cast and update documentation
  Signed-off-by: Aaron Teo <[email protected]>
  Co-authored-by: Aleksei Nikiforov <[email protected]>
* ggml: update documentation to provide full issue link
  Signed-off-by: Aaron Teo <[email protected]>
  Co-authored-by: Aleksei Nikiforov <[email protected]>

---------

Co-authored-by: Aleksei Nikiforov <[email protected]>

* SYCL: Add fp16 support to some elementwise OP kernels
* remove comment ggml-ci
* Use static_cast directly
* remove not needed cast from tanh
* Use static cast and remove unneeded castings
* Adjust device_support_op for unary OPs
* Use cast_data and typed_data struct to deduplicate casting code


ggerganov and others added 8 commits April 14, 2025 09:26

tests : fix init order (llama/0)

72b19d9

ggml-ci

sycl: Support sycl_ext_oneapi_limited_graph (llama/12873)

6dfcc8f

The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support update.
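A minimal sketch of the decision this commit describes, written as plain C++ so it compiles without a SYCL toolchain. The `device_graph_caps` struct, the `graph_mode` enum and `select_graph_mode` are hypothetical names for illustration and are not taken from the ggml SYCL backend. In real code the two booleans would presumably come from `sycl::device::has()` queries on the graph and limited-graph aspects.

```cpp
// Hypothetical stand-in for the result of querying a SYCL device.
// Assumption: in real code these flags would be filled from device.has(...)
// for the full-graph and limited-graph aspects respectively.
struct device_graph_caps {
    bool full_graph;     // models the sycl_ext_oneapi_graph aspect
    bool limited_graph;  // models the sycl_ext_oneapi_limited_graph aspect
};

// How the backend could use graphs on a given device (illustrative only).
enum class graph_mode {
    none,              // no graph support: submit kernels eagerly
    record_only,       // graphs can be recorded and replayed, but not updated
    record_and_update  // graphs can also be updated in place
};

inline graph_mode select_graph_mode(const device_graph_caps & caps) {
    if (caps.full_graph) {
        return graph_mode::record_and_update;
    }
    if (caps.limited_graph) {
        // Limited devices lack update support, so a changed graph
        // would have to be re-recorded instead of updated.
        return graph_mode::record_only;
    }
    return graph_mode::none;
}
```

Under these assumptions, a device that reports only the limited aspect still benefits from graph replay; it simply takes the re-record path whenever the graph changes.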

vulkan: use aligned loads for flash attention mask (llama/12853)

641b2b5

Rewrite the stride logic for the mask tensor in the FA shader to force the stride to be aligned, to allow using more efficient loads.
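As a generic illustration of why an aligned stride helps, the sketch below rounds a row stride up to a multiple of a vector width, so that every row starts on a boundary suitable for wide loads. This is not the shader's actual logic. The helper names and the alignment of 4 elements are assumptions chosen for the example.

```cpp
#include <cstdint>

// Round n up to the next multiple of align.
// Assumption: align is a power of two, so the mask trick is valid.
constexpr uint32_t align_up(uint32_t n, uint32_t align) {
    return (n + align - 1) & ~(align - 1);
}

// True when offset is a multiple of align (same power-of-two assumption).
constexpr bool is_aligned(uint32_t offset, uint32_t align) {
    return (offset & (align - 1)) == 0;
}

// Element offset of the first entry of a row, given a row stride.
constexpr uint32_t row_offset(uint32_t row, uint32_t stride) {
    return row * stride;
}
```

With an unpadded stride of 30 elements, row 1 starts at offset 30, which is not a multiple of 4. With the stride rounded up to 32, every row offset is a multiple of 4 and a 4-wide load is possible at each row start.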

ggml: disable CUDA graphs for unsupported DUP and CONT node types (llama/12891)

ce1fbfe

Fixes #12798

ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (llama/12773)

b45b453

* ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register
* simplifies the codebase by removing redundant functions
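For readers unfamiliar with the `dpbusd` family, the following scalar reference models what one such instruction computes per 32-bit lane: four unsigned-byte by signed-byte products are summed and added to the existing accumulator. That is why the result register can be accumulated into directly, without a separate add. The function name `dpbusd_ref` is hypothetical and this is not ggml code. The sketch also does not model the instruction's wrap-around on 32-bit overflow.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of a dpbusd-style step over `lanes` 32-bit accumulators.
// a holds 4 * lanes unsigned bytes, b holds 4 * lanes signed bytes.
// For each lane i: acc[i] += sum over k in 0..3 of a[4i+k] * b[4i+k].
inline void dpbusd_ref(int32_t * acc, const uint8_t * a, const int8_t * b, size_t lanes) {
    for (size_t i = 0; i < lanes; ++i) {
        int32_t sum = 0;
        for (size_t k = 0; k < 4; ++k) {
            // Widen both operands before multiplying so the product is exact.
            sum += static_cast<int32_t>(a[4 * i + k]) * static_cast<int32_t>(b[4 * i + k]);
        }
        acc[i] += sum;  // accumulate into the existing lane value
    }
}
```

A 256-bit register corresponds to `lanes = 8` and a 512-bit register to `lanes = 16`. Because the accumulator is both an input and the output, a dot-product loop can keep a single running register across iterations.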

sync : llama.cpp

c78b191

ggml-ci

ggerganov merged commit be935ac into master Apr 14, 2025
11 checks passed

ggerganov deleted the sync-llama.cpp-25-04-14 branch April 14, 2025 07:35
