Releases · ggml-org/llama.cpp
b5825
batch : add n_used count (#14512)
b5824
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <[email protected]>
b5823
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
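For context, GEGLU is a gated GELU: the input row is split in half along the feature dimension, one half is passed through GELU and used to scale the other. The _ERF and _QUICK suffixes distinguish the exact erf-based GELU from the sigmoid approximation. A minimal element-wise sketch, assuming the first half is the one that gets activated (the split/gating convention here is an assumption, not taken from the PR):

```cpp
#include <cmath>
#include <cstddef>

// Exact GELU using erf: 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// "Quick" GELU approximation: x * sigmoid(1.702 * x)
static float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}

// GEGLU over a row of 2*n floats: the first n values are activated,
// the second n values gate them (which half is gated is an assumption).
static void geglu_row(const float * src, float * dst, size_t n, bool quick) {
    const float * a = src;      // activated half
    const float * b = src + n;  // gating half
    for (size_t i = 0; i < n; ++i) {
        const float g = quick ? gelu_quick(a[i]) : gelu_erf(a[i]);
        dst[i] = g * b[i];
    }
}
```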
b5822
opencl : broadcast for soft_max (#14510)
b5821
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes
* vulkan: support mixed/deepseekR1 FA head sizes
b5820
ggml: backward pass for split swiglu (#14483)
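SwiGLU gates one half of the input with the SiLU of the other half; "split" here refers to that halving of the feature dimension. A rough sketch of the forward value and the gradients a backward pass has to produce, assuming y = silu(a) * b with a and b the two halves (the gating convention is an assumption, not taken from the PR):

```cpp
#include <cmath>

static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Forward: y = silu(a) * b, with silu(a) = a * sigmoid(a)
static float swiglu(float a, float b) {
    return a * sigmoid(a) * b;
}

// Backward: given dL/dy, produce dL/da and dL/db.
// d silu(a)/da = sigmoid(a) * (1 + a * (1 - sigmoid(a)))
static void swiglu_grad(float a, float b, float dy, float & da, float & db) {
    const float s     = sigmoid(a);
    const float silu  = a * s;
    const float dsilu = s * (1.0f + a * (1.0f - s));
    da = dy * b * dsilu;
    db = dy * silu;
}
```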
b5819
Fix conditional enabling following arch checks for ggml-sycl (#14504)
Signed-off-by: nscipione <[email protected]>
b5817
kv-cache : use ggml_set_rows (#14285)
* kv-cache : use ggml_set_rows
* graph : separate k and v indices
* cont : remove redundant ifs
* kv-cache : improve find_slot impl
* kv-cache : bounds-check when accessing slot_info indices
* kv-cache : add comments
* ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends
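The idea behind a set-rows operation is to scatter rows of a source tensor into arbitrary row positions of a destination, which lets the KV cache write new K/V entries into whichever slots find_slot picked without requiring them to be contiguous. A conceptual sketch of those semantics (plain C++, not the actual ggml API or its signature):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Scatter: for each source row r, copy it into dst at row indices[r].
// dst holds n_dst_rows * n_cols floats, src holds indices.size() * n_cols.
static void set_rows(std::vector<float> & dst, size_t n_cols,
                     const std::vector<float> & src,
                     const std::vector<int64_t> & indices) {
    for (size_t r = 0; r < indices.size(); ++r) {
        const int64_t i = indices[r]; // destination slot chosen by the cache
        std::memcpy(&dst[i * n_cols], &src[r * n_cols], n_cols * sizeof(float));
    }
}
```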
b5816
ggml : fix FA mask dim 2 and 3 (#14505)
* ggml : fix FA mask dim 2 and 3
* backends : mark batched FA as unsupported in CUDA and Vulkan
* vulkan : disable FA for mask->ne[2] != 1
b5815
ggml : remove kompute backend (#14501)