Releases: ggml-org/llama.cpp

b5825

04 Jul 06:37
c79184d
batch : add n_used count (#14512)

ggml-ci

b5824

04 Jul 04:04
499a8f5
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)

Co-authored-by: luyuhong <[email protected]>

b5823

03 Jul 22:14
28657a8
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
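
For orientation, the two gated activations named here can be sketched in plain Python. This is a reference sketch only, not the ggml kernels: it assumes GEGLU_ERF uses the exact erf-based GELU, GEGLU_QUICK uses the common `x * sigmoid(1.702 * x)` approximation, and that a single input row holds the activated half followed by the gate half. The function names are hypothetical.

```python
import math


def gelu_erf(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_quick(x: float) -> float:
    # Sigmoid-based approximation: x * sigmoid(1.702 * x).
    return x / (1.0 + math.exp(-1.702 * x))


def geglu(row, act):
    # Assumed layout: first half is activated, second half is the gate.
    half = len(row) // 2
    return [act(row[i]) * row[half + i] for i in range(half)]


def geglu_erf(row):
    return geglu(row, gelu_erf)


def geglu_quick(row):
    return geglu(row, gelu_quick)
```

Calling `geglu_erf([0.0, 1.0, 2.0, 3.0])` activates `0.0` and `1.0` and multiplies them by the gates `2.0` and `3.0`. The sketch is not hardened against overflow for large negative inputs.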

b5822

03 Jul 19:24
bee2842
opencl : broadcast for soft_max (#14510)

b5821

03 Jul 18:57
2b72bed
vulkan: support mixed/deepseekR1 FA head sizes (#14509)

* vulkan: better parameterize FA by head sizes

* vulkan: support mixed/deepseekR1 FA head sizes

b5820

03 Jul 15:39
c8c4495
ggml: backward pass for split swiglu (#14483)
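
As a rough illustration of what a backward pass for split SwiGLU has to compute, the sketch below works on two separate Python lists. It assumes the forward op is `silu(x) * g` with `x` and `g` passed as separate tensors (the "split" form); the function names are hypothetical and this is not the ggml implementation.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    return x * sigmoid(x)


def swiglu_forward(x, g):
    # Split form: activation input and gate are separate sequences.
    return [silu(xi) * gi for xi, gi in zip(x, g)]


def swiglu_backward(x, g, grad_out):
    # Chain rule for out = silu(x) * g:
    #   d out / d x = g * silu'(x),  silu'(x) = s * (1 + x * (1 - s)), s = sigmoid(x)
    #   d out / d g = silu(x)
    grad_x, grad_g = [], []
    for xi, gi, go in zip(x, g, grad_out):
        s = sigmoid(xi)
        dsilu = s * (1.0 + xi * (1.0 - s))
        grad_x.append(go * gi * dsilu)
        grad_g.append(go * silu(xi))
    return grad_x, grad_g
```

A finite-difference comparison against `swiglu_forward` is a simple way to sanity-check the analytic gradients.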

b5819

03 Jul 11:18
7b63a71
Fix conditional enabling following arch checks for ggml-sycl (#14504)

Signed-off-by: nscipione <[email protected]>

b5817

03 Jul 11:17
a70c8a0
kv-cache : use ggml_set_rows (#14285)

* kv-cache : use ggml_set_rows

ggml-ci

* graph : separate k and v indices

ggml-ci

* cont : remove redundant ifs

ggml-ci

* kv-cache : improve find_slot impl

* kv-cache : bounds-check when accessing slot_info indices

* kv-cache : add comments

ggml-ci

* ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends

ggml-ci
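
The row-scatter idea behind this change can be sketched as follows. This is a minimal illustration, assuming `ggml_set_rows` copies each source row into the destination row named by a matching index; the Python function, its signature, and the list-of-lists cache are hypothetical stand-ins, not the ggml API.

```python
def set_rows(dst, src, row_ids):
    """Copy src[i] into dst[row_ids[i]] for every i (scatter by row)."""
    if len(src) != len(row_ids):
        raise ValueError("need exactly one destination index per source row")
    for row, idx in zip(src, row_ids):
        # Bounds-check each index before writing into the destination.
        if not 0 <= idx < len(dst):
            raise IndexError(f"row index {idx} out of range")
        if len(row) != len(dst[idx]):
            raise ValueError("row width mismatch")
        dst[idx] = list(row)
    return dst


# Hypothetical usage: a 4-slot cache receiving two new rows in slots 2 and 0.
cache = [[0.0, 0.0] for _ in range(4)]
set_rows(cache, [[1.0, 2.0], [3.0, 4.0]], [2, 0])
```

Because the destinations are given as explicit indices, the new rows do not have to land in one contiguous range of the cache.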

b5816

03 Jul 11:13
9067487
ggml : fix FA mask dim 2 and 3 (#14505)

* ggml : fix FA mask dim 2 and 3

ggml-ci

* backends : unsupport batched FA in CUDA and Vulkan

ggml-ci

* vulkan : disable FA for mask->ne[2] != 1

b5815

03 Jul 05:05
d4cdd9c
ggml : remove kompute backend (#14501)

ggml-ci