
Releases: ggml-org/llama.cpp

b5782

30 Jun 10:18
c839a2d
cmake : Remove redundant include path in CMakeLists.txt (#14452)

* Update docker.yml

Modify docker.yml so the workflow no longer runs on a schedule; if the workflow needs to run, it can be started manually.

* Remove redundant include path in CMakeLists.txt

The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths.

* Enable scheduled Docker image builds

Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

b5780

29 Jun 18:15
caf5681
server : support jinja extra template kwargs (Qwen3 enable_thinking f…

b5778

29 Jun 16:01
f47c1d7
SYCL: disable faulty fp16 exp kernel (#14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel
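
The gist of the change is an op-support gate: when the exponent op would run on fp16 data, the backend declines it so a working fallback path is used instead. A minimal sketch of that gating pattern, with illustrative names (`backend_supports_op`, `tensor_desc`) that are not the actual ggml-sycl code:

```cpp
// Illustrative stand-ins for ggml's op and type enums (not the real definitions).
enum class op_kind { exp, add, mul };
enum class dtype   { f32, f16 };

struct tensor_desc {
    op_kind op;
    dtype   type;
};

// Hypothetical support check in the spirit of a backend's supports_op hook:
// report fp16 exp as unsupported so the scheduler picks a working fallback.
bool backend_supports_op(const tensor_desc &t) {
    if (t.op == op_kind::exp && t.type == dtype::f16) {
        return false; // faulty fp16 exp kernel: refuse it, force the f32 path
    }
    return true;
}
```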

b5777

29 Jun 13:36
a5d1fb6
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)

b5775

29 Jun 08:36
bd9c981
vulkan: Add fusion support for RMS_NORM+MUL (#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow computing the whole graph and testing just one node's results. Add rms_norm_mul tests and enable a llama test. (A sketch of the fusion check appears after this entry.)

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups; style fixes

* last node doesn't need single use; fix type; handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <[email protected]>
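
The fusion check described above boils down to: the RMS_NORM result may be folded into the following MUL only when that MUL is its sole consumer, keeping in mind that the MUL's operands can appear in either order and that the last node of the pair may itself be consumed any number of times. A rough sketch of that check, using hypothetical `node` fields (`op`, `src`, `use_count`) rather than the real ggml-vulkan structures:

```cpp
#include <array>

// Illustrative node layout; the real ggml graph stores use counts per
// hash-table slot and finds nodes by hash lookup, as noted in the commits above.
enum class op_kind { none, rms_norm, mul };

struct node {
    op_kind              op = op_kind::none;
    std::array<node*, 2> src{};          // operands
    int                  use_count = 0;  // how many nodes consume this result
};

// Can `a` (RMS_NORM) be fused into `b` (MUL)?  The MUL may have its operands in
// either order, and the intermediate RMS_NORM result must have no other consumer.
// The MUL itself (the last node of the fused pair) may be used any number of times.
bool can_fuse_rms_norm_mul(const node *a, const node *b) {
    if (a->op != op_kind::rms_norm || b->op != op_kind::mul) {
        return false;
    }
    const bool feeds_mul = (b->src[0] == a) || (b->src[1] == a); // operands may be swapped
    return feeds_mul && a->use_count == 1; // single consumer: safe to skip materializing it
}
```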

b5774

28 Jun 18:00
27208bf
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools
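
A common way to make the batched path generic over f32, f16 and bf16 is a small type trait that maps each element type to the cudaDataType_t tag cuBLAS expects, so one template covers all cases. A minimal sketch of that idea (illustrative only, not the actual llama.cpp code):

```cpp
#include <cublas_v2.h>
#include <cuda_bf16.h>
#include <cuda_fp16.h>

// Map an element type to the cudaDataType_t tag cuBLAS expects.
template <typename T> struct cuda_type_traits;

template <> struct cuda_type_traits<float>         { static constexpr cudaDataType_t id = CUDA_R_32F;  };
template <> struct cuda_type_traits<half>          { static constexpr cudaDataType_t id = CUDA_R_16F;  };
template <> struct cuda_type_traits<__nv_bfloat16> { static constexpr cudaDataType_t id = CUDA_R_16BF; };

// One generic wrapper instead of per-type copies; accumulation is kept in fp32.
template <typename T>
cublasStatus_t gemm_batched_ex(cublasHandle_t handle, int m, int n, int k,
                               const float *alpha, const T * const *A, int lda,
                               const T * const *B, int ldb,
                               const float *beta, T * const *C, int ldc, int batch) {
    return cublasGemmBatchedEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                               alpha, (const void * const *) A, cuda_type_traits<T>::id, lda,
                                      (const void * const *) B, cuda_type_traits<T>::id, ldb,
                               beta,  (void * const *)       C, cuda_type_traits<T>::id, ldc,
                               batch, CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
}
```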

b5773

28 Jun 16:05
63a7bb3
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

b5772

28 Jun 15:40
00d5282
vulkan: lock accesses of pinned_memory vector (#14333)
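
The fix follows the usual pattern for a container shared across threads: every read and write of the vector is guarded by a mutex. A generic sketch of that pattern, with illustrative names (`pinned_region`, `track_pinned`, `is_pinned`) rather than the actual ggml-vulkan members:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// A pinned host allocation tracked as (base pointer, size), standing in for
// the backend's bookkeeping of pinned memory regions.
struct pinned_region {
    void  *ptr  = nullptr;
    size_t size = 0;
};

static std::vector<pinned_region> pinned_memory;
static std::mutex                 pinned_memory_mutex;

void track_pinned(void *ptr, size_t size) {
    std::lock_guard<std::mutex> lock(pinned_memory_mutex); // serialize writers
    pinned_memory.push_back({ptr, size});
}

bool is_pinned(const void *ptr) {
    std::lock_guard<std::mutex> lock(pinned_memory_mutex); // readers must lock too
    const char *p = static_cast<const char *>(ptr);
    for (const auto &r : pinned_memory) {
        const char *base = static_cast<const char *>(r.ptr);
        if (p >= base && p < base + r.size) {
            return true;
        }
    }
    return false;
}
```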

b5771

28 Jun 14:25
566c16f
model : add support for ERNIE 4.5 0.3B model (#14408)

Add Day-0 support for Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <[email protected]>

b5770

28 Jun 09:55
b25e927
fix async_mode bug (#14432)