
Releases: ggml-org/llama.cpp

b5782

30 Jun 10:18
c839a2d
cmake : Remove redundant include path in CMakeLists.txt (#14452)

* Update docker.yml

Modify docker.yml so the workflow no longer runs on a schedule; if the workflow needs to run, it can be started manually.

* Remove redundant include path in CMakeLists.txt

The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths.

* Enable scheduled Docker image builds

Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.

b5780

29 Jun 18:15
caf5681
server : support jinja extra template kwargs (Qwen3 enable_thinking f…

b5778

29 Jun 16:01
f47c1d7
SYCL: disable faulty fp16 exp kernel (#14395)

* SYCL: disable faulty fp16 CPU exponent for now

* Revert "SYCL: disable faulty fp16 CPU exponent for now"

This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202.

* SYCL: disable faulty fp16 CPU exponent for now

* Fix logic of disabling exponent kernel
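
The gist of the change is an op-support gate: when the exponent op would run on fp16 data, the backend declines it so a working fallback path is used instead. A minimal sketch of that gating pattern, with illustrative names (`backend_supports_op`, `tensor_desc`) that are not the actual ggml-sycl code:

```cpp
// Illustrative stand-ins for ggml's op and type enums (not the real definitions).
enum class op_kind { exp, add, mul };
enum class dtype   { f32, f16 };

struct tensor_desc {
    op_kind op;
    dtype   type;
};

// Hypothetical support check in the spirit of a backend's supports_op hook:
// report fp16 exp as unsupported so the scheduler picks a working fallback.
bool backend_supports_op(const tensor_desc &t) {
    if (t.op == op_kind::exp && t.type == dtype::f16) {
        return false; // faulty fp16 exp kernel: refuse it, force the f32 path
    }
    return true;
}
```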

b5777

29 Jun 13:36
a5d1fb6
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)

b5775

29 Jun 08:36
bd9c981
vulkan: Add fusion support for RMS_NORM+MUL (#14366)

* vulkan: Add fusion support for RMS_NORM+MUL

- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow computing the whole graph and testing just one node's results. Add rms_norm_mul tests and enable a llama test. (A sketch of the fusion check appears after this entry.)

* extract some common fusion logic

* fix -Winconsistent-missing-override

* move ggml_can_fuse to a common function

* build fix

* C and C++ versions of can_fuse

* move use count to the graph to avoid data races and double increments when used in multiple threads

* use hash table lookup to find node index

* change use_counts to be indexed by hash table slot

* minimize hash lookups; style fixes

* last node doesn't need single use; fix type; handle mul operands being swapped.

* remove redundant parameter

---------

Co-authored-by: slaren <[email protected]>
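
The fusion check described above boils down to: the RMS_NORM result may be folded into the following MUL only when that MUL is its sole consumer, keeping in mind that the MUL's operands can appear in either order and that the last node of the pair may itself be consumed any number of times. A rough sketch of that check, using hypothetical `node` fields (`op`, `src`, `use_count`) rather than the real ggml-vulkan structures:

```cpp
#include <array>

// Illustrative node layout; the real ggml graph stores use counts per
// hash-table slot and finds nodes by hash lookup, as noted in the commits above.
enum class op_kind { none, rms_norm, mul };

struct node {
    op_kind              op = op_kind::none;
    std::array<node*, 2> src{};          // operands
    int                  use_count = 0;  // how many nodes consume this result
};

// Can `a` (RMS_NORM) be fused into `b` (MUL)?  The MUL may have its operands in
// either order, and the intermediate RMS_NORM result must have no other consumer.
// The MUL itself (the last node of the fused pair) may be used any number of times.
bool can_fuse_rms_norm_mul(const node *a, const node *b) {
    if (a->op != op_kind::rms_norm || b->op != op_kind::mul) {
        return false;
    }
    const bool feeds_mul = (b->src[0] == a) || (b->src[1] == a); // operands may be swapped
    return feeds_mul && a->use_count == 1; // single consumer: safe to skip materializing it
}
```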

b5774

28 Jun 18:00
27208bf
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)

* CUDA: add bf16 and f32 support to cublas_mul_mat_batched

* Review: add type traits and make function more generic

* Review: make check more explicit, add back comments, and fix formatting

* Review: fix formatting, remove useless type conversion, fix naming for bools
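
A common way to make the batched path generic over f32, f16 and bf16 is a small type trait that maps each element type to the cudaDataType_t tag cuBLAS expects, so one template covers all cases. A minimal sketch of that idea (illustrative only, not the actual llama.cpp code):

```cpp
#include <cublas_v2.h>
#include <cuda_bf16.h>
#include <cuda_fp16.h>

// Map an element type to the cudaDataType_t tag cuBLAS expects.
template <typename T> struct cuda_type_traits;

template <> struct cuda_type_traits<float>         { static constexpr cudaDataType_t id = CUDA_R_32F;  };
template <> struct cuda_type_traits<half>          { static constexpr cudaDataType_t id = CUDA_R_16F;  };
template <> struct cuda_type_traits<__nv_bfloat16> { static constexpr cudaDataType_t id = CUDA_R_16BF; };

// One generic wrapper instead of per-type copies; accumulation is kept in fp32.
template <typename T>
cublasStatus_t gemm_batched_ex(cublasHandle_t handle, int m, int n, int k,
                               const float *alpha, const T * const *A, int lda,
                               const T * const *B, int ldb,
                               const float *beta, T * const *C, int ldc, int batch) {
    return cublasGemmBatchedEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                               alpha, (const void * const *) A, cuda_type_traits<T>::id, lda,
                                      (const void * const *) B, cuda_type_traits<T>::id, ldb,
                               beta,  (void * const *)       C, cuda_type_traits<T>::id, ldc,
                               batch, CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);
}
```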

b5773

28 Jun 16:05
63a7bb3
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…

b5772

28 Jun 15:40
00d5282
vulkan: lock accesses of pinned_memory vector (#14333)
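
The fix follows the usual pattern for a container shared across threads: every read and write of the vector is guarded by a mutex. A generic sketch of that pattern, with illustrative names (`pinned_region`, `track_pinned`, `is_pinned`) rather than the actual ggml-vulkan members:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// A pinned host allocation tracked as (base pointer, size), standing in for
// the backend's bookkeeping of pinned memory regions.
struct pinned_region {
    void  *ptr  = nullptr;
    size_t size = 0;
};

static std::vector<pinned_region> pinned_memory;
static std::mutex                 pinned_memory_mutex;

void track_pinned(void *ptr, size_t size) {
    std::lock_guard<std::mutex> lock(pinned_memory_mutex); // serialize writers
    pinned_memory.push_back({ptr, size});
}

bool is_pinned(const void *ptr) {
    std::lock_guard<std::mutex> lock(pinned_memory_mutex); // readers must lock too
    const char *p = static_cast<const char *>(ptr);
    for (const auto &r : pinned_memory) {
        const char *base = static_cast<const char *>(r.ptr);
        if (p >= base && p < base + r.size) {
            return true;
        }
    }
    return false;
}
```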

b5771

28 Jun 14:25
566c16f
model : add support for ERNIE 4.5 0.3B model (#14408)

Add Day-0 support for Baidu ERNIE 4.5 0.3B model.

Signed-off-by: Weizhao Ouyang <[email protected]>

b5770

28 Jun 09:55
b25e927
fix async_mode bug (#14432)