Releases: ggml-org/llama.cpp
Releases · ggml-org/llama.cpp
b5782
cmake : Remove redundant include path in CMakeLists.txt (#14452) * Update docker.yml 修改docker.yml文件中的内容使其停止周期性的运行该workflow,如果想要运行该workflow可以手动启动 * Remove redundant include path in CMakeLists.txt The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths. * Enable scheduled Docker image builds Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.
b5780
server : support jinja extra template kwargs (Qwen3 enable_thinking f…
b5778
SYCL: disable faulty fp16 exp kernel (#14395) * SYCL: disable faulty fp16 CPU exponent for now * Revert "SYCL: disable faulty fp16 CPU exponent for now" This reverts commit ed0aab1ec31b4eb4b0f275dd7acd41d96a375202. * SYCL: disable faulty fp16 CPU exponent for now * Fix logic of disabling exponent kernel
b5777
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)
b5775
vulkan: Add fusion support for RMS_NORM+MUL (#14366) * vulkan: Add fusion support for RMS_NORM+MUL - Add a use_count to ggml_tensor, so we can detect if an output is used more than once. - Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor. - Add detection logic and basic fusion logic in ggml-vulkan. - Add some testing support for fusion. Rather than computing one node at a time, allow for computing the whole graph and just testing one node's results. Add rms_norm_mul tests and enable a llama test. * extract some common fusion logic * fix -Winconsistent-missing-override * move ggml_can_fuse to a common function * build fix * C and C++ versions of can_fuse * move use count to the graph to avoid data races and double increments when used in multiple threads * use hash table lookup to find node index * change use_counts to be indexed by hash table slot * minimize hash lookups style fixes * last node doesn't need single use. fix type. handle mul operands being swapped. * remove redundant parameter --------- Co-authored-by: slaren <[email protected]>
b5774
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361) * CUDA: add bf16 and f32 support to cublas_mul_mat_batched * Review: add type traits and make function more generic * Review: make check more explicit, add back comments, and fix formatting * Review: fix formatting, remove useless type conversion, fix naming for bools
b5773
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…
b5772
vulkan: lock accesses of pinned_memory vector (#14333)
b5771
model : add support for ERNIE 4.5 0.3B model (#14408) Add Day-0 support for Baidu ERNIE 4.5 0.3B model. Signed-off-by: Weizhao Ouyang <[email protected]>
b5770
fix async_mode bug (#14432)