Sync master with upstream release b6062 by jan-service-account · Pull Request #191 · janhq/llama.cpp

jan-service-account · 2025-08-02T09:09:10Z

Updates dev branch with latest release (b6062) from ggml-org/llama.cpp

* support hunyuan_v1_dense
* update hunyuan_moe to hunyuan_v1_moe
* fix rope alpha assert and bos token
* add blank line
* Revert "update hunyuan_moe to hunyuan_v1_moe"
  This reverts commit aa973ca.
* use hunyuan_dense instead of hunyuan_v1_dense
* fix hunyuan_moe chat template
* remove leftover code
* update hunyuan dense chat template
* fix hunyuan dense vocab and chat template

Signed-off-by: stevenkuang <[email protected]>

* vendor : update vendored copy of google/minja
* Re-remove trailing whitespace
* Remove another trailing whitespace

Signed-off-by: Lennart Austenfeld <[email protected]>

* vulkan: optimizations for direct convolution
  - Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill the GPU. The new size should be amenable to using coopmat, too.
  - Fix shmem bank conflicts. 16B padding should work with coopmat.
  - Some explicit loop unrolling.
  - Skip math/stores work for parts of the tile that are OOB.
  - Apply fastdiv opt.
  - Disable shuffles for NV.
* Three tiles sizes for CONV_2D, and a heuristic to choose
* reallow collectives for pre-Turing
* make SHMEM_PAD a spec constant
* fixes for intel perf - no shmem padding, placeholder shader core count
* shader variants with/without unrolling
* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <[email protected]>
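The "fastdiv opt" item above is not spelled out in the commit message. As a rough illustration only, the sketch below shows one common form of fast division by a runtime-invariant divisor: precompute a multiplier and shift once per divisor, then replace each division with a multiply-high, an add, and a shift. The function names `init_fastdiv` and `fastdiv` are hypothetical, the code models 32-bit unsigned arithmetic with Python integers, and it is not taken from the Vulkan shader.

```python
# Sketch of division by an invariant divisor via multiply-high and shift.
# Assumption: 32-bit unsigned operands; names are illustrative, not from the PR.

def init_fastdiv(d: int) -> tuple[int, int]:
    """Precompute (multiplier, shift) for a divisor 1 <= d < 2**32."""
    if not 1 <= d < 2**32:
        raise ValueError("divisor must fit in an unsigned 32-bit integer and be nonzero")
    # shift = ceil(log2(d)); (d - 1).bit_length() gives exactly that for d >= 1.
    shift = (d - 1).bit_length()
    # multiplier = floor(2^32 * (2^shift - d) / d) + 1
    multiplier = ((1 << 32) * ((1 << shift) - d)) // d + 1
    return multiplier, shift


def fastdiv(n: int, multiplier: int, shift: int) -> int:
    """Compute n // d for 0 <= n < 2**32 using the precomputed pair."""
    hi = (n * multiplier) >> 32  # upper 32 bits of the 64-bit product
    return (hi + n) >> shift


if __name__ == "__main__":
    m, s = init_fastdiv(3)
    print(fastdiv(10, m, s), 10 // 3)
```

The precomputation runs once per divisor (for example, once per kernel launch), so the cost of the real division is paid only there. On real hardware the intermediate sum `hi + n` can exceed 32 bits, so a native implementation has to handle that carry; Python's unbounded integers hide the issue here.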

lhez and others added 6 commits August 1, 2025 13:15

opencl: add f16 for add, sub, mul, div (ggml-org#14984)

1c872f7

CUDA: fix MMQ nwarps for AMD with warp_size==32 (ggml-org#15014)

9c35706

server: enable token array inputs for OAI API (ggml-org#15001)

f906275
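The server commit above concerns token array inputs for the OpenAI-compatible API. In the OpenAI completions API, `prompt` may be an array of token IDs instead of a string. The sketch below only builds such a request body and sends nothing. The helper name `build_completion_request`, the `max_tokens` default, and the token IDs are illustrative assumptions; this page does not say which endpoints or input shapes the commit covers.

```python
import json


def build_completion_request(token_ids, max_tokens=16):
    """Return a JSON request body whose prompt is a list of token IDs.

    Hypothetical helper for illustration; it does not contact any server.
    """
    ids = list(token_ids)
    # Reject empty prompts and anything that is not a non-negative integer.
    if not ids or not all(
        isinstance(t, int) and not isinstance(t, bool) and t >= 0 for t in ids
    ):
        raise ValueError("token_ids must be a non-empty sequence of non-negative ints")
    return json.dumps({"prompt": ids, "max_tokens": max_tokens})


if __name__ == "__main__":
    # Token IDs below are made up; real IDs depend on the model's tokenizer.
    print(build_completion_request([1, 15043, 3186]))
```

Sending pre-tokenized input can be useful when the client already holds the token IDs and wants to avoid a detokenize and retokenize round trip, which does not always reproduce the same tokens.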

jan-service-account merged commit 1749cf1 into dev Aug 2, 2025
12 checks passed

jan-service-account deleted the update-dev-from-master-2025-08-02-09-09 branch August 2, 2025 09:22

jan-service-account merged 6 commits into dev from update-dev-from-master-2025-08-02-09-09


6 participants
