Releases: chraac/llama.cpp

b6692

05 Oct 15:49
ca71fb9

model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)

* feat: Add granite-docling conversion using trillion pretokenizer

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add granite-docling vocab pre enum

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use granite-docling pre

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add clip_is_idefics3

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Allow multi-token boundary sequences for image templating

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add tiling support for idefics3 in clip.cpp

This should likely be moved into llava_uhd::get_slice_instructions, but for
now this avoids disrupting the logic there.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
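
A rough sketch of the tile-grid computation this kind of Idefics3 tiling involves: round the image up to a multiple of the tile side, slice it into a grid, and keep a downscaled global view. The struct and function names below are illustrative only, not the actual clip.cpp / llava_uhd API.

```cpp
// Illustrative sketch of Idefics3-style tiling; names are hypothetical.
struct image_size { int width; int height; };

struct slice_plan {
    image_size refined; // size the source image is resized to before slicing
    image_size grid;    // number of tiles along x / y
    image_size tile;    // size of each tile
    image_size global;  // size of the downscaled "global" image
};

static slice_plan make_slice_plan(image_size src, int tile_side /* e.g. 512 */) {
    slice_plan plan;
    // round each dimension up to a whole number of tiles
    plan.grid.width  = (src.width  + tile_side - 1) / tile_side;
    plan.grid.height = (src.height + tile_side - 1) / tile_side;
    plan.refined     = { plan.grid.width * tile_side, plan.grid.height * tile_side };
    plan.tile        = { tile_side, tile_side };
    // the global view is the whole image scaled down to a single tile
    plan.global      = { tile_side, tile_side };
    return plan;
}
```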

* feat: Partial support for full templating for idefics3 in mtmd

There are still errors encoding some of the image chunks, but the token sequence now matches transformers _almost_ perfectly, except for the double newline before the global image, which shows up as two consecutive newline tokens instead of a single double-newline token. I think this is happening because the blocks are tokenized separately and then concatenated (see the sketch after this commit).

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
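
To make the double-newline mismatch concrete, here is a toy illustration of why tokenizing chunks separately and concatenating the results can differ from tokenizing the whole string. The stub tokenizer and token ids below are made up for the example, not the llama.cpp vocab API.

```cpp
// Toy tokenizer: a "\n\n" merge only fires when both newlines are seen
// in the same call, so per-chunk tokenization loses the merged token.
#include <cstdio>
#include <string>
#include <vector>

static std::vector<int> toy_tokenize(const std::string & text) {
    std::vector<int> ids;
    for (size_t i = 0; i < text.size(); ) {
        if (text.compare(i, 2, "\n\n") == 0) { ids.push_back(271); i += 2; } // double-newline token
        else if (text[i] == '\n')            { ids.push_back(198); i += 1; } // single-newline token
        else                                 { ids.push_back((int) text[i]); i += 1; }
    }
    return ids;
}

int main() {
    // whole string: the two newlines merge into one token
    auto whole = toy_tokenize("A\n\nB");
    // chunks tokenized separately, then concatenated: two newline tokens
    auto part  = toy_tokenize("A\n");
    auto tail  = toy_tokenize("\nB");
    part.insert(part.end(), tail.begin(), tail.end());
    printf("whole: %zu tokens, concatenated: %zu tokens\n", whole.size(), part.size());
    return 0;
}
```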

* feat: Fully working image preprocessing for idefics3 w/ resize and slicing

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use the longest side instead of size * scale_factor

For Granite Docling, these come out to the same value, but that was just a
coincidence.

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
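
A minimal sketch of resizing against the preprocessor config's longest edge rather than size * scale_factor, assuming the usual aspect-ratio-preserving behavior; the names here are illustrative, not the actual mmproj hparams fields.

```cpp
// Fit the image so its longest side matches longest_edge, keeping aspect ratio.
struct image_size { int width; int height; };

static image_size fit_longest_side(image_size src, int longest_edge) {
    const int longest = src.width > src.height ? src.width : src.height;
    if (longest <= longest_edge) {
        return src; // already small enough, no resize needed
    }
    const float scale = (float) longest_edge / (float) longest;
    return { (int) (src.width * scale + 0.5f), (int) (src.height * scale + 0.5f) };
}
```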

* fix: Allow batch encoding and remove clip_is_idefics3

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Remove unnecessary conditionals for empty token vectors

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Use image_manipulation util

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* add test model

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>

b6266

25 Aug 05:03

Merge branch 'master' of https://github.com/ggml-org/llama.cpp

b6119

08 Aug 17:06
cd6983d

ggml : fix field name when new ggml_backend (#14944)

b5952

21 Jul 16:15
9220426

kleidiai: add support for get_rows (#14676)

* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review

b5869

11 Jul 09:56
576c82e

vocab : add midm-2.0 model pre-tokenizer (#14626)

b5780

30 Jun 06:55
caf5681

server : support jinja extra template kwargs (Qwen3 enable_thinking f…

b5608

09 Jun 07:36
91a8ee6

add geglu activation function (#14074)

Co-authored-by: dinhhuy <[email protected]>
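
For reference, GeGLU splits the input features into two halves, applies GELU to one half and uses it to gate the other. A minimal standalone sketch of that formula (not the ggml operator itself):

```cpp
// GeGLU(x) = GELU(a) * b, where x = [a | b] split along the feature dim.
#include <cmath>
#include <vector>

static float gelu(float x) {
    // tanh approximation of GELU; 0.7978845608f ~= sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

static std::vector<float> geglu(const std::vector<float> & x) {
    const size_t n = x.size() / 2;       // first half gates, second half is gated
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(x[i]) * x[n + i];  // GELU(a) * b
    }
    return out;
}
```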

b5501

27 May 03:12
cdf94a1

server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
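
A hedged sketch of how the offline toggle can be checked; the --offline flag and LLAMA_OFFLINE variable come from the commit above, but the surrounding logic is illustrative, not the server's actual code path.

```cpp
// Treat the server as offline if the CLI flag is set or LLAMA_OFFLINE is non-zero.
#include <cstdlib>
#include <string>

static bool is_offline(bool cli_flag) {
    const char * env = std::getenv("LLAMA_OFFLINE");
    return cli_flag || (env != nullptr && std::string(env) != "0");
}

// if (is_offline(params.offline)) { use the local cache only, never hit the network }
```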

b2961

22 May 04:09
201cc11

llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings on convert-hf-to-gguf.py

* set to the short freq factor when the context size is smaller than the trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>
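
The freq-factor selection described in the bullets above boils down to choosing between the short and long factor sets based only on the configured context size versus the trained context size. A minimal sketch with illustrative names, not the actual llama.cpp internals:

```cpp
// Pick the "long" rope frequency factors only when the configured context
// exceeds the model's original training context.
#include <vector>

struct rope_freq_factors {
    std::vector<float> short_factors; // used up to the trained context size
    std::vector<float> long_factors;  // used beyond it
    unsigned trained_ctx;             // e.g. 4096 for the phi3 base model
};

static const std::vector<float> & select_freq_factors(
        const rope_freq_factors & f, unsigned n_ctx) {
    // depends only on the configured context size, not the current position
    return n_ctx > f.trained_ctx ? f.long_factors : f.short_factors;
}
```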

b2956

21 May 15:22
11474e7

examples: cache hf model when --model not provided (#7353)

* examples: cache hf model when --model not provided
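
A sketch of the fallback this adds, assuming a simple cache-directory layout; the exact path and naming scheme here are guesses for illustration, not the examples' real behavior.

```cpp
// Resolve the model path: prefer an explicit --model, otherwise derive a
// cached filename from the HF repo/file pair.
#include <cstdlib>
#include <string>

static std::string resolve_model_path(const std::string & model_arg,
                                      const std::string & hf_repo,
                                      const std::string & hf_file) {
    if (!model_arg.empty()) {
        return model_arg;
    }
    const char * home = std::getenv("HOME");
    std::string cache = std::string(home ? home : ".") + "/.cache/llama.cpp/";
    // flatten "org/repo" + file into a single cached filename
    std::string name = hf_repo + "_" + hf_file;
    for (char & c : name) { if (c == '/') c = '_'; }
    return cache + name;
}
```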