Releases: chraac/llama.cpp

b6692

05 Oct 15:49
ca71fb9

model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)

* feat: Add granite-docling conversion using trillion pretokenizer

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add granite-docling vocab pre enum

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use granite-docling pre

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add clip_is_idefics3

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Allow multi-token boundary sequences for image templating

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Add tiling support for idefics3 in clip.cpp

This should likely be moved into llava_uhd::get_slice_instructions, but for
now this avoids disrupting the logic there.

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
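
A rough sketch of the tile-grid computation this kind of Idefics3 tiling involves: round the image up to a multiple of the tile side, slice it into a grid, and keep a downscaled global view. The struct and function names below are illustrative only, not the actual clip.cpp / llava_uhd API.

```cpp
// Illustrative sketch of Idefics3-style tiling; names are hypothetical.
struct image_size { int width; int height; };

struct slice_plan {
    image_size refined; // size the source image is resized to before slicing
    image_size grid;    // number of tiles along x / y
    image_size tile;    // size of each tile
    image_size global;  // size of the downscaled "global" image
};

static slice_plan make_slice_plan(image_size src, int tile_side /* e.g. 512 */) {
    slice_plan plan;
    // round each dimension up to a whole number of tiles
    plan.grid.width  = (src.width  + tile_side - 1) / tile_side;
    plan.grid.height = (src.height + tile_side - 1) / tile_side;
    plan.refined     = { plan.grid.width * tile_side, plan.grid.height * tile_side };
    plan.tile        = { tile_side, tile_side };
    // the global view is the whole image scaled down to a single tile
    plan.global      = { tile_side, tile_side };
    return plan;
}
```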

* feat: Partial support for full templating for idefics3 in mtmd

There are still errors encoding some of the image chunks, but the token sequence now matches transformers _almost_ perfectly, except for the double newline before the global image, which shows up as two consecutive newline tokens instead of a single double-newline token. I think this is happening because the blocks are tokenized separately and then concatenated (see the sketch after this commit).

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
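
To make the double-newline mismatch concrete, here is a toy illustration of why tokenizing chunks separately and concatenating the results can differ from tokenizing the whole string. The stub tokenizer and token ids below are made up for the example, not the llama.cpp vocab API.

```cpp
// Toy tokenizer: a "\n\n" merge only fires when both newlines are seen
// in the same call, so per-chunk tokenization loses the merged token.
#include <cstdio>
#include <string>
#include <vector>

static std::vector<int> toy_tokenize(const std::string & text) {
    std::vector<int> ids;
    for (size_t i = 0; i < text.size(); ) {
        if (text.compare(i, 2, "\n\n") == 0) { ids.push_back(271); i += 2; } // double-newline token
        else if (text[i] == '\n')            { ids.push_back(198); i += 1; } // single-newline token
        else                                 { ids.push_back((int) text[i]); i += 1; }
    }
    return ids;
}

int main() {
    // whole string: the two newlines merge into one token
    auto whole = toy_tokenize("A\n\nB");
    // chunks tokenized separately, then concatenated: two newline tokens
    auto part  = toy_tokenize("A\n");
    auto tail  = toy_tokenize("\nB");
    part.insert(part.end(), tail.begin(), tail.end());
    printf("whole: %zu tokens, concatenated: %zu tokens\n", whole.size(), part.size());
    return 0;
}
```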

* feat: Fully working image preprocessing for idefics3 w/ resize and slicing

Branch: gabe-l-hart/GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* fix: Use the longest side instead of size * scale_factor

For Granite Docling, these come out to the same value, but that was just a
coincidence.

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>
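
A minimal sketch of resizing against the preprocessor config's longest edge rather than size * scale_factor, assuming the usual aspect-ratio-preserving behavior; the names here are illustrative, not the actual mmproj hparams fields.

```cpp
// Fit the image so its longest side matches longest_edge, keeping aspect ratio.
struct image_size { int width; int height; };

static image_size fit_longest_side(image_size src, int longest_edge) {
    const int longest = src.width > src.height ? src.width : src.height;
    if (longest <= longest_edge) {
        return src; // already small enough, no resize needed
    }
    const float scale = (float) longest_edge / (float) longest;
    return { (int) (src.width * scale + 0.5f), (int) (src.height * scale + 0.5f) };
}
```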

* fix: Allow batch encoding and remove clip_is_idefics3

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Remove unnecessary conditionals for empty token vectors

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* refactor: Use image_manipulation util

Branch: GraniteDocling

Signed-off-by: Gabe Goodhart <[email protected]>

* add test model

---------

Signed-off-by: Gabe Goodhart <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>

b6266

25 Aug 05:03

Merge branch 'master' of https://github.com/ggml-org/llama.cpp

b6119

08 Aug 17:06
cd6983d

ggml : fix field name when new ggml_backend (#14944)

b5952

21 Jul 16:15
9220426

kleidiai: add support for get_rows (#14676)

* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review

b5869

11 Jul 09:56
576c82e

vocab : add midm-2.0 model pre-tokenizer (#14626)

b5780

30 Jun 06:55
caf5681

server : support jinja extra template kwargs (Qwen3 enable_thinking f…

b5608

09 Jun 07:36
91a8ee6

add geglu activation function (#14074)

Co-authored-by: dinhhuy <[email protected]>
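
For reference, GeGLU splits the input features into two halves, applies GELU to one half and uses it to gate the other. A minimal standalone sketch of that formula (not the ggml operator itself):

```cpp
// GeGLU(x) = GELU(a) * b, where x = [a | b] split along the feature dim.
#include <cmath>
#include <vector>

static float gelu(float x) {
    // tanh approximation of GELU; 0.7978845608f ~= sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

static std::vector<float> geglu(const std::vector<float> & x) {
    const size_t n = x.size() / 2;       // first half gates, second half is gated
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(x[i]) * x[n + i];  // GELU(a) * b
    }
    return out;
}
```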

b5501

27 May 03:12
cdf94a1

server: --offline mode (#13804)

* server: --offline mode (env: LLAMA_OFFLINE)

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
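
A hedged sketch of how the offline toggle can be checked; the --offline flag and LLAMA_OFFLINE variable come from the commit above, but the surrounding logic is illustrative, not the server's actual code path.

```cpp
// Treat the server as offline if the CLI flag is set or LLAMA_OFFLINE is non-zero.
#include <cstdlib>
#include <string>

static bool is_offline(bool cli_flag) {
    const char * env = std::getenv("LLAMA_OFFLINE");
    return cli_flag || (env != nullptr && std::string(env) != "0");
}

// if (is_offline(params.offline)) { use the local cache only, never hit the network }
```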

b2961

22 May 04:09
201cc11

llama : add phi3 128K model support (#7225)

* add phi3 128k support in convert-hf-to-gguf

* add phi3 128k support in cuda

* address build warnings on llama.cpp

* adjust index value in cuda long rope freq factors

* add long rope support in ggml cpu backend

* make freq factors only depend on ctx size

* remove unused rope scaling type 'su' from gguf converter

* fix lint warnings on convert-hf-to-gguf.py

* set to the short freq factor when the context size is smaller than the trained context size

* add one line of comments

* metal : support rope freq_factors

* ggml : update ggml_rope_ext API to support freq. factors

* backends : add dev messages to support rope freq. factors

* minor : style

* tests : update to use new rope API

* backends : fix pragma semicolons

* minor : cleanup

* llama : move rope factors from KV header to tensors

* llama : remove tmp assert

* cuda : fix compile warning

* convert : read/write n_head_kv

* llama : fix uninitialized tensors

---------

Co-authored-by: Georgi Gerganov <[email protected]>
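
The freq-factor selection described in the bullets above boils down to choosing between the short and long factor sets based only on the configured context size versus the trained context size. A minimal sketch with illustrative names, not the actual llama.cpp internals:

```cpp
// Pick the "long" rope frequency factors only when the configured context
// exceeds the model's original training context.
#include <vector>

struct rope_freq_factors {
    std::vector<float> short_factors; // used up to the trained context size
    std::vector<float> long_factors;  // used beyond it
    unsigned trained_ctx;             // e.g. 4096 for the phi3 base model
};

static const std::vector<float> & select_freq_factors(
        const rope_freq_factors & f, unsigned n_ctx) {
    // depends only on the configured context size, not the current position
    return n_ctx > f.trained_ctx ? f.long_factors : f.short_factors;
}
```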

b2956

21 May 15:22
11474e7

examples: cache hf model when --model not provided (#7353)

* examples: cache hf model when --model not provided
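
A sketch of the fallback this adds, assuming a simple cache-directory layout; the exact path and naming scheme here are guesses for illustration, not the examples' real behavior.

```cpp
// Resolve the model path: prefer an explicit --model, otherwise derive a
// cached filename from the HF repo/file pair.
#include <cstdlib>
#include <string>

static std::string resolve_model_path(const std::string & model_arg,
                                      const std::string & hf_repo,
                                      const std::string & hf_file) {
    if (!model_arg.empty()) {
        return model_arg;
    }
    const char * home = std::getenv("HOME");
    std::string cache = std::string(home ? home : ".") + "/.cache/llama.cpp/";
    // flatten "org/repo" + file into a single cached filename
    std::string name = hf_repo + "_" + hf_file;
    for (char & c : name) { if (c == '/') c = '_'; }
    return cache + name;
}
```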