v3.8.0
3.8.0 (2025-05-17)
Features
- save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state)
- stream function call parameters (#460) (f2cb873) (documentation: API: LLamaChatPromptOptions["onFunctionCallParamsChunk"])
- configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: ResolveModelFileOptions["endpoints"])
- Qwen 3 support (#460) (f2cb873)
- QwenChatWrapper: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: QwenChatWrapper constructor > thoughts option)
- getLlama: dryRun option (#460) (f2cb873) (documentation: API: LlamaOptions["dryRun"])
- getLlamaGpuTypes function (#460) (f2cb873) (documentation: API: getLlamaGpuTypes)
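Streamed function call parameters arrive as text fragments, so a consumer usually has to concatenate them per call before they form valid JSON. The following is a minimal TypeScript sketch of such a consumer. It assumes each chunk passed to onFunctionCallParamsChunk carries callIndex, functionName, paramsChunk and done fields; that shape, the FunctionCallParamsCollector class and the getWeather function are illustrative assumptions, so check the LLamaChatPromptOptions["onFunctionCallParamsChunk"] documentation for the actual type. It runs on a simulated stream rather than a real model.

```typescript
// Assumed chunk shape (hypothetical; verify against the type used by
// LLamaChatPromptOptions["onFunctionCallParamsChunk"] in node-llama-cpp).
type ParamsChunk = {
    callIndex: number;
    functionName: string;
    paramsChunk: string;
    done: boolean;
};

// Hypothetical helper: collects the raw JSON text of each function call's
// parameters, keyed by call index, as the fragments stream in.
class FunctionCallParamsCollector {
    private readonly calls = new Map<number, {functionName: string; text: string; done: boolean}>();

    // Intended to be used as the onFunctionCallParamsChunk callback.
    readonly onChunk = (chunk: ParamsChunk): void => {
        const entry = this.calls.get(chunk.callIndex) ??
            {functionName: chunk.functionName, text: "", done: false};
        entry.text += chunk.paramsChunk;
        entry.done = entry.done || chunk.done;
        this.calls.set(chunk.callIndex, entry);
    };

    // Text received so far for a call; may be incomplete JSON until done.
    partialText(callIndex: number): string {
        return this.calls.get(callIndex)?.text ?? "";
    }

    // Parsed parameters once the call is marked done, otherwise undefined.
    completedParams(callIndex: number): unknown {
        const entry = this.calls.get(callIndex);
        if (entry == null || !entry.done) return undefined;
        return JSON.parse(entry.text);
    }
}

// Simulated stream for a single call to a hypothetical "getWeather" function.
const collector = new FunctionCallParamsCollector();
const simulated: ParamsChunk[] = [
    {callIndex: 0, functionName: "getWeather", paramsChunk: '{"city":', done: false},
    {callIndex: 0, functionName: "getWeather", paramsChunk: '"Paris"}', done: true},
];
for (const chunk of simulated) collector.onChunk(chunk);

console.log(collector.partialText(0)); // {"city":"Paris"}
console.log(collector.completedParams(0)); // { city: 'Paris' }
```

With a real chat session, collector.onChunk would presumably be passed as the onFunctionCallParamsChunk option when prompting; displaying partialText() before a call is done means dealing with incomplete JSON.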
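A custom Hugging Face endpoint is typically used to fetch model files through a mirror or proxy instead of huggingface.co. As a rough illustration of that idea only, the TypeScript sketch below builds a file download URL from a configurable endpoint base using Hugging Face's public {endpoint}/{repo}/resolve/{revision}/{path} layout. The buildHuggingFaceFileUrl function, the repository name and the mirror host are hypothetical, and this is not how node-llama-cpp resolves URIs internally; in node-llama-cpp the endpoint is configured through ResolveModelFileOptions["endpoints"].

```typescript
// Hypothetical helper (not part of node-llama-cpp): builds a Hugging Face
// "resolve" download URL, letting the caller swap the endpoint for a mirror.
function buildHuggingFaceFileUrl(
    repo: string,                             // e.g. "user/model-repo"
    filePath: string,                         // path of the file inside the repo
    endpoint: string = "https://huggingface.co",
    revision: string = "main"
): string {
    const base = endpoint.replace(/\/+$/, ""); // drop trailing slashes
    // Encode each path segment separately so "/" separators are preserved.
    const encodedPath = filePath.split("/").map(encodeURIComponent).join("/");
    return `${base}/${repo}/resolve/${encodeURIComponent(revision)}/${encodedPath}`;
}

console.log(buildHuggingFaceFileUrl("user/model-repo", "model.Q4_K_M.gguf"));
// https://huggingface.co/user/model-repo/resolve/main/model.Q4_K_M.gguf
console.log(buildHuggingFaceFileUrl("user/model-repo", "model.Q4_K_M.gguf", "https://hf-mirror.example.com/"));
// https://hf-mirror.example.com/user/model-repo/resolve/main/model.Q4_K_M.gguf
```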
Bug Fixes
- adapt to breaking llama.cpp changes (#460) (f2cb873)
- capture multi-token segment separators (#460) (f2cb873)
- race condition when reading extremely long gguf metadata (#460) (f2cb873)
- adapt memory estimation to newly added model architectures (#460) (f2cb873)
- skip binary testing on certain problematic conditions (#460) (f2cb873)
- improve GPU backend loading error description (#460) (f2cb873)
Shipped with llama.cpp release b5414
To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest.