- Llama.cpp: Add `model_path` kwarg to allow loading local GGUF models (thanks @lawrenceakka!)
  Note: This is technically a minor breaking change, because the order of positional arguments has changed. I recommend using keyword arguments when loading models.
- Hugging Face: Do not set the `max_length` generation parameter if `max_new_tokens` is set, to avoid a verbose warning
- OpenAI: Add default context lengths for o-series models and GPT-4.1; add a warning for models without default context lengths
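
The Hugging Face change above can be pictured with a small sketch. This is not the library's actual code: `build_generation_kwargs` is a hypothetical helper, written only to illustrate the rule that `max_length` is left out whenever `max_new_tokens` is given.

```python
from typing import Any, Dict, Optional


def build_generation_kwargs(
    max_length: Optional[int] = None,
    max_new_tokens: Optional[int] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble generation parameters.

    If max_new_tokens is given, max_length is dropped so that only one
    of the two length limits is passed on to generation.
    """
    kwargs: Dict[str, Any] = dict(extra)
    if max_new_tokens is not None:
        # max_new_tokens takes priority; max_length is intentionally omitted.
        kwargs["max_new_tokens"] = max_new_tokens
    elif max_length is not None:
        kwargs["max_length"] = max_length
    return kwargs


both = build_generation_kwargs(max_length=512, max_new_tokens=64)
only_length = build_generation_kwargs(max_length=512, temperature=0.7)
```

With both limits supplied, `both` contains only `max_new_tokens`; with `max_length` alone, it is passed through unchanged.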