v0.16.0
What's new in 0.16.0 (2024-10-18)
These are the changes in Xinference v0.16.0.
New features
- FEAT: Add support for AWQ/GPTQ vLLM inference to vision models such as Qwen2-VL by @cyhasuka in #2445
- FEAT: Dynamic batching for the state-of-the-art FLUX.1 text_to_image interface by @ChengjieLi28 in #2380
- FEAT: Add MLX support for qwen2.5-instruct by @qinxuye in #2444
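Dynamic batching means the server transparently groups concurrent text_to_image requests into a single model call instead of serving them one by one. The sketch below illustrates the general technique only; the DynamicBatcher class and its parameters are hypothetical names for illustration, not Xinference's actual implementation or API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class DynamicBatcher:
    """Collect requests for up to `max_wait` seconds (or until `max_batch`
    arrive), then run them through `process_batch` as one batch.
    Illustrative sketch only -- not Xinference's implementation."""

    def __init__(self, process_batch, max_batch=4, max_wait=0.01):
        self.process_batch = process_batch
        self.max_batch = max_batch
        self.max_wait = max_wait
        self._lock = threading.Lock()
        self._pending = []  # list of (request, done_event, result_box)

    def submit(self, request):
        """Block until this request's batch has been processed."""
        done = threading.Event()
        box = {}
        with self._lock:
            self._pending.append((request, done, box))
            if len(self._pending) == 1:
                # First request opens a new batching window.
                threading.Thread(target=self._flush_later, daemon=True).start()
        done.wait()
        return box["result"]

    def _flush_later(self):
        deadline = time.monotonic() + self.max_wait
        while time.monotonic() < deadline:
            with self._lock:
                if len(self._pending) >= self.max_batch:
                    break  # batch is full, flush early
            time.sleep(0.001)
        with self._lock:
            batch, self._pending = self._pending, []
        # One batched call serves every waiting request.
        outputs = self.process_batch([req for req, _, _ in batch])
        for (_, done, box), out in zip(batch, outputs):
            box["result"] = out
            done.set()


# Demo: three concurrent requests are served by batched calls,
# with a stand-in "model" that uppercases each prompt.
batcher = DynamicBatcher(lambda prompts: [p.upper() for p in prompts],
                         max_batch=3)
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(batcher.submit, ["a", "b", "c"]))
```

For an image model the batched call would stack the prompts into one forward pass, which is where the throughput win over serial generation comes from.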
Enhancements
- ENH: Speed up cli interaction by @frostyplanet in #2443
- REF: Enable continuous batching for LLM with transformers engine by default by @ChengjieLi28 in #2437
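Continuous batching differs from plain batching in that new requests join the running batch between decode steps and finished sequences are evicted immediately, so short requests are not held hostage by long ones. A minimal, engine-agnostic sketch of the scheduling idea, assuming hypothetical names (continuous_batch_decode, step_fn) that are not Xinference's API:

```python
from collections import deque


def continuous_batch_decode(requests, step_fn, max_batch=2):
    """requests: iterable of (req_id, tokens_to_generate).
    step_fn(active_ids) stands in for one decode step over all
    currently active sequences. Returns request ids in finish order."""
    waiting = deque(requests)
    active = {}    # req_id -> tokens still to generate
    finished = []
    while waiting or active:
        # Admit waiting requests into any free batch slots.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        step_fn(list(active))  # one token for every active sequence
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                finished.append(rid)  # done: evict immediately,
                del active[rid]       # freeing a slot mid-flight
    return finished


# Demo: with a batch limit of 2, "c" is admitted the moment the
# short request "a" finishes, without waiting for "b" to complete.
steps = []
order = continuous_batch_decode([("a", 1), ("b", 3), ("c", 1)], steps.append)
```

Here `steps` records the active batch at each decode step, showing "c" replacing "a" while "b" keeps generating; this is the scheduling behavior the transformers engine now enables by default.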
Full Changelog: v0.15.4...v0.16.0