Releases: huggingface/optimum-intel
v1.25.2: Patch release
- Fix tokenizer conversion #1414 by @nikita-savelyevv
- Fix and test stateless encoder decoders #1423 by @IlyasMoutawwakil
- Use eager mask all the time #1424 by @IlyasMoutawwakil
Full Changelog: v1.25.1...v1.25.2
Compatible with transformers>=4.36,<=4.53
v1.25.1: Patch release
- Fix gemma3 for older transformers versions and llava next with mistral decoder #1408 by @IlyasMoutawwakil
- Handle deprecation of forced_decoder_ids in transformers generation_config #1402 by @aleksandr-mokrov and @echarlaix
Full Changelog: v1.25.0...v1.25.1
Compatible with transformers>=4.36,<=4.53
v1.25.0: Text-to-Text generation models quantization
🚀 New Features & Enhancements
- Add quantization for text2text-generation models by @nikita-savelyevv in #1359
- Add OpenVINO support for Mamba and Falcon-mamba by @rkazants in #1360
- Add quantization for SegmentAnything model by @nikita-savelyevv in #1384
- Add support for cb4_f8e4m3 quantization mode by @nikita-savelyevv in #1378
- Add quantization statistics path argument by @nikita-savelyevv in #1392
- Add Transformers 4.53 support by @IlyasMoutawwakil in #1377
What's Changed
- Add OpenVINO weight compression tests for llama4 by @nikita-savelyevv in #1369
- Fix IPEX model loading for sentence-transformers v5 by @echarlaix in #1370
- Update OpenVINO documentation with newly supported tasks by @rkazants in #1371
- [Docs] Optimization table on click feedback logic by @nikita-savelyevv in #1372
- Fix attr name typo in model_configs for llava-next compatibility with transformers 4.51.3 by @mitruska in #1375
- [OV] Add quantization for text2text-generation models by @nikita-savelyevv in #1359
- free up disk for slow/full ci by @IlyasMoutawwakil in #1376
- Original model types by @IlyasMoutawwakil in #1329
- Add openvino VLM quantization notebook by @echarlaix in #1382
- Remove notebook redundant quantization configs by @echarlaix in #1383
- [OV] Prepare quantization dataset collection logic to transition to datasets v4.0 by @nikita-savelyevv in #1381
- [OpenVINO] Add support for Mamba and Falcon-mamba by @rkazants in #1360
- Improve VLM quantization notebook structure by @ezelanza in #1385
- [OV] Add quantization for SegmentAnything model by @nikita-savelyevv in #1384
- [OV] Update the reference number of int8 nodes for SANA model by @nikita-savelyevv in #1386
- Add notebook quantization config paragraph by @echarlaix in #1390
- [TTS] Fix second generation for Speech T5 TTS by @rkazants in #1389
- fix auto_model_class for OVModelForVisualCausalLM by @echarlaix in #1391
- Add support for cb4_f8e4m3 quantization mode by @nikita-savelyevv in #1378
- Add quantization statistics path argument by @nikita-savelyevv in #1392
- Transformers 4.53 support by @IlyasMoutawwakil in #1377
Compatible with transformers>=4.36,<=4.53
Full Changelog: v1.24.0...v1.25.0
v1.24.0: OVPipelineQuantizationConfig
🚀 New Features & Enhancements
- Optimum 1.26 compatibility by @IlyasMoutawwakil in #1352
OpenVINO
- Introduce default full quantization configs for clip models by @nikita-savelyevv in #1302
- Introduce OVPipelineQuantizationConfig by @nikita-savelyevv in #1310
- Add int8 PTQ configs for some fill-mask models by @nikita-savelyevv in #1331
- Add transformers v4.52 compatibility by @eaidova in #1319
- Add compression config for Qwen/Qwen2.5-Coder-3B-Instruct by @MaximProshin in #1355
- [OV] Add support for data-free AWQ by @nikita-savelyevv in #1349
- Convert dataclasses to dicts in quantization config before saving by @nikita-savelyevv in #1362
- Remove reshaping for stateful decoders by @echarlaix in #1333
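A minimal sketch of how OVPipelineQuantizationConfig might be used to give each submodel of a visual language model its own quantization settings. The model id, the submodel names (lm_model, text_embeddings_model) and the dataset name "contextual" are illustrative assumptions that do not come from these notes; check them against the optimum-intel documentation for your installed version.

```python
def load_vlm_with_pipeline_quantization(model_id: str = "OpenGVLab/InternVL2-1B"):
    """Export a VLM to OpenVINO and quantize its submodels with separate configs.

    Imports are kept inside the function so the sketch can be read and loaded
    without optimum-intel installed; calling it requires optimum-intel with the
    OpenVINO and NNCF extras, and downloads the model.
    """
    from optimum.intel import (
        OVModelForVisualCausalLM,
        OVPipelineQuantizationConfig,
        OVQuantizationConfig,
        OVWeightQuantizationConfig,
    )

    # One entry per submodel; the keys below are assumed submodel names.
    quantization_config = OVPipelineQuantizationConfig(
        quantization_configs={
            # Full (weights and activations) int8 quantization for the language model.
            "lm_model": OVQuantizationConfig(bits=8),
            # Weight-only int8 compression for the text embeddings.
            "text_embeddings_model": OVWeightQuantizationConfig(bits=8),
        },
        dataset="contextual",  # assumed calibration dataset name
        trust_remote_code=True,
    )

    return OVModelForVisualCausalLM.from_pretrained(
        model_id,
        export=True,
        trust_remote_code=True,
        quantization_config=quantization_config,
    )
```

Submodels that are not listed in quantization_configs are left to the library's defaults in this sketch.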
IPEX
- Add transformers v4.52 compatibility by @jiqing-feng in #1317
🔧 Key Fixes & Optimizations
- Raise if converted subcomponent not found by @echarlaix in #1303
- Keep Hybrid Quantization only for diffusion pipelines by @nikita-savelyevv in #1313
- Fix whisper with auto language detection by @eaidova in #1314
- Fix vision embeddings export for maira by @eaidova in #1320
- Fix VLM calibration dataset collection by @nikita-savelyevv in #1321
- Resize large images during VLM calibration data collection by @nikita-savelyevv in #1322
- Resolve logger warnings by @emmanuel-ferdman in #1324
- Fix progress bar during calibration dataset collection by @nikita-savelyevv in #1323
- Fix ESM models export and add it to supported by @eaidova in #1328
- Allow skip trace check for sentence transformers by @eaidova in #1332
- Fix int value recompile by @jiqing-feng in #1335
- Fix TP tensor dimension mismatch for IPEX models by @kaixuanliu in #1340
- Updated Qwen3-8b compression config by @MaximProshin in #1341
New Contributors
- @kilavvy made their first contribution in #1345
- @maximevtush made their first contribution in #1347
- @leopardracer made their first contribution in #1351
What's Changed
- Dev version by @echarlaix in #1309
- Update number of int8 nodes for Segment Anything model by @nikita-savelyevv in #1311
- [OV][Docs] Keep Hybrid Quantization only for diffusion pipelines by @nikita-savelyevv in #1313
- raise if converted subcomponent not found by @echarlaix in #1303
- [OV] Introduce default full quantization configs for clip models by @nikita-savelyevv in #1302
- fix whisper with auto language detection by @eaidova in #1314
- fix vision embeddings export for maira by @eaidova in #1320
- [OV] Fix VLM calibration dataset collection by @nikita-savelyevv in #1321
- [OV] Resize large images during VLM calibration data collection by @nikita-savelyevv in #1322
- Resolve logger warnings by @emmanuel-ferdman in #1324
- [OV] Fix progress bar during calibration dataset collection by @nikita-savelyevv in #1323
- Limit INC version to fix CI by @changwangss in #1325
- [OV] Update AWQ test to pass on NNCF develop by @nikita-savelyevv in #1326
- Fix ESM models export and add it to supported by @eaidova in #1328
- Introduce OVPipelineQuantizationConfig by @nikita-savelyevv in #1310
- [OV] Add int8 PTQ configs for some fill-mask models by @nikita-savelyevv in #1331
- allow skip trace check for sentence transformers by @eaidova in #1332
- fix int value recompile by @jiqing-feng in #1335
- Add style bot by @echarlaix in #1337
- Fix setup.py to support INC latest version 3.4.1 by @changwangss in #1339
- fix bug when using tp, tensor dimension mismatch by @kaixuanliu in #1340
- fix optimum version by @echarlaix in #1344
- Updated Qwen3-8b compression config by @MaximProshin in #1341
- Fix Typo in Error Message for Sequence Length Validation by @kilavvy in #1345
- Fix Typographical Errors in Documentation String by @maximevtush in #1347
- upgrade windows runner image by @echarlaix in #1350
- Upgrade transformers version to 4.52 for ipex patching by @jiqing-feng in #1317
- Minor Typo Fixes in Comments for Quantized Generation Demo Notebook by @leopardracer in #1351
- fix openvino for compatibility with transformers 4.52 by @eaidova in #1319
- Optimum 1.26 compatibility by @IlyasMoutawwakil in #1352
- [OV] Update reference number of fp8 fake convert nodes by @nikita-savelyevv in #1348
- Compression config for Qwen/Qwen2.5-Coder-3B-Instruct by @MaximProshin in #1355
- Docs: Fix typos in quantized generation demo notebook by @kilavvy in #1356
- update style bot permission and token by @echarlaix in #1357
- [OV] Add support for data-free AWQ by @nikita-savelyevv in #1349
- Add documentation workflow by @echarlaix in #1361
- Fix style by @echarlaix in #1363
- fix by @echarlaix in #1364
- Fix documentation workflow by @echarlaix in #1365
- Convert dataclasses to dicts in quantization config before saving by @nikita-savelyevv in #1362
- Remove reshaping for stateful decoders by @echarlaix in #1333
Compatible with transformers>=4.36,<=4.52
Full Changelog: v1.23.0...v1.24.0
v1.23.1: Patch release
Full Changelog: v1.23.0...v1.23.1
v1.23.0: DeepSeek, Llama 4, LTX-Video
🚀 New Features & Enhancements
OpenVINO
- Add MAIRA-2 support by @eaidova in #1145
- Add support for nf4_f8e4m3 quantization mode by @nikita-savelyevv in #1148
- Add DeepSeek support by @eaidova in #1155
- Add Qwen2.5-VL support by @eaidova in #1163
- Add LLaVA-Next-Video support by @eaidova in #1183
- Add GOT-OCR2 support by @eaidova in #1202
- Add Gemma 3 support by @eaidova in #1198
- Add SmolVLM and Idefics3 support by @eaidova in #1210
- Add Phi-3-MoE support by @eaidova in #1215
- Add OVSamModel for inference by @eaidova in #1229
- Add Phi-4-multimodal support by @eaidova in #1201
- Add Llama 4 support by @eaidova in #1226
- Add zero-shot-image-classification support by @eaidova in #1273
- Add PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
- Add diffusers full int8 quantization support by @l-bat in #1193
- Add SANA-Sprint support by @eaidova in #1245
- Add PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
- Add LTX-Video support by @eaidova in #1264
- Add Qwen3 and Qwen3-MOE support by @openvino-dev-samples in #1214
- Add SpeechT5 text-to-speech support by @rkazants in #1230
- Add GLM4 support by @openvino-dev-samples in #1249
- PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
- Introduce OVCalibrationDatasetBuilder by @nikita-savelyevv in #1232
IPEX
- Add Qwen2 support by @jiqing-feng in #1107
- Enable quantization model support by @jiqing-feng in #1074
- Add support for flash decoding on xpu by @kaixuanliu in #1118
- Add Phi support by @jiqing-feng in #1175
- Enable compilation for patched model with paged attention by @jiqing-feng in #1253
- Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269
Transformers compatibility
- Add compatibility with transformers v4.49 by @echarlaix in #1172
- Add compatibility with transformers v4.50 and v4.51 by @IlyasMoutawwakil in #1242
🔧 Key Fixes & Optimizations
- Fix misplaced configs saving by @eaidova in #1159
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- Align loading dtype logic for diffusers with other models by @eaidova in #1187
- Fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
- Add ov_submodels property to OVBaseModel by @nikita-savelyevv in #1177
- Fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
- Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
- Remove ONNX fallback when converting to OpenVINO by @eaidova in #1272
- Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
- Fix openvino decoder models output by @echarlaix in #1308
What's Changed
- fix export phi3 with --trust-remote-code by @eaidova in #1147
- Skip test_aware_training_quantization test by @nikita-savelyevv in #1149
- Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
- enable qwen2 model by @jiqing-feng in #1107
- maira2 support by @eaidova in #1145
- Add slow tests for lower transformers version by @echarlaix in #1144
- fix misplaced configs saving by @eaidova in #1159
- Add default int4 config for DeepSeek-R1-Distill-Llama-8B by @nikita-savelyevv in #1158
- Remove unnecessary SD reload from saved dir by @l-bat in #1162
- resolve complicated chat templates during tokenizer saving by @eaidova in #1151
- Trigger tests for maira2 for compatible transformers version by @echarlaix in #1161
- use Tensor.numpy() instead of np.array(Tensor) by @eaidova in #1153
- [OV] Add support for nf4_f8e4m3 quantization mode by @nikita-savelyevv in #1148
- support updated chat template for llava-next by @eaidova in #1166
- avoid extra reshaping to max_model_length for unet by @eaidova in #1164
- Enable quant model support by @jiqing-feng in #1074
- [OV] Add default int4 configurations for DeepSeek-R1-Distill-Qwen models by @nikita-savelyevv in #1168
- Deprecate OVTrainer by @nikita-savelyevv in #1167
- Support DeepSeek models export by @eaidova in #1155
- add support for flash decoding on xpu by @kaixuanliu in #1118
- deprecate TSModelForCausalLM by @echarlaix in #1173
- transformers 4.49 by @echarlaix in #1172
- Update ipex Ci to torch 2.6 by @jiqing-feng in #1176
- add support qwen2.5vl by @eaidova in #1163
- enable phi by @jiqing-feng in #1175
- Add ov_submodels property to OVBaseModel by @nikita-savelyevv in #1177
- [OV] Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
- Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
- [OV] Add int4 config for Llama-3.1-8b model id aliases by @nikita-savelyevv in #1182
- Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
- [OV] Enable back phi3_v 4bit compression test by @nikita-savelyevv in #1185
- align loading dtype logic for diffusers with other models by @eaidova in #1187
- attempt to resolve 4.49 compatibility issues and fix input processing… by @eaidova in #1190
- fix logits_to_keep by @jiqing-feng in #1188
- warm up does not work for compiled model by @jiqing-feng in #1189
- Add default int4 configs for Phi-4-mini-instruct and Qwen2.5-7B-Instruct by @nikita-savelyevv in #1194
- add support llava-next-video by @eaidova in #1183
- upgrade transformers to 4.49 for patching models by @jiqing-feng in #1196
- add support got-ocr2 by @eaidova in #1202
- fix generation for statically reshaped diffusion pipeline...
v1.22.0: Qwen2-VL, Granite, Sana, Sentence Transformers
OpenVINO
- Add quantization of Whisper pipeline by @nikita-savelyevv in #1040
- Add Qwen2-VL support by @eaidova in #1042
- Add AWQ models support by @mvafin in #1049
- Update default OV configuration by @KodiaqQ in #1057
- Introduce --quant-mode cli argument enabling full quantization via optimum-cli by @nikita-savelyevv in #1061
- Merge decoder and decoder with past to stateful for seq2seq models by @eaidova in #1078
- Add transformers 4.47 support by @IlyasMoutawwakil in #1088
- Add GLM-Edge models support by @eaidova in #1089
- Add Granite and GraniteMoe models support by @eaidova in #1099
- Add fp8 implementation by @KodiaqQ in #1100
- Add Flux Fill inpainting pipeline support by @eaidova in #1095
- Add Sana support by @eaidova in #1106
- Add v4.48 transformers support by @IlyasMoutawwakil in #1136
IPEX
- Add support for sentence transformers models by @echarlaix in #1034
from optimum.intel import IPEXSentenceTransformer
model = IPEXSentenceTransformer.from_pretrained(model_id)
- Add support for text-to-text task by @jiqing-feng in #1054
from optimum.intel import IPEXModelForSeq2SeqLM
model = IPEXModelForSeq2SeqLM.from_pretrained(model_id)
- Enable Flash Attention by @jiqing-feng in #1065
Compatible with transformers>=4.36,<=4.48
Full Changelog: v1.21.0...v1.22.0
v1.21.0: SD3, Flux, MiniCPM, NanoLlava, VLM Quantization, XPU, PagedAttention
What's Changed
OpenVINO
Diffusers
VLMs Modeling
- MiniCPMv support by @eaidova in #972
- NanoLlava support by @eaidova in #969
- Phi3v support by @eaidova in #977
NNCF
- Quantization support for CausalVisualLMs by @nikita-savelyevv in #951
- NF4 data type support for OV weight compression by @l-bat in #988
- NNCF 2.14 new features support by @nikita-savelyevv in #997
IPEX
INC
- Layer-wise quantization support by @changwangss in #1040
New Contributors
- @emmanuel-ferdman made their first contribution in #974
- @mvafin made their first contribution in #1033
Compatible with transformers>=4.36,<=4.46
Full Changelog: v1.20.0...v1.21.0
v1.20.1: Patch release
- Fix lora unscaling in diffusion pipelines by @eaidova in #937
- Fix compatibility with diffusers < 0.25.0 by @eaidova in #952
- Allow to use SDPA in clip models by @eaidova in #941
- Updated OVPipelinePart to have separate ov_config by @e-ddykim in #957
- Symbol use in optimum: fix misprint by @jane-intel in #948
- Fix temporary directory saving by @eaidova in #959
- Disable warning about tokenizers version for ov tokenizers >= 2024.5 by @eaidova in #962
- Restore original model_index.json after save_pretrained call by @eaidova in #961
- Add v4.46 transformers support by @echarlaix in #960
v1.20.0: multi-modal and OpenCLIP models support, transformers v4.45
OpenVINO
Multi-modal models support
Adding OVModelForVisualCausalLM by @eaidova in #883
OpenCLIP models support
Adding OpenCLIP models support by @sbalandi in #857
from optimum.intel import OVModelCLIPVisual, OVModelCLIPText
visual_model = OVModelCLIPVisual.from_pretrained(model_name_or_path)
text_model = OVModelCLIPText.from_pretrained(model_name_or_path)
image = processor(image).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])
image_features = visual_model(image).image_features
text_features = text_model(text).text_features
Diffusion pipeline
Adding OVDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in #889
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)
+ pipeline = OVDiffusionPipeline.from_pretrained(model_id)
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
NNCF GPTQ support
GPTQ support by @nikita-savelyevv in #912
Transformers v4.45
Transformers v4.45 support by @echarlaix in #902
Subfolder
Remove the restriction for the model's config to be in the model's subfolder by @tomaarsen in #933
New Contributors
- @jane-intel made their first contribution in #696
- @andreyanufr made their first contribution in #903
- @MaximProshin made their first contribution in #905
- @tomaarsen made their first contribution in #931
Compatible with transformers>=4.36,<=4.45
Full Changelog: v1.19.0...v1.20.0
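The "Compatible with transformers" lines above can be collected into a small lookup, sketched here in plain Python. The helper names are hypothetical and not part of optimum-intel. The sketch assumes that an upper bound such as <=4.53 covers every 4.53.x patch release, so only the major and minor components are compared; releases that state no range in these notes (for example v1.23.0) are left out.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# (lower bound, upper bound) of transformers, copied from the notes above.
COMPATIBLE_TRANSFORMERS: Dict[str, Tuple[Version, Version]] = {
    "1.25.2": ((4, 36), (4, 53)),
    "1.25.1": ((4, 36), (4, 53)),
    "1.25.0": ((4, 36), (4, 53)),
    "1.24.0": ((4, 36), (4, 52)),
    "1.22.0": ((4, 36), (4, 48)),
    "1.21.0": ((4, 36), (4, 46)),
    "1.20.0": ((4, 36), (4, 45)),
}


def major_minor(version: str) -> Version:
    """Reduce a version string such as '4.53.1' to (4, 53)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_compatible(optimum_intel_version: str, transformers_version: str) -> bool:
    """Check a transformers version against the range stated for a release."""
    lower, upper = COMPATIBLE_TRANSFORMERS[optimum_intel_version]
    return lower <= major_minor(transformers_version) <= upper


def releases_supporting(transformers_version: str) -> List[str]:
    """List the releases above whose stated range includes the given version."""
    return [
        release
        for release in COMPATIBLE_TRANSFORMERS
        if is_compatible(release, transformers_version)
    ]
```

For instance, releases_supporting("4.50.0") yields the three v1.25.x releases and v1.24.0, since v1.22.0 and earlier stop at 4.48 or below.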