Releases · huggingface/optimum-intel

13 Aug 09:35

echarlaix

v1.25.2

343978c

v1.25.2: Patch release

Latest

Fix tokenizer conversion #1414 by @nikita-savelyevv
Fix and test stateless encoder decoders #1423 by @IlyasMoutawwakil
Use eager mask all the time #1424 by @IlyasMoutawwakil

Full Changelog: v1.25.1...v1.25.2

Compatible with transformers>=4.36,<=4.53
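The compatibility line above can be turned into a quick pre-flight check. The sketch below is a minimal illustration and not part of optimum-intel: the helper names `minor_series` and `is_supported` are hypothetical, and it assumes the bounds apply to the major.minor series (so any 4.53.x patch release counts as within `<=4.53`), which is an interpretation rather than something the release note states.

```python
import re
from typing import Tuple

# Bounds copied from the release note: transformers>=4.36,<=4.53
LOWER: Tuple[int, int] = (4, 36)
UPPER: Tuple[int, int] = (4, 53)


def minor_series(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.53.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version: str) -> bool:
    """Assumption: the bounds are compared on major.minor only."""
    return LOWER <= minor_series(version) <= UPPER
```

In practice the installed version could be read with `importlib.metadata.version("transformers")` and passed to `is_supported`.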

Contributors

nikita-savelyevv and IlyasMoutawwakil

07 Aug 06:16

IlyasMoutawwakil

v1.25.1

e02f89f

v1.25.1: Patch release

Fix gemma3 for older transformers versions and llava next with mistral decoder #1408 by @IlyasMoutawwakil
Handle deprecation of forced_decoder_ids in transformers generation_config #1402 by @aleksandr-mokrov and @echarlaix

Full Changelog: v1.25.0...v1.25.1

Compatible with transformers>=4.36,<=4.53

Contributors

IlyasMoutawwakil, aleksandr-mokrov, and echarlaix

04 Aug 16:42

echarlaix

v1.25.0

6e47856

v1.25.0: Text-to-Text generation models quantization

🚀 New Features & Enhancements

Add quantization for text2text-generation models by @nikita-savelyevv in #1359
Add OpenVINO support for Mamba and Falcon-mamba by @rkazants in #1360
Add quantization for SegmentAnything model by @nikita-savelyevv in #1384
Add support for cb4_f8e4m3 quantization mode by @nikita-savelyevv in #1378
Add quantization statistics path argument by @nikita-savelyevv in #1392
Add Transformers 4.53 support by @IlyasMoutawwakil in #1377

New Contributors

@mitruska made their first contribution in #1375
@ezelanza made their first contribution in #1385

What's Changed

Add OpenVINO weight compression tests for llama4 by @nikita-savelyevv in #1369
Fix IPEX model loading for sentence-transformers v5 by @echarlaix in #1370
Update OpenVINO documentation with newly supported tasks by @rkazants in #1371
[Docs] Optimization table on click feedback logic by @nikita-savelyevv in #1372
Fix attr name typo in model_configs for llava-next compatibility with transformers 4.51.3 by @mitruska in #1375
[OV] Add quantization for text2text-generation models by @nikita-savelyevv in #1359
free up disk for slow/full ci by @IlyasMoutawwakil in #1376
Original model types by @IlyasMoutawwakil in #1329
Add openvino VLM quantization notebook by @echarlaix in #1382
Remove notebook redundant quantization configs by @echarlaix in #1383
[OV] Prepare quantization dataset collection logic to transition to datasets v4.0 by @nikita-savelyevv in #1381
[OpenVINO] Add support for Mamba and Falcon-mamba by @rkazants in #1360
Improve VLM quantization notebook structure by @ezelanza in #1385
[OV] Add quantization for SegmentAnything model by @nikita-savelyevv in #1384
[OV] Update the reference number of int8 nodes for SANA model by @nikita-savelyevv in #1386
Add notebook quantization config paragraph by @echarlaix in #1390
[TTS] Fix second generation for Speech T5 TTS by @rkazants in #1389
fix auto_model_class for OVModelForVisualCausalLM by @echarlaix in #1391
Add support for cb4_f8e4m3 quantization mode. by @nikita-savelyevv in #1378
Add quantization statistics path argument by @nikita-savelyevv in #1392
Transformers 4.53 support by @IlyasMoutawwakil in #1377

Compatible with transformers>=4.36,<=4.53

Full Changelog: v1.24.0...v1.25.0

Contributors

mitruska, nikita-savelyevv, and 4 other contributors

01 Jul 12:47

echarlaix

v1.24.0

575bd47

v1.24.0: OVPipelineQuantizationConfig

🚀 New Features & Enhancements

Optimum 1.26 compatibility by @IlyasMoutawwakil in #1352

OpenVINO

Introduce default full quantization configs for clip models by @nikita-savelyevv in #1302
Introduce OVPipelineQuantizationConfig by @nikita-savelyevv in #1310
Add int8 PTQ configs for some fill-mask models by @nikita-savelyevv in #1331
Add transformers v4.52 compatibility by @eaidova in #1319
Add compression config for Qwen/Qwen2.5-Coder-3B-Instruct by @MaximProshin in #1355
[OV] Add support for data-free AWQ by @nikita-savelyevv in #1349
Convert dataclasses to dicts in quantization config before saving by @nikita-savelyevv in #1362
Remove reshaping for stateful decoders by @echarlaix in #1333

IPEX

Add transformers v4.52 compatibility by @jiqing-feng in #1317

🔧 Key Fixes & Optimizations

Raise if converted subcomponent not found by @echarlaix in #1303
Keep Hybrid Quantization only for diffusion pipelines by @nikita-savelyevv in #1313
Fix whisper with auto language detection by @eaidova in #1314
Fix vision embeddings export for maira by @eaidova in #1320
Fix VLM calibration dataset collection by @nikita-savelyevv in #1321
Resize large images during VLM calibration data collection by @nikita-savelyevv in #1322
Resolve logger warnings by @emmanuel-ferdman in #1324
Fix progress bar during calibration dataset collection by @nikita-savelyevv in #1323
Fix ESM models export and add it to supported by @eaidova in #1328
Allow skipping the trace check for sentence transformers by @eaidova in #1332
Fix int value recompile by @jiqing-feng in #1335
Fix TP tensor dimension mismatch for IPEX models by @kaixuanliu in #1340
Updated Qwen3-8b compression config by @MaximProshin in #1341

New Contributors

@kilavvy made their first contribution in #1345
@maximevtush made their first contribution in #1347
@leopardracer made their first contribution in #1351

What's Changed

Dev version by @echarlaix in #1309
Update number of int8 nodes for Segment Anything model by @nikita-savelyevv in #1311
[OV][Docs] Keep Hybrid Quantization only for diffusion pipelines by @nikita-savelyevv in #1313
raise if converted subcomponent not found by @echarlaix in #1303
[OV] Introduce default full quantization configs for clip models by @nikita-savelyevv in #1302
fix whisper with auto language detection by @eaidova in #1314
fix vision embeddings export for maira by @eaidova in #1320
[OV] Fix VLM calibration dataset collection by @nikita-savelyevv in #1321
[OV] Resize large images during VLM calibration data collection by @nikita-savelyevv in #1322
Resolve logger warnings by @emmanuel-ferdman in #1324
[OV] Fix progress bar during calibration dataset collection by @nikita-savelyevv in #1323
Limit INC version to fix CI. by @changwangss in #1325
[OV] Update AWQ test to pass on NNCF develop by @nikita-savelyevv in #1326
Fix ESM models export and add it to supported by @eaidova in #1328
Introduce OVPipelineQuantizationConfig by @nikita-savelyevv in #1310
[OV] Add int8 PTQ configs for some fill-mask models. by @nikita-savelyevv in #1331
allow skip trace check for sentence transformers by @eaidova in #1332
fix int value recompile by @jiqing-feng in #1335
Add style bot by @echarlaix in #1337
Fix setup.py to support INC latest version 3.4.1 by @changwangss in #1339
fix bug when using tp, tensor dimension mismatch by @kaixuanliu in #1340
fix optimum version by @echarlaix in #1344
Updated Qwen3-8b compression config by @MaximProshin in #1341
Fix Typo in Error Message for Sequence Length Validation by @kilavvy in #1345
Fix Typographical Errors in Documentation String by @maximevtush in #1347
upgrade windows runner image by @echarlaix in #1350
Upgrade transformers version to 4.52 for ipex patching by @jiqing-feng in #1317
Minor Typo Fixes in Comments for Quantized Generation Demo Notebook by @leopardracer in #1351
fix openvino for compatibility with transformers 4.52 by @eaidova in #1319
Optimum 1.26 compatibility by @IlyasMoutawwakil in #1352
[OV] Update reference number of fp8 fake convert nodes by @nikita-savelyevv in #1348
Compression config for Qwen/Qwen2.5-Coder-3B-Instruct by @MaximProshin in #1355
Docs: Fix typos in quantized generation demo notebook by @kilavvy in #1356
update style bot permission and token by @echarlaix in #1357
[OV] Add support for data-free AWQ by @nikita-savelyevv in #1349
Add documentation workflow by @echarlaix in #1361
Fix style by @echarlaix in #1363
fix by @echarlaix in #1364
Fix documentation workflow by @echarlaix in #1365
Convert dataclasses to dicts in quantization config before saving by @nikita-savelyevv in #1362
Remove reshaping for stateful decoders by @echarlaix in #1333

Compatible with transformers>=4.36,<=4.52

Full Changelog: v1.23.0...v1.24.0

Contributors

kaixuanliu, MaximProshin, and 10 other contributors

13 Jun 11:30

echarlaix

v1.23.1

bad0063

v1.23.1: Patch release

Full Changelog: v1.23.0...v1.23.1

15 May 13:31

echarlaix

v1.23.0

add48f0

v1.23.0: DeepSeek, Llama 4, LTX-Video

🚀 New Features & Enhancements

OpenVINO

Add MAIRA-2 support by @eaidova in #1145
Add support for nf4_f8e4m3 quantization mode by @nikita-savelyevv in #1148
Add DeepSeek support by @eaidova in #1155
Add Qwen2.5-VL support by @eaidova in #1163
Add LLaVA-Next-Video support by @eaidova in #1183
Add GOT-OCR2 support by @eaidova in #1202
Add Gemma 3 support by @eaidova in #1198
Add SmolVLM and Idefics3 support by @eaidova in #1210
Add Phi-3-MoE support by @eaidova in #1215
Add OVSamModel for inference by @eaidova in #1229
Add Phi-4-multimodal support by @eaidova in #1201
Add Llama 4 support by @eaidova in #1226
Add zero-shot-image-classification support by @eaidova in #1273
Add PTQ support for OVModelForZeroShotImageClassification by @nikita-savelyevv in #1283
Add diffusers full int8 quantization support by @l-bat in #1193
Add SANA-Sprint support by @eaidova in #1245
Add PTQ support for OVModelForMaskedLM by @nikita-savelyevv in #1268
Add LTX-Video support by @eaidova in #1264
Add Qwen3 and Qwen3-MOE support by @openvino-dev-samples in #1214
Add OpenVINO support for SpeechT5 text-to-speech by @rkazants in #1230
Add GLM4 support by @openvino-dev-samples in #1249
PTQ support for OVModelForFeatureExtraction and OVSentenceTransformer by @nikita-savelyevv in #1257
Introduce OVCalibrationDatasetBuilder by @nikita-savelyevv in #1232

IPEX

Add Qwen2 support by @jiqing-feng in #1107
Enable quantization model support by @jiqing-feng in #1074
Add support for flash decoding on xpu by @kaixuanliu in #1118
Add Phi support by @jiqing-feng in #1175
Enable compilation for patched model with paged attention by @jiqing-feng in #1253
Add Mistral modeling optimization support for ipex by @kaixuanliu in #1269

Transformers compatibility

Add compatibility with transformers v4.49 by @echarlaix in #1172
Add compatibility with transformers v4.50 and v4.51 by @IlyasMoutawwakil in #1242

🔧 Key Fixes & Optimizations

Fix misplaced configs saving by @eaidova in #1159
Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
Propagate OV QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
Align loading dtype logic for diffusers with other models by @eaidova in #1187
Fix generation for statically reshaped diffusion pipeline by @eaidova in #1199
Add ov_submodels property to OVBaseModel by @nikita-savelyevv in #1177
Fix flux and sana export with diffusers 0.33+ by @eaidova in #1236
Update pkv precision at save_pretrained call by @nikita-savelyevv in #1235
Remove ONNX fallback when converting to OpenVINO by @eaidova in #1272
Fix custom dataset processing for text encoding tasks by @nikita-savelyevv in #1286
Fix openvino decoder models output by @echarlaix in #1308

What's Changed

fix export phi3 with --trust-remote-code by @eaidova in #1147
Skip test_aware_training_quantization test by @nikita-savelyevv in #1149
Check if nncf is installed before running quantization from optimum-cli by @nikita-savelyevv in #1154
enable qwen2 model by @jiqing-feng in #1107
maira2 support by @eaidova in #1145
Add slow tests for lower transformers version by @echarlaix in #1144
fix misplaced configs saving by @eaidova in #1159
Add default int4 config for DeepSeek-R1-Distill-Llama-8B by @nikita-savelyevv in #1158
Remove unnecessary SD reload from saved dir by @l-bat in #1162
resolve complicated chat templates during tokenizer saving by @eaidova in #1151
Trigger tests for maira2 for compatible transformers version by @echarlaix in #1161
use Tensor.numpy() instead of np.array(Tensor) by @eaidova in #1153
[OV] Add support for nf4_f8e4m3 quantization mode by @nikita-savelyevv in #1148
support updated chat template for llava-next by @eaidova in #1166
avoid extra reshaping to max_model_lenght for unet by @eaidova in #1164
Enable quant model support by @jiqing-feng in #1074
[OV] Add default int4 configurations for DeepSeek-R1-Distill-Qwen models by @nikita-savelyevv in #1168
Deprecate OVTrainer by @nikita-savelyevv in #1167
Support deepseek models export by @eaidova in #1155
add support for flash decoding on xpu by @kaixuanliu in #1118
deprecate TSModelForCausalLM by @echarlaix in #1173
transformers 4.49 by @echarlaix in #1172
Update ipex Ci to torch 2.6 by @jiqing-feng in #1176
add support qwen2.5vl by @eaidova in #1163
enable phi by @jiqing-feng in #1175
Add ov_submodels property to OVBaseModel by @nikita-savelyevv in #1177
[OV] Fix automatic-speech-recognition-with-past quantization from CLI by @nikita-savelyevv in #1180
Propagate OV*QuantizationConfig kwargs to nncf calls by @nikita-savelyevv in #1179
[OV] Add int4 config for Llama-3.1-8b model id aliases by @nikita-savelyevv in #1182
Fix model field names for OVBaseModelForSeq2SeqLM by @nikita-savelyevv in #1184
[OV] Enable back phi3_v 4bit compression test by @nikita-savelyevv in #1185
align loading dtype logic for diffusers with other models by @eaidova in #1187
attempt to resolve 4.49 compatibility issues and fix input processing… by @eaidova in #1190
fix logits_to_keep by @jiqing-feng in #1188
warm up does not work for compiled model by @jiqing-feng in #1189
Add default int4 configs for Phi-4-mini-instruct and Qwen2.5-7B-Instruct by @nikita-savelyevv in #1194
add support llava-next-video by @eaidova in #1183
upgrade transformers to 4.49 for patching models by @jiqing-feng in #1196
add support got-ocr2 by @eaidova in #1202
fix generation for statically reshaped diffusion pipeline...

Contributors

Wovchena, kaixuanliu, and 12 other contributors

06 Feb 23:49

echarlaix

v1.22.0

ba36581

v1.22.0: Qwen2-VL, Granite, Sana, Sentence Transformers

OpenVINO

Add quantization of Whisper pipeline by @nikita-savelyevv in #1040
Add Qwen2-VL support by @eaidova in #1042
Add AWQ models support by @mvafin in #1049
Update default OV configuration by @KodiaqQ in #1057
Introduce --quant-mode cli argument enabling full quantization via optimum-cli by @nikita-savelyevv in #1061
Merge decoder and decoder with past to stateful for seq2seq models by @eaidova in #1078
Add transformers 4.47 support by @IlyasMoutawwakil in #1088
Add GLM-Edge models support by @eaidova in #1089
Add Granite and GraniteMoe models support by @eaidova in #1099
Add fp8 implementation by @KodiaqQ in #1100
Add Flux Fill inpainting pipeline support by @eaidova in #1095
Add Sana support by @eaidova in #1106
Add v4.48 transformers support by @IlyasMoutawwakil in #1136
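The `--quant-mode` argument listed above could be exercised from a script roughly as follows. This is a sketch under stated assumptions: only `--quant-mode` comes from the release note, while the `--model` and `--dataset` flags, the positional output directory, and the example model and dataset names are recalled from the optimum-cli documentation and should be checked against `optimum-cli export openvino --help`. The helper `build_export_command` is hypothetical and only assembles the argument list; it does not run the export.

```python
import shlex
from typing import List, Optional


def build_export_command(
    model_id: str,
    output_dir: str,
    quant_mode: str = "int8",
    dataset: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) an optimum-cli OpenVINO export command."""
    cmd = [
        "optimum-cli", "export", "openvino",
        "--model", model_id,          # assumed flag name
        "--quant-mode", quant_mode,   # flag introduced in #1061
    ]
    if dataset is not None:
        # Assumed flag: calibration dataset used for full quantization
        cmd += ["--dataset", dataset]
    cmd.append(output_dir)            # assumed positional output directory
    return cmd


# Illustrative values only; the model and dataset names are assumptions.
command = build_export_command(
    "openai/whisper-tiny", "whisper-tiny-int8", dataset="librispeech"
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` once the flags have been confirmed for the installed version.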

IPEX

Add support for sentence transformers models by @echarlaix in #1034

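from optimum.intel import IPEXSentenceTransformer

# Illustrative checkpoint (an assumption, not from the release note); any sentence-transformers model ID or local path works
model_id = "sentence-transformers/all-MiniLM-L6-v2"
model = IPEXSentenceTransformer.from_pretrained(model_id)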

Add support for the text-to-text task by @jiqing-feng in #1054

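from optimum.intel import IPEXModelForSeq2SeqLM

# Illustrative checkpoint (an assumption, not from the release note); substitute any seq2seq model ID or local path
model_id = "google-t5/t5-small"
model = IPEXModelForSeq2SeqLM.from_pretrained(model_id)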

Enable Flash Attention by @jiqing-feng in #1065

Compatible with transformers>=4.36,<=4.48

Full Changelog: v1.21.0...v1.22.0

Contributors

mvafin, nikita-savelyevv, and 4 other contributors

06 Dec 12:53

IlyasMoutawwakil

v1.21.0

2f15252

v1.21.0: SD3, Flux, MiniCPM, NanoLlava, VLM Quantization, XPU, PagedAttention

What's Changed

OpenVINO

Diffusers

SD3 and Flux pipelines support by @eaidova in #916

VLMs Modeling

MiniCPMv support by @eaidova in #972
NanoLlava support by @eaidova in #969
Phi3v support by @eaidova in #977

NNCF

Quantization support for CausalVisualLMs by @nikita-savelyevv in #951
NF4 data type support for OV weight compression by @l-bat in #988
NNCF 2.14 new features support by @nikita-savelyevv in #997

IPEX

Unified XPU/CPU modeling with custom PagedAttention cache for LLMs by @sywangyi in #1009

INC

Layer-wise quantization support by @changwangss in #1040

New Contributors

@emmanuel-ferdman made their first contribution in #974
@mvafin made their first contribution in #1033

Compatible with transformers>=4.36,<=4.46

Full Changelog: v1.20.0...v1.21.0

Contributors

mvafin, l-bat, and 5 other contributors

30 Oct 14:08

echarlaix

v1.20.1

00e4715

v1.20.1: Patch release

Fix lora unscaling in diffusion pipelines by @eaidova in #937
Fix compatibility with diffusers < 0.25.0 by @eaidova in #952
Allow to use SDPA in clip models by @eaidova in #941
Updated OVPipelinePart to have separate ov_config by @e-ddykim in #957
Symbol use in optimum: fix misprint by @jane-intel in #948
Fix temporary directory saving by @eaidova in #959
Disable warning about tokenizers version for ov tokenizers >= 2024.5 by @eaidova in #962
Restore original model_index.json after save_pretrained call by @eaidova in #961
Add v4.46 transformers support by @echarlaix in #960

Contributors

eaidova, e-nugmanova, and 2 other contributors

10 Oct 17:01

echarlaix

v1.20.0

8fc7c28

v1.20.0: multi-modal and OpenCLIP models support, transformers v4.45

OpenVINO

Multi-modal models support

Adding OVModelForVisionCausalLM by @eaidova in #883

OpenCLIP models support

Adding OpenCLIP models support by @sbalandi in #857

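from optimum.intel import OVModelCLIPVisual, OVModelCLIPText

# model_name_or_path, processor, tokenizer and image are assumed to be defined by the caller
# (for example, an OpenCLIP preprocessing transform, its tokenizer, and a PIL image)
visual_model = OVModelCLIPVisual.from_pretrained(model_name_or_path)
text_model = OVModelCLIPText.from_pretrained(model_name_or_path)
image = processor(image).unsqueeze(0)  # add a batch dimension
text = tokenizer(["a diagram", "a dog", "a cat"])
image_features = visual_model(image).image_features
text_features = text_model(text).text_features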

Diffusion pipeline

Adding OVDiffusionPipeline to simplify diffusers model loading by @IlyasMoutawwakil in #889

  model_id = "stabilityai/stable-diffusion-xl-base-1.0"
- pipeline = OVStableDiffusionXLPipeline.from_pretrained(model_id)
+ pipeline = OVDiffusionPipeline.from_pretrained(model_id)
  image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]

NNCF GPTQ support

GPTQ support by @nikita-savelyevv in #912

Transformers v4.45

Transformers v4.45 support by @echarlaix in #902

Subfolder

Remove the restriction for the model's config to be in the model's subfolder by @tomaarsen in #933

New Contributors

@jane-intel made their first contribution in #696
@andreyanufr made their first contribution in #903
@MaximProshin made their first contribution in #905
@tomaarsen made their first contribution in #931

Compatible with transformers>=4.36,<=4.45
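The "Compatible with transformers" lines quoted across the releases on this page can be collected into a small lookup. The sketch below is illustrative only: the table is copied from those lines, releases that carry no such line here (v1.23.0, v1.23.1, v1.20.1) are left out rather than guessed, and the function name `releases_supporting` is hypothetical. Bounds are compared on major.minor, which is an assumption about how the ranges are meant to be read.

```python
from typing import Dict, List, Tuple

# Lower bound shared by every compatibility line on this page.
MIN_TRANSFORMERS: Tuple[int, int] = (4, 36)

# Upper bounds per optimum-intel release, newest first, as stated in the notes.
MAX_TRANSFORMERS: Dict[str, Tuple[int, int]] = {
    "1.25.2": (4, 53),
    "1.25.1": (4, 53),
    "1.25.0": (4, 53),
    "1.24.0": (4, 52),
    "1.22.0": (4, 48),
    "1.21.0": (4, 46),
    "1.20.0": (4, 45),
}


def releases_supporting(transformers_minor: Tuple[int, int]) -> List[str]:
    """Return the listed releases whose stated range covers the given (major, minor)."""
    return [
        release
        for release, upper in MAX_TRANSFORMERS.items()
        if MIN_TRANSFORMERS <= transformers_minor <= upper
    ]


print(releases_supporting((4, 52)))
```

For example, a project pinned to transformers 4.48 would, under these assumptions, be covered by v1.22.0 and every newer release listed in the table.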

Full Changelog: v1.19.0...v1.20.0

Contributors

MaximProshin, nikita-savelyevv, and 6 other contributors
