v1.18.0: SynapseAI v1.21, Accelerate, CogVideoX, Llava-onevision
SynapseAI v1.21
This release has been tested on and validated for SynapseAI v1.21.
Accelerate
Gaudi is now natively supported in Accelerate; check out the documentation for more information. A minimal usage sketch follows the list of related changes below.
- Update GaudiAccelerator #1876 @IlyasMoutawwakil
- Fix lost modules in regional compilation #1885 @xinyu-intel
- Fix FSDP and get rid of GaudiPartialState #1942 @IlyasMoutawwakil
- Restore dynamic compilation setting and fix compile_regions call #1973 @yafshar
- Hot fix regional compilation #2005 @IlyasMoutawwakil
- Fix fp8 #2010 @IlyasMoutawwakil
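Since Gaudi support now lives in Accelerate itself, the standard device-agnostic Accelerate workflow applies on HPU. Below is a minimal training-loop sketch, assuming a machine with the Habana PyTorch bridge installed; the model, optimizer, and data are placeholders:

```python
import torch
from accelerate import Accelerator

# Accelerator() detects the available device (HPU on a Gaudi machine) automatically.
accelerator = Accelerator()

# Toy model, optimizer, and data standing in for a real training setup.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

# prepare() moves everything to the detected device and wraps it for distributed runs.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # use accelerator.backward() instead of loss.backward()
    optimizer.step()
```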
Diffusers
- fea(diffusers): Upgraded to version 0.32.0 #1939 @imangohari1
- fea(): Diffusers upgrade to 0.33.1 #1981 @imangohari1
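The Gaudi-specific pipelines keep the same entry points across the Diffusers upgrade. A minimal text-to-image sketch with `GaudiStableDiffusionPipeline`; the checkpoint and Gaudi config below are illustrative:

```python
from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"  # illustrative checkpoint
scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")
pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,      # run on HPU
    use_hpu_graphs=True,  # capture HPU graphs to reduce host overhead
    gaudi_config="Habana/stable-diffusion",
)
images = pipeline(prompt="An astronaut riding a horse on Mars").images
```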
CogVideoX
- Add CogVideoX support for Gaudi #1600 @nc-BobLee
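For CogVideoX on Gaudi, the sketch below is hypothetical: the class name `GaudiCogVideoXPipeline` and its arguments are assumed from the naming convention of the other Gaudi diffusion pipelines, and the checkpoint is illustrative; see the text-to-video example in the repository for the actual entry point.

```python
# Hypothetical sketch: the pipeline class and arguments are assumptions, not a confirmed API.
from optimum.habana.diffusers import GaudiCogVideoXPipeline  # assumed class name

pipeline = GaudiCogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",   # illustrative checkpoint
    use_habana=True,        # run on HPU
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",  # illustrative Gaudi config
)
frames = pipeline(prompt="A panda playing guitar by a lake", num_frames=49).frames
```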
GLM4V
- Add GLM4V #1668 @mengker33
Siglip and Llava-onevision
- Add support for Siglip and Llava Onevision #1883 @emascarenhas
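Llava-OneVision runs through the usual transformers classes on Gaudi once the library has been patched by optimum-habana. A minimal sketch, assuming `adapt_transformers_to_gaudi()` is called before loading (as the repository examples do); the checkpoint and image URL are illustrative:

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Patch transformers with the Gaudi-optimized model implementations.
adapt_transformers_to_gaudi()

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"  # illustrative checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("hpu")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "What is shown in this image?"}]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to("hpu", torch.bfloat16)

output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```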
Model optimizations
- Optimize memory utilization by keeping logits in BF16 #1859 @kalyank007
- Integrate DistributedAttention for Qwen2 #1860 @Jianhong-Zhang
- [Llama-vision] Add support for Fused RMS Norm #1892 @ANSHUMAN87
- Enable torch compile for llama 3.2 vision #1873 @jaygala223
- Flag to enable leaf promotion to avoid graph breaks in MLP for compile #1880 @bhargaveede
- Adding Deepspeed config for Llama3 Fine Tuning (#165) #1881 @bhargaveede
- Add trim_logits support in deepseekV3 #1933 @jthakurH
- Add flag to enable compiled_autograd with Deepspeed for training #1785 @vivekgoe
- chatglm: Fix a bug when attention mask is None #1896 @mengker33
- Optimized DeepSeek-V2 attention prefill with MHA #1791 @gyou2021
- [Llama-Vision] Trim logits #1894 @ANSHUMAN87
- Align cross_attention_mask for Llama 3.2 90B to avoid partial writes and graph retracing #1917 @kalyank007
- Add FSDP config for Granite model #1897 @kplau1128
- [Llama-Vision] Add support for bucketing #1895 @ANSHUMAN87
- Add Moonlight Support #1868 @jinyouzhi
- Add support for expert parallelism with mixtral #1908 @kwisniewski98
- Fix in-place operation on a tensor that requires grad in modeling_qwen2_vl.py #1970 @emascarenhas
- Adjust VideoLlavaProcessor to avoid performance regression on gaudi3 #1969 @kaixuanliu
- Speed up FLUX training over 2x with Gaudi optimized attention #1963 @dsocek
- [llama-vision] Remove token_idx_cpu parameter #2018 @ugolowic
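Several of the items above enable or tune torch.compile paths on Gaudi (e.g. #1873, #1785, #1880, #1918). For reference, compiling a model for HPU goes through the dedicated `hpu_backend`; a minimal sketch, assuming the Habana PyTorch bridge is installed and using an illustrative checkpoint:

```python
import torch
import habana_frameworks.torch.core as htcore  # noqa: F401  # registers the HPU device
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("hpu")

# Compile with the Gaudi backend; the first forward pass triggers graph compilation.
model = torch.compile(model, backend="hpu_backend")

inputs = tokenizer("Deep learning is", return_tensors="pt").to("hpu")
with torch.no_grad():
    logits = model(**inputs).logits
```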
Other
- Makefile improvements #1811 @jasi306
- [DeepSeek-V3] README update #1911 @ANSHUMAN87
- Skipping falcon rope scaling test #1916 @karol-brejna-i
- Workaround for DS issue in Llama #1932 @ugolowic
- Upgrade LM Eval to 0.4.7 #1901 @astachowiczhabana
- Disabling timers synchronization #1879 @bhargaveede
- Limit max pos embeds to 8k to prevent OOM #1923 @jaygala223
- Fix prompt argument handling in run_pipeline.py #1874 @varu060603
- Allow offline mode in CI tests #1924 @astachowiczhabana
- Adding memory and graph stats #1858 @jaygala223
- Enable QLoRA tests with torch.compile mode #1918 @ckvermaAI
- detr: fix possible incorrect tensor type #1899 @mgonchar
- Fix --save_last_ckpt if --save_strategy no is set #1934 @vidyasiv
- Reimplement HabanaGenerationTime #1920 @ugolowic
- Pad the examples for QLoRa finetuning test #1941 @ckvermaAI
- Reimplement HabanaGenerationTime fix for timer_checkpoint in sdxl training #1945 @gplutop7
- Move bitsandbytes requirements from setup.py to bnb tests #1946 @ckvermaAI
- Support allow_unspec_int_on_nn_module #1887 @xinyu-intel
- Tokenizer config fix for dynamic mode #1903 @pramodkumar-habanalabs
- Support compile from the 2nd iteration #1886 @xinyu-intel
- fea(): ReadMe remote_trust fixes #1940 @imangohari1
- Run upstream tests #1938 @IlyasMoutawwakil
- Fix READMEs - SD paths and LLM PEFT example #1949 @dsocek
- Add average latency metrics #1954 @RongLei-intel
- Bitsandbytes installation for qlora tests #1951 @ckvermaAI
- Update datasets requirement in examples #1956 @regisss
- Use data cache in slow_tests_8x #1914 @karol-brejna-i
- Add sentencepiece to requirements to support vicuna text generation #1962 @tthakkal
- Fix FLUX fine-tuning script #1960 @dsocek
- Fix typos #1967 @omahs
- Update t5-small samples_per_second value #1968 @12010486
- fea(): Added the --sdp_on_bf16 to textual inversion example #1964 @imangohari1
- pytest t5 roberta fix #1971 @imangohari1
- Update makefile for explicit lazy mode #1925 @jasi306
- fea(): Added PT_HPU_LAZY_MODE=1 for diffuser tests #1975 @imangohari1
- Fix deepspeed zero3 #1977 @IlyasMoutawwakil
- Enable regional compilation in text generation #1927 @karol-brejna-i
- README changes for Llama3.1 8B Finetuning with LoRA #1947 @bhargaveede
- Integrate pt2e quantization changes into the main script #1875 @vivek5-ai
- Use IKS runners for CI #1953 @regisss
- Fix sentence-transformers CI with new runners #1980 @regisss
- Update dynamic env handling #1978 @yafshar
- Fix wrong calculation of e2e latency #1984 @RongLei-intel
- Update test baseline for mistralai/Mixtral-8x7B-v0.1 #1987 @yafshar
- Switch to spawn in PyTorch DataLoader when num_workers > 0 #1982 @Wei-Lin-Intel
- Enable mixtral 8x7b accuracy evaluation #1986 @rbogdano
- Update readme files for explicit lazy mode #1921 @jasi306
- Update README examples #2020 @pbielak
- Pin latest optimum to force mutual updates #2016 @IlyasMoutawwakil