SynapseAI v1.22
- Upgrade to SynapseAI v1.22 8171a96 @astachowiczhabana
Diffusers v0.34
- Diffusers 0.34.0 #2152 @imangohari1
GRPO trainer
- Enable trl GRPO trainer #2088 @schoi-habana
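For context, trl's GRPO interface looks like the following. This is a minimal sketch assuming #2088 mirrors trl's upstream `GRPOTrainer`/`GRPOConfig` API; the model name and reward function are placeholders, and any Gaudi-specific wrapper classes added by the PR may be named differently:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; GRPO samples several completions per prompt and
# optimizes the policy against a scalar reward.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 16})

def reward_len(completions, **kwargs):
    # Placeholder reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

args = GRPOConfig(
    output_dir="grpo-out",
    per_device_train_batch_size=8,  # must be divisible by num_generations
    num_generations=4,
    max_completion_length=64,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```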
FP8 FusedSDPA
- Add support for FP8 FusedSDPA in the Mixtral model #2026 @astachowiczhabana
Deepspeed regional compilation
- Deepspeed regional compilation #2021 @IlyasMoutawwakil
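Regional compilation compiles repeated submodules (e.g. identical decoder layers) one by one rather than wrapping the whole model in a single `torch.compile` graph, so the compiled artifact for one block is reused across all of them. A generic sketch of the idea, not the DeepSpeed-specific wiring from #2021:

```python
import torch
from torch import nn

def compile_regions(model: nn.Module, block_type: type) -> nn.Module:
    # Compile each matching submodule individually; identical blocks share
    # the cached compiled graph, cutting overall compilation time.
    for name, module in model.named_children():
        if isinstance(module, block_type):
            setattr(model, name, torch.compile(module))
        else:
            compile_regions(module, block_type)
    return model

# Usage (hypothetical block class): model = compile_regions(model, MyDecoderLayer)
```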
Snowflake Arctic
- Enable Snowflake Arctic on Gaudi 3 #1719 @pi314ever
Model optimizations
- rt-detr: optimize loss calculation #1998 @mgonchar
- Use FusedSDPA in the self-attention of the BERT model (see the sketch after this list) #2115 @miaojinc
- Enable FusedRMSNorm for FLUX #2011 @dsocek
- Enable distributed CFG for SD3 pipeline #2015 @dsocek
- Refactor Qwen2 Family - FP32 SDPA and max_position_embedding #2030 @Wei-Lin-Intel
- Add Qwen classification #2062 @tianyuan211
- Reduce index_copy to fp8 in llama2 - QDQ flow #2065 @Tiefen-boop
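The BERT item above swaps eager scaled-dot-product attention for Habana's fused kernel. A hedged sketch of the usage pattern, assuming the `FusedSDPA` wrapper from `habana_frameworks.torch.hpex.kernels`; the exact `apply` signature may vary across SynapseAI releases, and this only runs on HPU devices:

```python
import torch
from habana_frameworks.torch.hpex.kernels import FusedSDPA

def fused_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False):
    # Drop-in replacement for torch.nn.functional.scaled_dot_product_attention
    # on HPU; inputs are (batch, heads, seq_len, head_dim) tensors.
    return FusedSDPA.apply(q, k, v, attn_mask, dropout_p, is_causal)
```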
Safe softmax
- Safe softmax demonstration (#263) #1950 @astachowiczhabana
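The PR name suggests the standard numerically safe softmax formulation. A generic sketch of the max-subtraction trick; the actual kernel in #1950 may differ:

```python
import torch

def safe_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the row max before exponentiation keeps exp() from
    # overflowing; the result is mathematically identical to plain softmax.
    x_max = x.max(dim=dim, keepdim=True).values
    exp = torch.exp(x - x_max)
    return exp / exp.sum(dim=dim, keepdim=True)
```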
Bitsandbytes
- Integrate NF4 inference tests into text-generation (see the sketch below) #2058 @rsshaik1
- Remove bitsandbytes monkey-patching (II) #2114 @ckvermaAI
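The NF4 tests above exercise 4-bit inference through transformers' bitsandbytes integration. A minimal sketch; the model name is a placeholder and the Gaudi-specific test wiring from #2058 is not reproduced here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization config; compute happens in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```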
Other
- Limit inputs_embeds.clone() to training only, as the extra copy affects inference #1992 @emascarenhas
- Add additional info about attn batch split flag #1990 @jaygala223
- Update readme files for explicit lazy mode #1921 @jasi306
- Fix SD3 flag in README example #2013 @dsocek
- Fix text-generation requirements #1989 @vidyasiv
- Migrate tests to upstream repos #2002 @IlyasMoutawwakil
- Fix makefile commands #2025 @IlyasMoutawwakil
- Use AutoAWQ version right before introduction of qwen3 #2033 @IlyasMoutawwakil
- Add token to single card tests CI #2034 @IlyasMoutawwakil
- Minor Code Comments and Formatting Improvements #2035 @leopardracer
- More makefile fixes #2036 @IlyasMoutawwakil
- Remove text-generation-inference folder #2068 @regisss
- Update the README for MediaPipe support #2012 @imangohari1
- Use makefile in Sentence Transformers CI #2073 @IlyasMoutawwakil
- Remove capture_pre_autograd_graph call #2042 @astachowiczhabana
- Enable running lm_eval with log_samples #2046 @astachowiczhabana
- Fix lost modules in regional compilation #2047 @astachowiczhabana
- Enable accuracy benchmark using torch compile #2049 @astachowiczhabana
- Add support for reduced model #2050 @astachowiczhabana
- Enable QDQ #2051 @astachowiczhabana
- Minor Documentation Updates and Comments Clarification #2048 @kilavvy
- Hot fix compiled fsdp model saving failure #2028 @IlyasMoutawwakil
- Use PT_ENABLE_INT64_SUPPORT=1 for trl examples #2089 @pbielak
- Remove loss_kwargs from Gemma2 model.forward() and add the missing positional_embeddings for the Attention layer to sync with Transformers 4.49.0 #2100 @Luca-Calabria
- Silence Trainer.tokenizer warnings #2116 @pbielak
- Llama 3.2 - Fix the issue for eager mode (#260) #1976 @TANA-BHU
- Float inputs for Mixtral 8x7B #2043 @astachowiczhabana
- Fix diffuser tests #2054 @astachowiczhabana
- Improve support for IFEval and MMLU #2045 @astachowiczhabana
- Profiling improvements #1931 @ugolowic
- Add documentation workflow #2086 @echarlaix
- Add feature manager #1926 @astachowiczhabana
- Fix utils package #2141 @pbielak
- Use profiler in text-generation-pipeline #2154 @pbielak
- Add the PT_HPU_LAZY_MODE=1 env variable when testing in lazy mode #2161 @yafshar
- Update peft version #2160 @imangohari1
- Fix version extraction regex and pip command in get_build() #2159 @yafshar
- Add warn0 utility to emit warnings only on the main process (see the sketch at the end of this list) #2157 @yafshar
- Remove DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED #2171 @yafshar
- Extract HabanaModelAdapter from run_lm_eval into a new script file #2170 @AKloniecki
- Remove `is_pt_flax_cross_test` from wav2vec tests #2174 @pbielak
- Fix test_model_weights_reload_no_missing_tied_weights #2175 @pbielak
- Update datasets to version 3.6.0 #2176 @alekseyfa
- Update and fix the TIMM example README #2172 @imangohari1
- Move torch, transformers and optimum.habana imports to local scope #2183 @AKloniecki
- Move torch and transformers imports to local scope in run_generation.py #2181 @AKloniecki
- Port Transformers deepseek-v3 to optimum-habana #2186 @rkumar2patel
- Remove .float() conversion from Mixtral #2178 @pbielak
- Remove potential weakness reported by static code analysis (CWE 569) in transformers/trainer.py #2196 @karol-brejna-i
- Ensure the output directory exists before writing to the output file #2188 @AKloniecki
- Remove instances of logically dead code #2194 @ugolowic
- Remove unnecessary comparisons to None #2191 @ugolowic
- Fixes for bad use of potential None value #2198 @ugolowic
- qwen3: Fix missing max_position_embeddings init from config #2173 @mengker33
- Allow usage of cached books from Project Gutenberg #2190 @AKloniecki
- Remove potential weakness reported by static code analysis (CWE 398, redundant if) #2199 @karol-brejna-i
- Fix PT_HPU_LAZY_MODE assertion to match updated default value #2189 @AKloniecki
- Remove unnecessary null checks - modeling_mpt.py #2204 @karol-brejna-i
- Protect against an undefined mask value #2203 @karol-brejna-i
- Protect all_cross_attentions in optimum/habana/transformers/models/blip/modeling_blip_text.py #2202 @karol-brejna-i
- Remove unnecessary None checks for attention_mask #2205 @karol-brejna-i
- Configure qlora tests with additional arguments #2056 @ckvermaAI
- Skip unnecessary padding in text generation task #2055 @kyotoyx
- Unify SetTrueOrFalseOrNone and StoreTrueFalseAction #2119 @astachowiczhabana
- Fix profiler #2134 @astachowiczhabana
- Fix missing openorca dataset #2133 @astachowiczhabana
- Sync/videollava #2129 @yafshar
- Add support for local dataset loading for LibriSpeech and COCO #2136 @gplutop7
- Add sentencepiece to setup.py #2153 @pbielak
- Extract the model adapter class from run_lm_eval.py into a new script file #2184 @AKloniecki
- Fix Granite accuracy #2187 @12010486
- Temporarily revert SD quant files to fix promotion #2069 @astachowiczhabana
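As referenced above for #2157, a warn0-style helper typically gates warnings on the process rank so multi-card runs print each message once instead of once per device. A minimal sketch of the idea; the actual implementation may differ:

```python
import os
import warnings

def warn0(message: str, category=UserWarning):
    # Emit the warning only on the main process (rank 0); the RANK
    # environment variable is the convention used by torch.distributed.
    if int(os.environ.get("RANK", "0")) == 0:
        warnings.warn(message, category)
```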