SynapseAI v1.22
- Upgrade to SynapseAI v1.22 8171a96 @astachowiczhabana
Diffusers v0.34
- Diffusers 0.34.0 #2152 @imangohari1
GRPO trainer
- Enable trl GRPO trainer #2088 @schoi-habana
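For context, trl's GRPO interface looks like the following. This is a minimal sketch assuming #2088 mirrors trl's upstream `GRPOTrainer`/`GRPOConfig` API; the model name and reward function are placeholders, and any Gaudi-specific wrapper classes added by the PR may be named differently:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; GRPO samples several completions per prompt and
# optimizes the policy against a scalar reward.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 16})

def reward_len(completions, **kwargs):
    # Placeholder reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

args = GRPOConfig(
    output_dir="grpo-out",
    per_device_train_batch_size=8,  # must be divisible by num_generations
    num_generations=4,
    max_completion_length=64,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_len,
    args=args,
    train_dataset=dataset,
)
trainer.train()
```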
FP8 FusedSDPA
- Add support for FP8 FusedSDPA in the Mixtral model #2026 @astachowiczhabana
Deepspeed regional compilation
- Deepspeed regional compilation #2021 @IlyasMoutawwakil
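Regional compilation compiles repeated submodules (e.g. identical decoder layers) one by one rather than wrapping the whole model in a single `torch.compile` graph, so the compiled artifact for one block is reused across all of them. A generic sketch of the idea, not the DeepSpeed-specific wiring from #2021:

```python
import torch
from torch import nn

def compile_regions(model: nn.Module, block_type: type) -> nn.Module:
    # Compile each matching submodule individually; identical blocks share
    # the cached compiled graph, cutting overall compilation time.
    for name, module in model.named_children():
        if isinstance(module, block_type):
            setattr(model, name, torch.compile(module))
        else:
            compile_regions(module, block_type)
    return model

# Usage (hypothetical block class): model = compile_regions(model, MyDecoderLayer)
```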
Snowflake Arctic
- Enable Snowflake Arctic on Gaudi 3 #1719 @pi314ever
Model optimizations
- rt-detr: optimize loss calculation #1998 @mgonchar
- Use FusedSDPA in the self-attention of the BERT model (see the sketch after this list) #2115 @miaojinc
- Enable FusedRMSNorm for FLUX #2011 @dsocek
- Enable distributed CFG for SD3 pipeline #2015 @dsocek
- Refactor Qwen2 Family - FP32 SDPA and max_position_embedding #2030 @Wei-Lin-Intel
- Add Qwen classification #2062 @tianyuan211
- Reduce index_copy to fp8 in llama2 - QDQ flow #2065 @Tiefen-boop
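The BERT item above swaps eager scaled-dot-product attention for Habana's fused kernel. A hedged sketch of the usage pattern, assuming the `FusedSDPA` wrapper from `habana_frameworks.torch.hpex.kernels`; the exact `apply` signature may vary across SynapseAI releases, and this only runs on HPU devices:

```python
import torch
from habana_frameworks.torch.hpex.kernels import FusedSDPA

def fused_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False):
    # Drop-in replacement for torch.nn.functional.scaled_dot_product_attention
    # on HPU; inputs are (batch, heads, seq_len, head_dim) tensors.
    return FusedSDPA.apply(q, k, v, attn_mask, dropout_p, is_causal)
```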
Safe softmax
- Safe softmax demonstration (#263) #1950 @astachowiczhabana
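The PR name suggests the standard numerically safe softmax formulation. A generic sketch of the max-subtraction trick; the actual kernel in #1950 may differ:

```python
import torch

def safe_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtracting the row max before exponentiation keeps exp() from
    # overflowing; the result is mathematically identical to plain softmax.
    x_max = x.max(dim=dim, keepdim=True).values
    exp = torch.exp(x - x_max)
    return exp / exp.sum(dim=dim, keepdim=True)
```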
Bitsandbytes
- Integrate NF4 inference tests into text-generation (see the sketch below) #2058 @rsshaik1
- Remove bitsandbytes monkey-patching (II) #2114 @ckvermaAI
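The NF4 tests above exercise 4-bit inference through transformers' bitsandbytes integration. A minimal sketch; the model name is a placeholder and the Gaudi-specific test wiring from #2058 is not reproduced here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit quantization config; compute happens in bfloat16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # placeholder model
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```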
Other
- Limit inputs_embeds.clone() to training only, as the extra copy affects inference #1992 @emascarenhas
- Add additional info about attn batch split flag #1990 @jaygala223
- Update readme files for explicit lazy mode #1921 @jasi306
- Fix SD3 flag in README example #2013 @dsocek
- Fix text-generation requirements #1989 @vidyasiv
- Migrate tests to upstream repos #2002 @IlyasMoutawwakil
- Fix makefile commands #2025 @IlyasMoutawwakil
- Use AutoAWQ version right before introduction of qwen3 #2033 @IlyasMoutawwakil
- Add token to single card tests CI #2034 @IlyasMoutawwakil
- Minor Code Comments and Formatting Improvements #2035 @leopardracer
- More makefile fixes #2036 @IlyasMoutawwakil
- Remove text-generation-inference folder #2068 @regisss
- Update the README for MediaPipe support #2012 @imangohari1
- Use makefile in Sentence Transformers CI #2073 @IlyasMoutawwakil
- Remove capture_pre_autograd_graph call #2042 @astachowiczhabana
- Enable running lm_eval with log_samples #2046 @astachowiczhabana
- Fix lost modules in regional compilation #2047 @astachowiczhabana
- Enable accuracy benchmark using torch compile #2049 @astachowiczhabana
- Add support for reduced model #2050 @astachowiczhabana
- Enable QDQ #2051 @astachowiczhabana
- Minor Documentation Updates and Comments Clarification #2048 @kilavvy
- Hot fix compiled fsdp model saving failure #2028 @IlyasMoutawwakil
- Use PT_ENABLE_INT64_SUPPORT=1 for trl examples #2089 @pbielak
- Remove loss_kwargs from Gemma2 model.forward() and add the missing positional_embeddings for the Attention layer to sync with Transformers 4.49.0 #2100 @Luca-Calabria
- Silence Trainer.tokenizer warnings #2116 @pbielak
- Llama 3.2 - Fix the issue for eager mode (#260) #1976 @TANA-BHU
- Float inputs for Mixtral 8x7B #2043 @astachowiczhabana
- Fix diffuser tests #2054 @astachowiczhabana
- Improve support for IFEval and MMLU #2045 @astachowiczhabana
- Profiling improvements #1931 @ugolowic
- Add documentation workflow #2086 @echarlaix
- Add feature manager #1926 @astachowiczhabana
- Fix utils package #2141 @pbielak
- Use profiler in text-generation-pipeline #2154 @pbielak
- Add the PT_HPU_LAZY_MODE=1 env variable when testing in lazy mode #2161 @yafshar
- Update peft version #2160 @imangohari1
- Fix version extraction regex and pip command in get_build() #2159 @yafshar
- Add warn0 utility to emit warnings only on the main process (see the sketch at the end of this list) #2157 @yafshar
- Remove DEEPSPEED_HPU_ZERO3_SYNC_MARK_STEP_REQUIRED #2171 @yafshar
- Extract HabanaModelAdapter from run_lm_eval into a new script file #2170 @AKloniecki
- Remove `is_pt_flax_cross_test` from wav2vec tests #2174 @pbielak
- Fix test_model_weights_reload_no_missing_tied_weights #2175 @pbielak
- Update datasets to version 3.6.0 #2176 @alekseyfa
- Update and fix the TIMM example README #2172 @imangohari1
- Move torch, transformers and optimum.habana imports to local scope #2183 @AKloniecki
- Move torch and transformers imports to local scope in run_generation.py #2181 @AKloniecki
- Port Transformers deepseek-v3 to optimum-habana #2186 @rkumar2patel
- Remove .float() conversion from Mixtral #2178 @pbielak
- Remove potential weakness reported by static code analysis (CWE 569) in transformers/trainer.py #2196 @karol-brejna-i
- Ensure the output directory exists before writing to the output file #2188 @AKloniecki
- Remove instances of logically dead code #2194 @ugolowic
- Remove unnecessary comparisons to None #2191 @ugolowic
- Fixes for bad use of potential None value #2198 @ugolowic
- qwen3: Fix missing max_position_embeddings init from config #2173 @mengker33
- Allow usage of cached books from Project Gutenberg #2190 @AKloniecki
- Remove potential weakness reported by static code analysis (CWE 398, redundant if) #2199 @karol-brejna-i
- Fix PT_HPU_LAZY_MODE assertion to match updated default value #2189 @AKloniecki
- Remove unnecessary null checks - modeling_mpt.py #2204 @karol-brejna-i
- Protect against an undefined mask value #2203 @karol-brejna-i
- Protect all_cross_attentions in optimum/habana/transformers/models/blip/modeling_blip_text.py #2202 @karol-brejna-i
- Remove unnecessary None checks for attention_mask #2205 @karol-brejna-i
- Configure qlora tests with additional arguments #2056 @ckvermaAI
- Skip unnecessary padding in text generation task #2055 @kyotoyx
- Unify SetTrueOrFalseOrNone and StoreTrueFalseAction #2119 @astachowiczhabana
- Fix profiler #2134 @astachowiczhabana
- Fix missing openorca dataset #2133 @astachowiczhabana
- Sync/videollava #2129 @yafshar
- Add support for local dataset loading for LibriSpeech and COCO #2136 @gplutop7
- Add sentencepiece to setup.py #2153 @pbielak
- Extract the model adapter class from run_lm_eval.py into a new script file #2184 @AKloniecki
- Fix Granite accuracy #2187 @12010486
- Temporarily revert SD quant files to fix promotion #2069 @astachowiczhabana
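As referenced above for #2157, a warn0-style helper typically gates warnings on the process rank so multi-card runs print each message once instead of once per device. A minimal sketch of the idea; the actual implementation may differ:

```python
import os
import warnings

def warn0(message: str, category=UserWarning):
    # Emit the warning only on the main process (rank 0); the RANK
    # environment variable is the convention used by torch.distributed.
    if int(os.environ.get("RANK", "0")) == 0:
        warnings.warn(message, category)
```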