Support FP8 Quantization and Inference Run on Intel Gaudi (HPU) using INC (Intel Neural Compressor) #12010

Merged

mgoin merged 35 commits into vllm-project:main from HabanaAI:dev/hpu_fp8

Jul 16, 2025

Commits on Jun 24, 2025

Support HPU fp8 quantization

nirda7
authored and
ulivne
committed
Refactor fp8 inc config and flow

nirda7
authored and
ulivne
committed
adjust destructors and calling finish measurements through shutdown

nirda7
authored and
ulivne
committed
Add documentation changes

nirda7
authored and
ulivne
committed
fix CR comments

nirda7
authored and
ulivne
committed
add more documentation changes

nirda7
authored and
ulivne
committed
some more CR fixes

nirda7
authored and
ulivne
committed
remove gaudi-installation duplication

nirda7
authored and
ulivne
committed
change inc.rst to inc.md

nirda7
authored and
ulivne
committed
fix more CR comments

nirda7
authored and
ulivne
committed
Add INC and Intel Gaudi to supported hardware table

nirda7
authored and
ulivne
committed
fix formatting

nirda7
authored and
ulivne
committed
Fix weights load device use

nirda7
authored and
ulivne
committed
fix shutdown flow after executors refactor

nirda7
authored and
ulivne
committed
fix shutdown flow

nirda7
authored and
ulivne
committed
add spdx header to inc.py

nirda7
authored and
ulivne
committed
fix unsynced destructors calling to None

nirda7
authored and
ulivne
committed
Fix inc flow and remove weights_load_device - use cpu by default

nirda7
authored and
ulivne
committed
fix get_name return type for inc.py

nirda7
authored and
ulivne
committed
fix md files

nirda7
authored and
ulivne
committed
fix CR comments and remove hpu worker
ulivne
committed
remove resolve_input method
ulivne
committed
undo more changes in linear.py
ulivne
committed
restore empty line
ulivne
committed
remove unneeded empty lines
ulivne
committed
restore removed files from original state
ulivne
committed
Fix pre-commit
ulivne
committed
additional pre-commit
ulivne
committed
pre commit type fix
ulivne
committed
Add moeConfig in inc.py
ulivne
committed

Commits on Jun 26, 2025

Support hpu for v1 kv cache dtype validation
ulivne
committed

Commits on Jul 8, 2025

Commits on Jul 15, 2025