[feat] For FP8 models, keep layers specified in ignore_layers in their original FP8 format #1284

@yiliu30

Description

For DeepSeek V3.2 and similar FP8 models, layers matched by ignore_layers (such as the indexer or attention layers) should be kept in their original FP8 format rather than dequantized to BF16.

Expected behavior:

AR_LOG_LEVEL=TRACE auto_round --model /models/Qwen3-8B-FP8 --ignore_layers "attn"

All layers within attention (matching "attn") should remain in FP8 format rather than being dequantized to BF16 or float.
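A minimal sketch of the intended matching behavior, assuming ignore_layers performs a substring match against layer names (the helper names below are illustrative, not the actual auto_round API):

```python
# Hypothetical sketch: layers whose names contain any ignore_layers pattern
# keep their original FP8 weights; all others are dequantized as before.
# Function names and the matching rule are assumptions for illustration.

def should_keep_fp8(layer_name: str, ignore_patterns: list[str]) -> bool:
    """Substring match, mirroring --ignore_layers "attn"."""
    return any(pat in layer_name for pat in ignore_patterns)

def partition_layers(layer_names: list[str], ignore_patterns: list[str]):
    """Split layer names into (keep-FP8, dequantize-to-BF16) groups."""
    keep_fp8, dequantize = [], []
    for name in layer_names:
        target = keep_fp8 if should_keep_fp8(name, ignore_patterns) else dequantize
        target.append(name)
    return keep_fp8, dequantize

layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
]
fp8, bf16 = partition_layers(layers, ["attn"])
# fp8  -> ["model.layers.0.self_attn.q_proj"]
# bf16 -> ["model.layers.0.mlp.gate_proj"]
```

Under this sketch, only the layers in the keep-FP8 group would skip the dequantization path; the rest follow the existing BF16 flow.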

Depends on #1283

cc @wenhuach21 @thuang6 @xin3he
