For DeepSeek V3.2 and similar FP8 models, it is preferable to keep the layers specified in ignore_layers (such as the indexer or attention layers) in their original FP8 format, rather than dequantizing them to BF16.
Expected behavior:
AR_LOG_LEVEL=TRACE auto_round --model /models/Qwen3-8B-FP8 --ignore_layers "attn"
All attention layers (i.e., any layer whose name matches "attn") should remain in FP8 and not be dequantized to BF16 or float.
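The expected matching rule can be sketched as below; should_keep_fp8 and the plain substring match are hypothetical illustrations of the intent, not auto-round's actual implementation:

```python
def should_keep_fp8(layer_name: str, ignore_patterns: list[str]) -> bool:
    """Return True if the layer matches any ignore pattern and should
    therefore stay in its original FP8 format instead of being
    dequantized to BF16 (hypothetical helper, not the real API)."""
    return any(pat in layer_name for pat in ignore_patterns)

# With --ignore_layers "attn", any layer whose name contains "attn"
# is left in FP8; everything else goes through the normal flow.
ignore = ["attn"]
layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
]
kept_fp8 = [name for name in layers if should_keep_fp8(name, ignore)]
# kept_fp8 -> ["model.layers.0.self_attn.q_proj"]
```

The key point is that a match should short-circuit the dequantization path entirely, so the original FP8 tensors (and their scales) are carried through unchanged.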
Depends on #1283
cc @wenhuach21 @thuang6 @xin3he