Fix: quant config error on quantized offline eagle#925

Open
h-guo18 wants to merge 1 commit into main from haoguo/fix-0223

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Feb 24, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • Refactor
    • Enhanced quantization configuration handling for transformer models through improved type validation, ensuring more robust processing of quantized model configurations.

@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

Refactors quantization configuration handling in the speculative decoding plugin from enum-based checking to type-based instance checking. Replaces the QuantizationMethod.COMPRESSED_TENSORS comparison with an isinstance() check against CompressedTensorsConfig, requiring a corresponding import update.

Changes

Cohort / File(s) Summary
  • Quantization Config Type Checking (modelopt/torch/speculative/plugins/transformers.py): Updated the import from QuantizationMethod to CompressedTensorsConfig. Changed the conditional logic to an isinstance() check instead of an enum comparison for detecting a compressed-tensors configuration; the eagle_module ignore pattern is then appended to the config.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title describes a fix for 'quant config error on quantized offline eagle', which aligns with the changeset's focus on replacing quantization config handling from the QuantizationMethod path to CompressedTensorsConfig type checking.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch haoguo/fix-0223

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 marked this pull request as ready for review February 24, 2026 05:15
@h-guo18 h-guo18 requested a review from a team as a code owner February 24, 2026 05:15
@h-guo18 h-guo18 requested a review from yeyu-nvidia February 24, 2026 05:15
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/speculative/plugins/transformers.py`:
- Around line 588-590: quant_config may have ignore==None and the code blindly
calls .append(), causing AttributeError and duplicate patterns on repeated
calls; update the handling where quant_config is obtained (the quant_config
variable of type CompressedTensorsConfig) to ensure quant_config.ignore is
initialized to a list when None and only add the pattern "re:.*eagle_module.*"
if it is not already present (i.e., check membership before append) so repeated
calls to the modifier do not duplicate the entry.
- Line 51: Importing CompressedTensorsConfig unconditionally will break users on
transformers < v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52e662d and ba49490.

📒 Files selected for processing (1)
  • modelopt/torch/speculative/plugins/transformers.py

 from transformers.trainer_pt_utils import LabelSmoother
 from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current transformers environment and CompressedTensorsConfig availability
python3 << 'EOF'
import transformers
print("=== Transformers Environment ===")
print(f"transformers version: {transformers.__version__}")

try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    print("CompressedTensorsConfig: AVAILABLE")
    print(f"Type: {type(CompressedTensorsConfig)}")
    print(f"Module: {CompressedTensorsConfig.__module__}")
except ImportError as e:
    print(f"CompressedTensorsConfig: NOT AVAILABLE")
    print(f"Error: {e}")

# Also check what's available in the quantization_config module
try:
    import transformers.utils.quantization_config as qc
    print("\n=== Available in transformers.utils.quantization_config ===")
    public_items = [item for item in dir(qc) if not item.startswith('_')]
    for item in sorted(public_items):
        print(f"  - {item}")
except Exception as e:
    print(f"Could not inspect quantization_config: {e}")
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 190


🌐 Web query:

transformers CompressedTensorsConfig introduction version history release notes

💡 Result:

What it is (introduction)

CompressedTensorsConfig is the 🤗 Transformers quantization config wrapper used to load models stored in the compressed-tensors checkpoint format (a safetensors extension that can represent quantization + sparsity layouts, produced e.g. via llm-compressor). [1][2]

“Introduced in Transformers” (earliest version)

  • Not present in v4.44.0 API docs (no CompressedTensorsConfig entry). [3]
  • Present in v4.45.1 API docs (class documented under Main classes → Quantization). [4]

So, it was introduced between v4.44.0 (Aug 6, 2024) and v4.45.1 (Sep 26, 2024)—i.e., v4.45.x. Release dates from PyPI: v4.45.0 (Sep 25, 2024) and v4.45.1 (Sep 26, 2024). [5]

Notable API evolution (version history highlights)

  • v4.45.1: CompressedTensorsConfig exists with core fields like config_groups, format, quantization_status, kv_cache_scheme, ignore, etc. [4]
  • By v4.51.1 (Apr 8, 2025): docs include run_compressed (controls whether to alter submodules to emulate compressed execution). [6][5]
  • v4.55.4 (Aug 22, 2025): run_compressed: bool = True is documented in the signature. [7]
  • v5.0.0rc0 (Dec 1, 2025): still present; signature includes run_compressed, and docs clarify interaction between format and run_compressed. [8]

Release notes

Transformers’ public release notes do not reliably call out CompressedTensorsConfig specifically; the most concrete “release-note-grade” trace is the appearance/change in the versioned API docs above (v4.45.1 introduction; later addition of run_compressed). [4][6][7]

Sources: [1]–[8] correspond to the cited pages.


Guard the import against older transformers versions.

CompressedTensorsConfig was introduced in transformers v4.45.0 (September 2024). Importing it unconditionally at module level will raise an ImportError for any installation using transformers < v4.45.0 — even when the compressed-tensors quantization path is never exercised.

🛡️ Proposed safe import
-from transformers.utils.quantization_config import CompressedTensorsConfig
+try:
+    from transformers.utils.quantization_config import CompressedTensorsConfig
+except ImportError:
+    CompressedTensorsConfig = None

Then tighten the guard at the call site:

-if isinstance(quant_config, CompressedTensorsConfig):
+if CompressedTensorsConfig is not None and isinstance(quant_config, CompressedTensorsConfig):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` at line 51, Importing
CompressedTensorsConfig unconditionally will break users on transformers <
v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.
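The guarded-import pattern recommended above can be written as a small, runnable sketch. It works whether or not a transformers version shipping CompressedTensorsConfig (v4.45.0+) is installed; the helper name is_compressed_tensors is a hypothetical illustration, not the module's actual API:

```python
# Guarded import: CompressedTensorsConfig first appeared in transformers
# v4.45.0, so an unconditional module-level import would crash older installs
# (and environments without transformers at all).
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
except ImportError:
    CompressedTensorsConfig = None

def is_compressed_tensors(quant_config):
    # Guard the call site too: isinstance(x, None) raises TypeError,
    # so check the symbol before using it as a type.
    return CompressedTensorsConfig is not None and isinstance(
        quant_config, CompressedTensorsConfig
    )

print(is_compressed_tensors(object()))  # False on any transformers version
```

With this shape, the compressed-tensors branch is simply skipped on old installs instead of breaking import of the whole plugin module.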

Comment on lines +588 to +590
quant_config = getattr(self.config, "quantization_config", None)
if isinstance(quant_config, CompressedTensorsConfig):
    quant_config.ignore.append("re:.*eagle_module.*")
Contributor
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Get the source code
    src = inspect.getsource(CompressedTensorsConfig)
    
    # Print the entire class definition (limit to first 200 lines to see init and field definitions)
    lines = src.splitlines()[:200]
    for i, line in enumerate(lines, 1):
        print(f"{i:3d}: {line}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Also check if there's a default_factory or post_init that initializes ignore
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Check the __init__ method specifically
    if hasattr(CompressedTensorsConfig, '__init__'):
        init_src = inspect.getsource(CompressedTensorsConfig.__init__)
        print("=== __init__ method ===")
        print(init_src)
    
    # Check if it's a dataclass
    print("\n=== Class annotations ===")
    if hasattr(CompressedTensorsConfig, '__annotations__'):
        for field, type_hint in CompressedTensorsConfig.__annotations__.items():
            if 'ignore' in field.lower():
                print(f"{field}: {type_hint}")
                # Try to get default value
                if hasattr(CompressedTensorsConfig, field):
                    print(f"  Default: {getattr(CompressedTensorsConfig, field)}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Let's also check the actual behavior - create an instance and inspect it
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    
    # Try creating an instance with minimal config
    config = CompressedTensorsConfig(quantization_config={})
    print(f"ignore attribute value: {repr(config.ignore)}")
    print(f"ignore attribute type: {type(config.ignore)}")
    print(f"ignore is None: {config.ignore is None}")
    
    # Check if we can safely append
    if config.ignore is not None:
        print("Can safely append to ignore")
    else:
        print("Cannot safely append - would cause AttributeError")
        
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Find imports and usage of CompressedTensorsConfig in the repository
rg "CompressedTensorsConfig" -n --context 5

Repository: NVIDIA/Model-Optimizer

Length of output: 2174


🏁 Script executed:

# Check the actual file where the code appears
cat -n modelopt/torch/speculative/plugins/transformers.py | sed -n '580,610p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1440


🏁 Script executed:

# Search for any type hints or documentation about ignore field
rg "\.ignore" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 536


🌐 Web query:

CompressedTensorsConfig ignore field transformers library default value

💡 Result:

In Hugging Face Transformers, transformers.CompressedTensorsConfig defines ignore as an optional list with the default value None (i.e., no explicit ignore list unless you pass one). [1]

Source(s):
[1] Transformers quantization docs showing the CompressedTensorsConfig signature (includes ignore: Optional[list[str]] = None). (hugging-face.cn)

Citations:


🏁 Script executed:

# Verify if modify() can be called multiple times by searching for callers
rg "\.modify\(" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 573


🏁 Script executed:

# Check if there are multiple instantiations or if modify is called in loops
rg "modify" modelopt/torch/speculative/plugins/transformers.py -n

Repository: NVIDIA/Model-Optimizer

Length of output: 277


quant_config.ignore may be None, causing AttributeError on .append().

The transformers library defines CompressedTensorsConfig.ignore as Optional[List[str]] with a default value of None. Checkpoints saved without explicit ignore entries deserialize with ignore = None, causing the direct .append() call on line 590 to fail.

Additionally, if modify() is called more than once, the pattern "re:.*eagle_module.*" will be appended multiple times without deduplication.

🛡️ Proposed fix
         quant_config = getattr(self.config, "quantization_config", None)
         if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        quant_config = getattr(self.config, "quantization_config", None)
-        if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` around lines 588 - 590,
quant_config may have ignore==None and the code blindly calls .append(), causing
AttributeError and duplicate patterns on repeated calls; update the handling
where quant_config is obtained (the quant_config variable of type
CompressedTensorsConfig) to ensure quant_config.ignore is initialized to a list
when None and only add the pattern "re:.*eagle_module.*" if it is not already
present (i.e., check membership before append) so repeated calls to the modifier
do not duplicate the entry.
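Both failure modes flagged above (ignore deserializing as None, and duplicate patterns on repeated calls) can be demonstrated with a runnable sketch. FakeCompressedTensorsConfig is a stand-in that mimics the real class's ignore: Optional[list[str]] = None default:

```python
# Demonstrates the guarded, idempotent append. The stand-in class mimics
# transformers' CompressedTensorsConfig, whose `ignore` defaults to None.

class FakeCompressedTensorsConfig:
    def __init__(self, ignore=None):
        self.ignore = ignore  # the real class also defaults this to None

def add_eagle_ignore(quant_config):
    pattern = "re:.*eagle_module.*"
    if quant_config.ignore is None:         # fix 1: initialize the missing list
        quant_config.ignore = []
    if pattern not in quant_config.ignore:  # fix 2: membership check => idempotent
        quant_config.ignore.append(pattern)

cfg = FakeCompressedTensorsConfig()  # ignore starts as None
add_eagle_ignore(cfg)
add_eagle_ignore(cfg)                # a second call must not duplicate the entry
print(cfg.ignore)                    # ['re:.*eagle_module.*']
```

Without the None guard, the first call would raise AttributeError ('NoneType' object has no attribute 'append'); without the membership check, every call to modify() would grow the ignore list.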

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.10%. Comparing base (52e662d) to head (ba49490).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #925   +/-   ##
=======================================
  Coverage   73.10%   73.10%           
=======================================
  Files         205      205           
  Lines       22294    22294           
=======================================
  Hits        16297    16297           
  Misses       5997     5997           

☔ View full report in Codecov by Sentry.
