Fix: quant config error on quantized offline eagle#925

Open
h-guo18 wants to merge 1 commit into main from haoguo/fix-0223

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Feb 24, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

Release Notes

  • Refactor
    • Enhanced quantization configuration handling for transformer models through improved type validation, ensuring more robust processing of quantized model configurations.

@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

Refactors quantization configuration handling in the speculative decoding plugin from enum-based checking to type-based instance checking. Replaces the QuantizationMethod.COMPRESSED_TENSORS comparison with an isinstance() check against CompressedTensorsConfig, requiring a corresponding import update.

Changes

Cohort / File(s) Summary
  • Quantization Config Type Checking (modelopt/torch/speculative/plugins/transformers.py): Updated the import from QuantizationMethod to CompressedTensorsConfig. Changed the conditional logic to an isinstance() check instead of an enum comparison for detecting a compressed-tensors configuration; the eagle_module ignore pattern is then appended to the config.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title describes a fix for 'quant config error on quantized offline eagle', which aligns with the changeset's focus on replacing quantization config handling from the QuantizationMethod path to CompressedTensorsConfig type checking.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch haoguo/fix-0223

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 marked this pull request as ready for review February 24, 2026 05:15
@h-guo18 h-guo18 requested a review from a team as a code owner February 24, 2026 05:15
@h-guo18 h-guo18 requested a review from yeyu-nvidia February 24, 2026 05:15
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/speculative/plugins/transformers.py`:
- Around line 588-590: quant_config may have ignore==None and the code blindly
calls .append(), causing AttributeError and duplicate patterns on repeated
calls; update the handling where quant_config is obtained (the quant_config
variable of type CompressedTensorsConfig) to ensure quant_config.ignore is
initialized to a list when None and only add the pattern "re:.*eagle_module.*"
if it is not already present (i.e., check membership before append) so repeated
calls to the modifier do not duplicate the entry.
- Line 51: Importing CompressedTensorsConfig unconditionally will break users on
transformers < v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52e662d and ba49490.

📒 Files selected for processing (1)
  • modelopt/torch/speculative/plugins/transformers.py

 from transformers.trainer_pt_utils import LabelSmoother
 from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current transformers environment and CompressedTensorsConfig availability
python3 << 'EOF'
import transformers
print("=== Transformers Environment ===")
print(f"transformers version: {transformers.__version__}")

try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    print("CompressedTensorsConfig: AVAILABLE")
    print(f"Type: {type(CompressedTensorsConfig)}")
    print(f"Module: {CompressedTensorsConfig.__module__}")
except ImportError as e:
    print(f"CompressedTensorsConfig: NOT AVAILABLE")
    print(f"Error: {e}")

# Also check what's available in the quantization_config module
try:
    import transformers.utils.quantization_config as qc
    print("\n=== Available in transformers.utils.quantization_config ===")
    public_items = [item for item in dir(qc) if not item.startswith('_')]
    for item in sorted(public_items):
        print(f"  - {item}")
except Exception as e:
    print(f"Could not inspect quantization_config: {e}")
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 190


🌐 Web query:

transformers CompressedTensorsConfig introduction version history release notes

💡 Result:

What it is (introduction)

CompressedTensorsConfig is the 🤗 Transformers quantization config wrapper used to load models stored in the compressed-tensors checkpoint format (a safetensors extension that can represent quantization + sparsity layouts, produced e.g. via llm-compressor). [1][2]

“Introduced in Transformers” (earliest version)

  • Not present in v4.44.0 API docs (no CompressedTensorsConfig entry). [3]
  • Present in v4.45.1 API docs (class documented under Main classes → Quantization). [4]

So, it was introduced between v4.44.0 (Aug 6, 2024) and v4.45.1 (Sep 26, 2024)—i.e., v4.45.x. Release dates from PyPI: v4.45.0 (Sep 25, 2024) and v4.45.1 (Sep 26, 2024). [5]

Notable API evolution (version history highlights)

  • v4.45.1: CompressedTensorsConfig exists with core fields like config_groups, format, quantization_status, kv_cache_scheme, ignore, etc. [4]
  • By v4.51.1 (Apr 8, 2025): docs include run_compressed (controls whether to alter submodules to emulate compressed execution). [6][5]
  • v4.55.4 (Aug 22, 2025): run_compressed: bool = True is documented in the signature. [7]
  • v5.0.0rc0 (Dec 1, 2025): still present; signature includes run_compressed, and docs clarify interaction between format and run_compressed. [8]

Release notes

Transformers’ public release notes do not reliably call out CompressedTensorsConfig specifically; the most concrete “release-note-grade” trace is the appearance/change in the versioned API docs above (v4.45.1 introduction; later addition of run_compressed). [4][6][7]

Sources: [1]–[8] correspond to the cited pages.


Guard the import against older transformers versions.

CompressedTensorsConfig was introduced in transformers v4.45.0 (September 2024). Importing it unconditionally at module level will raise an ImportError for any installation using transformers < v4.45.0 — even when the compressed-tensors quantization path is never exercised.

🛡️ Proposed safe import
-from transformers.utils.quantization_config import CompressedTensorsConfig
+try:
+    from transformers.utils.quantization_config import CompressedTensorsConfig
+except ImportError:
+    CompressedTensorsConfig = None

Then tighten the guard at the call site:

-if isinstance(quant_config, CompressedTensorsConfig):
+if CompressedTensorsConfig is not None and isinstance(quant_config, CompressedTensorsConfig):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` at line 51, Importing
CompressedTensorsConfig unconditionally will break users on transformers <
v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.
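The guarded-import pattern recommended above can be written as a small, runnable sketch. It works whether or not a transformers version shipping CompressedTensorsConfig (v4.45.0+) is installed; the helper name is_compressed_tensors is a hypothetical illustration, not the module's actual API:

```python
# Guarded import: CompressedTensorsConfig first appeared in transformers
# v4.45.0, so an unconditional module-level import would crash older installs
# (and environments without transformers at all).
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
except ImportError:
    CompressedTensorsConfig = None

def is_compressed_tensors(quant_config):
    # Guard the call site too: isinstance(x, None) raises TypeError,
    # so check the symbol before using it as a type.
    return CompressedTensorsConfig is not None and isinstance(
        quant_config, CompressedTensorsConfig
    )

print(is_compressed_tensors(object()))  # False on any transformers version
```

With this shape, the compressed-tensors branch is simply skipped on old installs instead of breaking import of the whole plugin module.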

Comment on lines +588 to +590
quant_config = getattr(self.config, "quantization_config", None)
if isinstance(quant_config, CompressedTensorsConfig):
    quant_config.ignore.append("re:.*eagle_module.*")
Contributor
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Get the source code
    src = inspect.getsource(CompressedTensorsConfig)
    
    # Print the entire class definition (limit to first 200 lines to see init and field definitions)
    lines = src.splitlines()[:200]
    for i, line in enumerate(lines, 1):
        print(f"{i:3d}: {line}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Also check if there's a default_factory or post_init that initializes ignore
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Check the __init__ method specifically
    if hasattr(CompressedTensorsConfig, '__init__'):
        init_src = inspect.getsource(CompressedTensorsConfig.__init__)
        print("=== __init__ method ===")
        print(init_src)
    
    # Check if it's a dataclass
    print("\n=== Class annotations ===")
    if hasattr(CompressedTensorsConfig, '__annotations__'):
        for field, type_hint in CompressedTensorsConfig.__annotations__.items():
            if 'ignore' in field.lower():
                print(f"{field}: {type_hint}")
                # Try to get default value
                if hasattr(CompressedTensorsConfig, field):
                    print(f"  Default: {getattr(CompressedTensorsConfig, field)}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Let's also check the actual behavior - create an instance and inspect it
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    
    # Try creating an instance with minimal config
    config = CompressedTensorsConfig(quantization_config={})
    print(f"ignore attribute value: {repr(config.ignore)}")
    print(f"ignore attribute type: {type(config.ignore)}")
    print(f"ignore is None: {config.ignore is None}")
    
    # Check if we can safely append
    if config.ignore is not None:
        print("Can safely append to ignore")
    else:
        print("Cannot safely append - would cause AttributeError")
        
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Find imports and usage of CompressedTensorsConfig in the repository
rg "CompressedTensorsConfig" -n --context 5

Repository: NVIDIA/Model-Optimizer

Length of output: 2174


🏁 Script executed:

# Check the actual file where the code appears
cat -n modelopt/torch/speculative/plugins/transformers.py | sed -n '580,610p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1440


🏁 Script executed:

# Search for any type hints or documentation about ignore field
rg "\.ignore" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 536


🌐 Web query:

CompressedTensorsConfig ignore field transformers library default value

💡 Result:

In Hugging Face Transformers, transformers.CompressedTensorsConfig defines ignore as an optional list with the default value None (i.e., no explicit ignore list unless you pass one). [1]

Source(s):
[1] Transformers quantization docs showing the CompressedTensorsConfig signature (includes ignore: Optional[list[str]] = None). (hugging-face.cn)

Citations:


🏁 Script executed:

# Verify if modify() can be called multiple times by searching for callers
rg "\.modify\(" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 573


🏁 Script executed:

# Check if there are multiple instantiations or if modify is called in loops
rg "modify" modelopt/torch/speculative/plugins/transformers.py -n

Repository: NVIDIA/Model-Optimizer

Length of output: 277


quant_config.ignore may be None, causing AttributeError on .append().

The transformers library defines CompressedTensorsConfig.ignore as Optional[List[str]] with a default value of None. Checkpoints saved without explicit ignore entries deserialize with ignore = None, causing the direct .append() call on line 590 to fail.

Additionally, if modify() is called more than once, the pattern "re:.*eagle_module.*" will be appended multiple times without deduplication.

🛡️ Proposed fix
         quant_config = getattr(self.config, "quantization_config", None)
         if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        quant_config = getattr(self.config, "quantization_config", None)
-        if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` around lines 588 - 590,
quant_config may have ignore==None and the code blindly calls .append(), causing
AttributeError and duplicate patterns on repeated calls; update the handling
where quant_config is obtained (the quant_config variable of type
CompressedTensorsConfig) to ensure quant_config.ignore is initialized to a list
when None and only add the pattern "re:.*eagle_module.*" if it is not already
present (i.e., check membership before append) so repeated calls to the modifier
do not duplicate the entry.
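Both failure modes flagged above (ignore deserializing as None, and duplicate patterns on repeated calls) can be demonstrated with a runnable sketch. FakeCompressedTensorsConfig is a stand-in that mimics the real class's ignore: Optional[list[str]] = None default:

```python
# Demonstrates the guarded, idempotent append. The stand-in class mimics
# transformers' CompressedTensorsConfig, whose `ignore` defaults to None.

class FakeCompressedTensorsConfig:
    def __init__(self, ignore=None):
        self.ignore = ignore  # the real class also defaults this to None

def add_eagle_ignore(quant_config):
    pattern = "re:.*eagle_module.*"
    if quant_config.ignore is None:         # fix 1: initialize the missing list
        quant_config.ignore = []
    if pattern not in quant_config.ignore:  # fix 2: membership check => idempotent
        quant_config.ignore.append(pattern)

cfg = FakeCompressedTensorsConfig()  # ignore starts as None
add_eagle_ignore(cfg)
add_eagle_ignore(cfg)                # a second call must not duplicate the entry
print(cfg.ignore)                    # ['re:.*eagle_module.*']
```

Without the None guard, the first call would raise AttributeError ('NoneType' object has no attribute 'append'); without the membership check, every call to modify() would grow the ignore list.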

@codecov

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.10%. Comparing base (52e662d) to head (ba49490).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #925   +/-   ##
=======================================
  Coverage   73.10%   73.10%           
=======================================
  Files         205      205           
  Lines       22294    22294           
=======================================
  Hits        16297    16297           
  Misses       5997     5997           

☔ View full report in Codecov by Sentry.
