Fix: quant config error on quantized offline eagle#925
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
📝 Walkthrough

Refactors quantization configuration handling in the speculative decoding plugin from enum-based checking to type-based instance checking: the `QuantizationMethod` enum comparison is replaced with an `isinstance` check against `CompressedTensorsConfig`.

Changes
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Force-pushed from f422556 to ba49490 (compare).
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/speculative/plugins/transformers.py`:
- Around line 588-590: quant_config may have ignore==None and the code blindly
calls .append(), causing AttributeError and duplicate patterns on repeated
calls; update the handling where quant_config is obtained (the quant_config
variable of type CompressedTensorsConfig) to ensure quant_config.ignore is
initialized to a list when None and only add the pattern "re:.*eagle_module.*"
if it is not already present (i.e., check membership before append) so repeated
calls to the modifier do not duplicate the entry.
- Line 51: Importing CompressedTensorsConfig unconditionally will break users on
transformers < v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.
```diff
 from transformers.trainer_pt_utils import LabelSmoother
 from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Check current transformers environment and CompressedTensorsConfig availability
python3 << 'EOF'
import transformers

print("=== Transformers Environment ===")
print(f"transformers version: {transformers.__version__}")

try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    print("CompressedTensorsConfig: AVAILABLE")
    print(f"Type: {type(CompressedTensorsConfig)}")
    print(f"Module: {CompressedTensorsConfig.__module__}")
except ImportError as e:
    print("CompressedTensorsConfig: NOT AVAILABLE")
    print(f"Error: {e}")

# Also check what's available in the quantization_config module
try:
    import transformers.utils.quantization_config as qc
    print("\n=== Available in transformers.utils.quantization_config ===")
    public_items = [item for item in dir(qc) if not item.startswith('_')]
    for item in sorted(public_items):
        print(f" - {item}")
except Exception as e:
    print(f"Could not inspect quantization_config: {e}")
EOF
```

Repository: NVIDIA/Model-Optimizer
Length of output: 190
🌐 Web query:
transformers CompressedTensorsConfig introduction version history release notes
💡 Result:
What it is (introduction)
CompressedTensorsConfig is the 🤗 Transformers quantization config wrapper used to load models stored in the compressed-tensors checkpoint format (a safetensors extension that can represent quantization + sparsity layouts, produced e.g. via llm-compressor). [1][2]
"Introduced in Transformers" (earliest version)

- Not present in v4.44.0 API docs (no `CompressedTensorsConfig` entry). [3]
- Present in v4.45.1 API docs (class documented under Main classes → Quantization). [4]

So it was introduced between v4.44.0 (Aug 6, 2024) and v4.45.1 (Sep 26, 2024), i.e., in v4.45.x. Release dates from PyPI: v4.45.0 (Sep 25, 2024) and v4.45.1 (Sep 26, 2024). [5]
Notable API evolution (version history highlights)

- v4.45.1: `CompressedTensorsConfig` exists with core fields like `config_groups`, `format`, `quantization_status`, `kv_cache_scheme`, `ignore`, etc. [4]
- By v4.51.1 (Apr 8, 2025): docs include `run_compressed` (controls whether to alter submodules to emulate compressed execution). [6][5]
- v4.55.4 (Aug 22, 2025): `run_compressed: bool = True` is documented in the signature. [7]
- v5.0.0rc0 (Dec 1, 2025): still present; signature includes `run_compressed`, and docs clarify the interaction between `format` and `run_compressed`. [8]
Release notes
Transformers’ public release notes do not reliably call out CompressedTensorsConfig specifically; the most concrete “release-note-grade” trace is the appearance/change in the versioned API docs above (v4.45.1 introduction; later addition of run_compressed). [4][6][7]
Sources: [1]–[8] correspond to the cited pages.
Guard the import against older transformers versions.
CompressedTensorsConfig was introduced in transformers v4.45.0 (September 2024). Importing it unconditionally at module level will raise an ImportError for any installation using transformers < v4.45.0 — even when the compressed-tensors quantization path is never exercised.
🛡️ Proposed safe import

```diff
-from transformers.utils.quantization_config import CompressedTensorsConfig
+try:
+    from transformers.utils.quantization_config import CompressedTensorsConfig
+except ImportError:
+    CompressedTensorsConfig = None
```

Then tighten the guard at the call site:

```diff
-if isinstance(quant_config, CompressedTensorsConfig):
+if CompressedTensorsConfig is not None and isinstance(quant_config, CompressedTensorsConfig):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/speculative/plugins/transformers.py` at line 51, Importing
CompressedTensorsConfig unconditionally will break users on transformers <
v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.
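The guarded import proposed above can be exercised as a standalone snippet. This is a minimal sketch, assuming only that `CompressedTensorsConfig` may be absent (transformers < v4.45.0, or transformers not installed at all; `ModuleNotFoundError` is a subclass of `ImportError`, so one except clause covers both). The helper name `is_compressed_tensors` is hypothetical, not part of the PR.

```python
# Sketch of the guarded-import pattern suggested in the review.
# If the symbol cannot be imported, fall back to None so later
# isinstance() checks can be short-circuited safely.
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
except ImportError:
    CompressedTensorsConfig = None


def is_compressed_tensors(quant_config) -> bool:
    """True only when the class imported successfully AND quant_config is an instance."""
    return CompressedTensorsConfig is not None and isinstance(
        quant_config, CompressedTensorsConfig
    )


print(is_compressed_tensors(None))  # False whether or not the import succeeded
```

Whichever branch the import takes, the call site degrades gracefully instead of raising at module import time.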
```python
quant_config = getattr(self.config, "quantization_config", None)
if isinstance(quant_config, CompressedTensorsConfig):
    quant_config.ignore.append("re:.*eagle_module.*")
```
🧩 Analysis chain
🏁 Script executed:
```shell
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect

    # Get the source code
    src = inspect.getsource(CompressedTensorsConfig)
    # Print the entire class definition (limit to first 200 lines to see init and field definitions)
    lines = src.splitlines()[:200]
    for i, line in enumerate(lines, 1):
        print(f"{i:3d}: {line}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Repository: NVIDIA/Model-Optimizer
Length of output: 245
🏁 Script executed:
```shell
# Also check if there's a default_factory or post_init that initializes ignore
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect

    # Check the __init__ method specifically
    if hasattr(CompressedTensorsConfig, '__init__'):
        init_src = inspect.getsource(CompressedTensorsConfig.__init__)
        print("=== __init__ method ===")
        print(init_src)

    # Check if it's a dataclass
    print("\n=== Class annotations ===")
    if hasattr(CompressedTensorsConfig, '__annotations__'):
        for field, type_hint in CompressedTensorsConfig.__annotations__.items():
            if 'ignore' in field.lower():
                print(f"{field}: {type_hint}")
                # Try to get default value
                if hasattr(CompressedTensorsConfig, field):
                    print(f"  Default: {getattr(CompressedTensorsConfig, field)}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Repository: NVIDIA/Model-Optimizer
Length of output: 245
🏁 Script executed:
```shell
# Let's also check the actual behavior - create an instance and inspect it
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig

    # Try creating an instance with minimal config
    config = CompressedTensorsConfig(quantization_config={})
    print(f"ignore attribute value: {repr(config.ignore)}")
    print(f"ignore attribute type: {type(config.ignore)}")
    print(f"ignore is None: {config.ignore is None}")

    # Check if we can safely append
    if config.ignore is not None:
        print("Can safely append to ignore")
    else:
        print("Cannot safely append - would cause AttributeError")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Repository: NVIDIA/Model-Optimizer
Length of output: 245
🏁 Script executed:
```shell
# Find imports and usage of CompressedTensorsConfig in the repository
rg "CompressedTensorsConfig" -n --context 5
```

Repository: NVIDIA/Model-Optimizer
Length of output: 2174
🏁 Script executed:
```shell
# Check the actual file where the code appears
cat -n modelopt/torch/speculative/plugins/transformers.py | sed -n '580,610p'
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1440
🏁 Script executed:
```shell
# Search for any type hints or documentation about ignore field
rg "\.ignore" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 536
🌐 Web query:
CompressedTensorsConfig ignore field transformers library default value
💡 Result:
In Hugging Face Transformers, transformers.CompressedTensorsConfig defines ignore as an optional list with the default value None (i.e., no explicit ignore list unless you pass one). [1]
Source(s):
[1] Transformers quantization docs showing the CompressedTensorsConfig signature (includes ignore: Optional[list[str]] = None). (hugging-face.cn)
🏁 Script executed:
```shell
# Verify if modify() can be called multiple times by searching for callers
rg "\.modify\(" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 573
🏁 Script executed:
```shell
# Check if there are multiple instantiations or if modify is called in loops
rg "modify" modelopt/torch/speculative/plugins/transformers.py -n
```

Repository: NVIDIA/Model-Optimizer
Length of output: 277
quant_config.ignore may be None, causing AttributeError on .append().
The transformers library defines CompressedTensorsConfig.ignore as Optional[List[str]] with a default value of None. Checkpoints saved without explicit ignore entries deserialize with ignore = None, causing the direct .append() call on line 590 to fail.
Additionally, if modify() is called more than once, the pattern "re:.*eagle_module.*" will be appended multiple times without deduplication.
🛡️ Proposed fix

```diff
 quant_config = getattr(self.config, "quantization_config", None)
 if isinstance(quant_config, CompressedTensorsConfig):
-    quant_config.ignore.append("re:.*eagle_module.*")
+    if quant_config.ignore is None:
+        quant_config.ignore = []
+    pattern = "re:.*eagle_module.*"
+    if pattern not in quant_config.ignore:
+        quant_config.ignore.append(pattern)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
quant_config = getattr(self.config, "quantization_config", None)
if isinstance(quant_config, CompressedTensorsConfig):
    if quant_config.ignore is None:
        quant_config.ignore = []
    pattern = "re:.*eagle_module.*"
    if pattern not in quant_config.ignore:
        quant_config.ignore.append(pattern)
```
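The None-guard and dedup logic in the suggestion above can be verified in isolation. This is a minimal sketch using a hypothetical stand-in class (`FakeQuantConfig`) rather than the real `CompressedTensorsConfig`, mimicking only the documented behavior that its `ignore` field defaults to `None`; the helper `exclude_eagle_module` is illustrative, not code from the PR.

```python
# Stand-in for a config whose `ignore` field deserializes as None
# when the checkpoint carries no explicit ignore list.
class FakeQuantConfig:
    def __init__(self, ignore=None):
        self.ignore = ignore


def exclude_eagle_module(quant_config, pattern="re:.*eagle_module.*"):
    # Initialize `ignore` when it is None, then append only if the
    # pattern is absent, so repeated modify() calls stay idempotent.
    if quant_config.ignore is None:
        quant_config.ignore = []
    if pattern not in quant_config.ignore:
        quant_config.ignore.append(pattern)


cfg = FakeQuantConfig()       # ignore starts as None
exclude_eagle_module(cfg)
exclude_eagle_module(cfg)     # second call is a no-op
print(cfg.ignore)             # → ['re:.*eagle_module.*']
```

Calling the helper twice leaves exactly one copy of the pattern, and a pre-populated `ignore` list is preserved rather than clobbered.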
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/speculative/plugins/transformers.py` around lines 588 - 590,
quant_config may have ignore==None and the code blindly calls .append(), causing
AttributeError and duplicate patterns on repeated calls; update the handling
where quant_config is obtained (the quant_config variable of type
CompressedTensorsConfig) to ensure quant_config.ignore is initialized to a list
when None and only add the pattern "re:.*eagle_module.*" if it is not already
present (i.e., check membership before append) so repeated calls to the modifier
do not duplicate the entry.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #925   +/-   ##
=======================================
  Coverage   73.10%   73.10%
=======================================
  Files         205      205
  Lines       22294    22294
=======================================
  Hits        16297    16297
  Misses       5997     5997
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this

Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes