Support decoder block-level sequential calibration #924

Open
sugunav14 wants to merge 5 commits into main from svelury/sequential-calibrate

Conversation

@sugunav14
Contributor

@sugunav14 sugunav14 commented Feb 24, 2026

What does this PR do?

Type of change: New feature

Overview: Add support for sequential calibration of layers (at decoder-level granularity) in ModelOpt.

Calibration flow

  1. Get the list of decoder blocks.
  2. For the current block, collect its input activations (with weight and activation QDQ from all previous blocks applied), then run the specified calibration function on it.
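Conceptually, the flow above can be sketched on a toy model. This is a minimal illustration, not the PR's API: `collect_inputs` and `calibrate_layer` are hypothetical stand-ins for the real activation collector and calibration function, and the sketch omits the QDQ applied to previous blocks.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Toy stand-in for a stack of decoder blocks."""

    def __init__(self, n_layers: int = 3, dim: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def collect_inputs(model, layer, batches):
    """Capture the positional inputs to `layer` across full-model forward passes."""
    captured = []
    handle = layer.register_forward_pre_hook(lambda m, args: captured.append(args))
    with torch.no_grad():
        for x in batches:
            model(x)
    handle.remove()
    return captured


def calibrate_layer(layer, inputs):
    """Stand-in for the real per-layer calibration (e.g. max calibration)."""
    with torch.no_grad():
        outs = [layer(*args) for args in inputs]
    return max(o.abs().max().item() for o in outs)


model = ToyDecoder()
batches = [torch.randn(2, 4) for _ in range(3)]

# Sequential flow: for each block, collect its inputs given the current state
# of all previous blocks, then calibrate that block in isolation.
amax_per_layer = []
for layer in model.layers:
    inputs = collect_inputs(model, layer, batches)
    amax_per_layer.append(calibrate_layer(layer, inputs))
```

Because each block only needs its own inputs in memory at calibration time, this is the property that lets sequential calibration scale to large models.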

Functions added

  1. get_decoder_layers() -> detects and returns the list of decoder blocks to iterate over
  2. LayerActivationCollector class -> collects the input activations to a given layer
  3. sequential_calibrate() -> performs the calibration flow described above
  4. use_sequential field in QuantizeAlgorithmConfig

Usage

# Sample config
NVFP4_DEFAULT_CFG = {
    "quant_cfg": {
        "*weight_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        "*input_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": {
        "method": "max",
        "use_sequential": True,
    },
}

Set use_sequential=True in QUANT_CFG's "algorithm" section.

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • New Features
    • Sequential layer-by-layer calibration: Quantization now supports processing decoder layers sequentially to improve memory efficiency on large models.

Signed-off-by: Suguna Velury <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Feb 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Added sequential layer-by-layer calibration functionality to quantization pipeline. Introduces a configuration flag to enable this mode, implements the calibration orchestration logic, defines sequential calibration operations, and provides utilities for layer extraction and activation collection.

Changes

Cohort / File(s) / Summary

  • Configuration & Orchestration (modelopt/torch/quantization/config.py, modelopt/torch/quantization/mode.py): Added use_sequential boolean field to QuantizeAlgorithmConfig. Updated mode.py to conditionally route to sequential_calibrate when the flag is enabled, with backward compatibility for existing direct function calls.
  • Sequential Calibration Implementation (modelopt/torch/quantization/model_calib.py): Implemented sequential_calibrate(), which performs layer-by-layer calibration on transformer decoder layers by collecting per-layer activations and invoking calibration logic per layer.
  • Activation & Layer Utilities (modelopt/torch/quantization/utils.py, modelopt/torch/utils/network.py): Added LayerActivationCollector class to capture layer inputs via forward patching, introduced _EarlyStopForwardError exception for control flow, and implemented get_decoder_layers() utility to extract decoder layers from various model architectures.
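The early-stop collection pattern described above (patch the target layer's forward to record its inputs, then abort the rest of the model forward so later layers never run) can be illustrated standalone. Here `capture_inputs` and `_EarlyStop` are simplified stand-ins for the PR's LayerActivationCollector and _EarlyStopForwardError, not the actual implementation:

```python
import torch
import torch.nn as nn


class _EarlyStop(Exception):
    """Raised to abort the model forward once the target layer saw its input."""


def capture_inputs(model, target, batches):
    """Patch `target.forward` to record (args, kwargs) and stop the forward pass."""
    captured = []
    original = target.forward  # keep the bound method so we can restore it

    def patched(*args, **kwargs):
        captured.append((args, kwargs))
        raise _EarlyStop  # skip the target layer and everything after it

    target.forward = patched
    try:
        with torch.no_grad():
            for x in batches:
                try:
                    model(x)
                except _EarlyStop:
                    pass  # expected: one early stop per batch
    finally:
        target.forward = original  # always unpatch, even on error
    return captured


model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4))
batches = [torch.randn(2, 4) for _ in range(2)]
inputs = capture_inputs(model, model[1], batches)  # one (args, kwargs) per batch
```

The try/finally unpatching mirrors why the real collector keeps cleanup in a finally block: a failure mid-collection must not leave the model permanently patched.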

Sequence Diagram

sequenceDiagram
    actor User
    participant Config as QuantizeAlgorithmConfig
    participant Mode as mode.py<br/>(Orchestration)
    participant ModelCalib as sequential_calibrate
    participant Collector as LayerActivationCollector
    participant Network as get_decoder_layers
    participant Model as Model

    User->>Config: Create config with<br/>use_sequential=True
    User->>Mode: Call with config
    Mode->>Mode: Check use_sequential flag
    Mode->>ModelCalib: Call sequential_calibrate()
    ModelCalib->>Network: get_decoder_layers(model)
    Network-->>ModelCalib: Return decoder layers
    loop For each layer
        ModelCalib->>Collector: Initialize collector<br/>for layer
        Collector->>Model: Patch layer forward
        Collector->>Model: Run forward pass
        Collector-->>ModelCalib: Collect layer inputs
        Collector->>Model: Unpatch layer
        ModelCalib->>ModelCalib: Call calib_func<br/>on layer inputs
    end
    ModelCalib-->>Mode: Calibration complete
    Mode-->>User: Return result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately describes the main feature introduced: sequential calibration at the decoder block level. It is concise, specific, and directly reflects the primary changes across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Signed-off-by: Suguna Velury <[email protected]>
@sugunav14 sugunav14 marked this pull request as ready for review February 24, 2026 01:49
@sugunav14 sugunav14 requested review from a team as code owners February 24, 2026 01:49
Signed-off-by: Suguna Velury <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/mode.py`:
- Around line 225-243: When use_sequential (sequential) is enabled, validate
that forward_loop is provided and callable before calling sequential_calibrate;
if forward_loop is None or not callable raise a clear ValueError explaining that
sequential calibration requires a callable forward_loop. Update the branch where
sequential is True (around the sequential_calibrate call in mode.py) to perform
this check and raise the explicit error instead of letting sequential_calibrate
fail later.

In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1836-1867: The sequential_calibrate function calls calib_func with
inputs as a second positional argument which collides with calibrator signatures
(causing TypeError); change the call in sequential_calibrate to pass only
forward_loop as the positional arg and supply the activations via a named
keyword (e.g., inputs=inputs) if the calibrator expects them; locate the call to
calib_func in sequential_calibrate (and the local _layer_forward_loop which uses
get_input_activations from LayerActivationCollector) and replace
calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs) with
a keyword-argument style call (for example calib_func(layer,
forward_loop=_layer_forward_loop, inputs=inputs, **calib_kwargs)) so no
positional collision occurs, then keep the existing cleanup (del inputs;
torch.cuda.empty_cache()).

In `@modelopt/torch/quantization/utils.py`:
- Around line 816-872: The patched layer forward (_forward_w_data_collection
inside _patch_and_initialize_layer) currently only appends inputs and never
calls the original forward, so when stop_after_collection is False the layer
returns None and breaks the model; modify _forward_w_data_collection to, after
appending to self.inputs, call and return the original forward (e.g. call
self._original_forward(*args, **kwargs) if present) when stop_after_collection
is False (and retain the early raise when True), ensuring you reference
bind_forward_method/_original_forward so the original method is invoked
correctly.

In `@modelopt/torch/utils/network.py`:
- Around line 639-673: get_decoder_layers currently inspects attributes on the
passed module and misses wrapped models (DataParallel/FSDP/DeepSpeed), so first
call unwrap_model(model, force_unwrap=True) and reassign the result to model at
the start of get_decoder_layers; then proceed to check the usual attributes
(model.model.layers, model.decoder.layers, model.layers, model.transformer.h,
model.backbone.layers) on the unwrapped model to correctly locate and return the
decoder ModuleList or None.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52e662d and a938963.

📒 Files selected for processing (5)
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/mode.py
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/utils.py
  • modelopt/torch/utils/network.py

Comment on lines 225 to 243

     sequential = kwargs.pop("use_sequential", False)
     if method is not None and "awq" in method:
         # For backward compatibility
         kwargs["algorithm"] = method

     if func is not None:
-        # Call the function with forward_loop as a separate argument
-        func(model, forward_loop=forward_loop, **kwargs)
+        if sequential:
+            # Wrap with sequential processing
+            sequential_calibrate(
+                model,
+                forward_loop=forward_loop,
+                calib_func=func,
+                **kwargs,
+            )
+        else:
+            # Direct calibration (existing behavior)
+            func(model, forward_loop=forward_loop, **kwargs)
     else:
         raise ValueError(f"No calibration function provided for method: {method}")

⚠️ Potential issue | 🟡 Minor

Validate forward_loop when use_sequential is enabled.

sequential_calibrate assumes a callable forward_loop; if it's None, the error shows up later and is harder to diagnose. Add an explicit check and clear message before calling.

💡 Suggested fix
     if func is not None:
         if sequential:
+            if forward_loop is None:
+                raise ValueError("forward_loop must be provided when use_sequential=True")
             # Wrap with sequential processing
             sequential_calibrate(
                 model,
                 forward_loop=forward_loop,
                 calib_func=func,
                 **kwargs,
             )

Comment on lines 1836 to 1867

@torch.no_grad()
def sequential_calibrate(
    model: nn.Module,
    forward_loop: ForwardLoop,
    calib_func: Callable,
    **calib_kwargs,
):
    """Sequential calibration - a sequential layer-by-layer calibration algorithm."""
    transformer_layers = get_decoder_layers(model)
    if transformer_layers is None:
        raise ValueError(
            "Could not find transformer layers in model. "
            "Sequential calibration requires a model with identifiable transformer layers."
        )

    print_rank_0(f"Sequential calibration: Found {len(transformer_layers)} transformer layers")

    gettr = LayerActivationCollector(model)

    for _, layer in enumerate(transformer_layers):
        # Get updated input activations to the current layer
        inputs = gettr.get_input_activations(layer, forward_loop)

        # Define a forward loop for the current layer
        def _layer_forward_loop(m):
            for args, kwargs_input in inputs:  # noqa: F821
                m(*args, **kwargs_input)

        # Call GPTQ
        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
        del inputs
        torch.cuda.empty_cache()

⚠️ Potential issue | 🔴 Critical

Fix sequential_calibrate invoking the calibrator with incorrect positional args.

calib_func(layer, inputs, forward_loop=...) passes inputs as the second positional argument, which collides with forward_loop in existing calibrators and will raise TypeError (or treat a list as a callable). Call the calibrator with forward_loop only and pass inputs via a named kwarg if needed.

🐛 Proposed fix
-        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
+        calib_func(layer, forward_loop=_layer_forward_loop, **calib_kwargs)

Comment on lines +816 to +872

class _EarlyStopForwardError(Exception):
    """Error to stop the forward pass after collection."""


class LayerActivationCollector:
    """Helper class for collecting layer activations during forward passes.

    This class allows for sequential layer calibration by
    patching layers to capture inputs/outputs during forward passes.
    """

    def __init__(self, model: nn.Module):
        self.model = model

    @staticmethod
    def _patch_and_initialize_layer(layer: torch.nn.Module, stop_after_collection: bool = False):
        """Patch a layer to collect inputs during forward passes."""

        def _forward_w_data_collection(self, *args, **kwargs):
            # Note: 'self' refers to the patched layer.
            assert len(args) >= 1, (
                f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
            )
            # Only collect the inputs to the layer
            self.inputs.append((args, kwargs))
            if stop_after_collection:
                raise _EarlyStopForwardError()  # Stop the forward pass after collection

        bind_forward_method(layer, _forward_w_data_collection, "_original_forward")
        layer.inputs = []

    @staticmethod
    def _unpatch_and_cleanup_layer(layer: torch.nn.Module):
        if hasattr(layer, "_original_forward"):
            unpatch_forward_method(layer, "_original_forward")
        if hasattr(layer, "inputs"):
            del layer.inputs

    @torch.no_grad()
    def get_input_activations(self, layer: torch.nn.Module, forward_loop: ForwardLoop) -> list:
        # Wrap model forward to catch _EarlyStopForward per-batch
        def _early_stop_forward(self, *args, **kwargs):
            try:
                return self._original_forward(*args, **kwargs)
            except _EarlyStopForwardError:
                return None  # Stop propagation but allow next batch

        try:
            bind_forward_method(self.model, _early_stop_forward, "_original_forward")
            self._patch_and_initialize_layer(layer, stop_after_collection=True)
            forward_loop(self.model)
            inputs = layer.inputs.copy()
        finally:
            self._unpatch_and_cleanup_layer(layer)
            unpatch_forward_method(self.model, "_original_forward")

        return inputs

⚠️ Potential issue | 🟡 Minor

Preserve the original forward when not early-stopping.

_forward_w_data_collection never calls the original forward, so stop_after_collection=False makes the patched layer return None and breaks downstream execution. Either enforce early-stop or forward to _original_forward.

🐛 Proposed fix
         def _forward_w_data_collection(self, *args, **kwargs):
             # Note: 'self' refers to the patched layer.
             assert len(args) >= 1, (
                 f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
             )
             # Only collect the inputs to the layer
             self.inputs.append((args, kwargs))
             if stop_after_collection:
                 raise _EarlyStopForwardError()  # Stop the forward pass after collection
+            return self._original_forward(*args, **kwargs)

Comment on lines 639 to 673

def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
    """Get the decoder layers from a model for sequential calibration.

    Args:
        model: The model to extract decoder layers from.
        granularity: The type of layers to extract. Currently only "decoder" is supported.

    Returns:
        A ModuleList of decoder layers, or None if not found.
    """
    if granularity != "decoder":
        raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")

    # HuggingFace transformers pattern: model.model.layers
    if hasattr(model, "model") and hasattr(model.model, "layers"):
        return model.model.layers

    # Megatron/MCore pattern: model.decoder.layers
    if hasattr(model, "decoder") and hasattr(model.decoder, "layers"):
        return model.decoder.layers

    # Direct layers attribute (some models)
    if hasattr(model, "layers") and isinstance(model.layers, nn.ModuleList):
        return model.layers

    # GPT-style: model.transformer.h
    if hasattr(model, "transformer") and hasattr(model.transformer, "h"):
        return model.transformer.h

    # Nemotron Super/Nano
    if hasattr(model, "backbone") and hasattr(model.backbone, "layers"):
        return model.backbone.layers

    print("No decoder layers found for model, returning None")
    return None

⚠️ Potential issue | 🟠 Major

Unwrap wrapped models before locating decoder layers.

get_decoder_layers only inspects attributes on the passed module. For DataParallel, DistributedDataParallel, FSDP, or DeepSpeed wrapped models, decoder blocks sit under model.module, causing the function to return None and sequential calibration to fail. Unwrap first using the existing unwrap_model(model, force_unwrap=True) available in this module.

Suggested fix
 def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
     """Get the decoder layers from a model for sequential calibration.
@@ -646,6 +646,8 @@ def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.Mo
     if granularity != "decoder":
         raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")
 
+    # Unwrap common parallel wrappers (DDP/FSDP/DeepSpeed) to access actual layers.
+    model = unwrap_model(model, force_unwrap=True)

Signed-off-by: Suguna Velury <[email protected]>
Signed-off-by: Suguna Velury <[email protected]>
@codecov

codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 27.77778% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.96%. Comparing base (52e662d) to head (4e59790).

Files with missing lines Patch % Lines
modelopt/torch/quantization/utils.py 28.57% 25 Missing ⚠️
modelopt/torch/utils/network.py 7.14% 13 Missing ⚠️
modelopt/torch/quantization/model_calib.py 29.41% 12 Missing ⚠️
modelopt/torch/quantization/mode.py 60.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #924      +/-   ##
==========================================
- Coverage   73.10%   72.96%   -0.14%     
==========================================
  Files         205      205              
  Lines       22294    22363      +69     
==========================================
+ Hits        16297    16317      +20     
- Misses       5997     6046      +49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
