Support decoder block-level sequential calibration#924
Signed-off-by: Suguna Velury <[email protected]>
📝 Walkthrough

Added sequential layer-by-layer calibration functionality to the quantization pipeline. This introduces a configuration flag to enable the mode, implements the calibration orchestration logic, defines sequential calibration operations, and provides utilities for layer extraction and activation collection.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Config as QuantizeAlgorithmConfig
    participant Mode as mode.py<br/>(Orchestration)
    participant ModelCalib as sequential_calibrate
    participant Collector as LayerActivationCollector
    participant Network as get_decoder_layers
    participant Model as Model
    User->>Config: Create config with<br/>use_sequential=True
    User->>Mode: Call with config
    Mode->>Mode: Check use_sequential flag
    Mode->>ModelCalib: Call sequential_calibrate()
    ModelCalib->>Network: get_decoder_layers(model)
    Network-->>ModelCalib: Return decoder layers
    loop For each layer
        ModelCalib->>Collector: Initialize collector<br/>for layer
        Collector->>Model: Patch layer forward
        Collector->>Model: Run forward pass
        Collector-->>ModelCalib: Collect layer inputs
        Collector->>Model: Unpatch layer
        ModelCalib->>ModelCalib: Call calib_func<br/>on layer inputs
    end
    ModelCalib-->>Mode: Calibration complete
    Mode-->>User: Return result
```
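The loop in the diagram can be sketched without any framework code: patch one layer to capture its inputs and stop the forward pass early, replay calibration on just those inputs, then unpatch and move to the next layer. All names below (`Layer`, `Model`, `EarlyStop`, `sequential_calibrate`) are illustrative stand-ins, not the actual ModelOpt classes.

```python
# Framework-free sketch of decoder-block-level sequential calibration.
class EarlyStop(Exception):
    """Cuts the forward pass short once the target layer's inputs are captured."""

class Layer:
    def __init__(self, scale):
        self.scale = scale
        self.seen = None
    def forward(self, x):
        return x * self.scale

class Model:
    def __init__(self):
        self.layers = [Layer(2.0), Layer(3.0)]
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

def sequential_calibrate(model, batches, calib_func):
    for layer in model.layers:
        captured = []
        original = layer.forward
        def patched(x, _cap=captured):
            _cap.append(x)          # collect the layer's input ...
            raise EarlyStop()       # ... and skip the rest of the network
        layer.forward = patched
        for batch in batches:
            try:
                model.forward(batch)
            except EarlyStop:
                pass                # continue with the next batch
        layer.forward = original    # unpatch before calibrating
        calib_func(layer, captured) # calibrate on this layer's inputs only

def calib_func(layer, inputs):
    layer.seen = list(inputs)       # a real calibrator would quantize here

m = Model()
sequential_calibrate(m, [1.0, 10.0], calib_func)
print(m.layers[0].seen, m.layers[1].seen)  # [1.0, 10.0] [2.0, 20.0]
```

Note how the second layer sees the already-transformed activations of the first, which is the point of calibrating sequentially rather than from a single full-model pass.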
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/mode.py`:
- Around line 225-243: When use_sequential (sequential) is enabled, validate
that forward_loop is provided and callable before calling sequential_calibrate;
if forward_loop is None or not callable raise a clear ValueError explaining that
sequential calibration requires a callable forward_loop. Update the branch where
sequential is True (around the sequential_calibrate call in mode.py) to perform
this check and raise the explicit error instead of letting sequential_calibrate
fail later.
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1836-1867: The sequential_calibrate function calls calib_func with
inputs as a second positional argument which collides with calibrator signatures
(causing TypeError); change the call in sequential_calibrate to pass only
forward_loop as the positional arg and supply the activations via a named
keyword (e.g., inputs=inputs) if the calibrator expects them; locate the call to
calib_func in sequential_calibrate (and the local _layer_forward_loop which uses
get_input_activations from LayerActivationCollector) and replace
calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs) with
a keyword-argument style call (for example calib_func(layer,
forward_loop=_layer_forward_loop, inputs=inputs, **calib_kwargs)) so no
positional collision occurs, then keep the existing cleanup (del inputs;
torch.cuda.empty_cache()).
In `@modelopt/torch/quantization/utils.py`:
- Around line 816-872: The patched layer forward (_forward_w_data_collection
inside _patch_and_initialize_layer) currently only appends inputs and never
calls the original forward, so when stop_after_collection is False the layer
returns None and breaks the model; modify _forward_w_data_collection to, after
appending to self.inputs, call and return the original forward (e.g. call
self._original_forward(*args, **kwargs) if present) when stop_after_collection
is False (and retain the early raise when True), ensuring you reference
bind_forward_method/_original_forward so the original method is invoked
correctly.
In `@modelopt/torch/utils/network.py`:
- Around line 639-673: get_decoder_layers currently inspects attributes on the
passed module and misses wrapped models (DataParallel/FSDP/DeepSpeed), so first
call unwrap_model(model, force_unwrap=True) and reassign the result to model at
the start of get_decoder_layers; then proceed to check the usual attributes
(model.model.layers, model.decoder.layers, model.layers, model.transformer.h,
model.backbone.layers) on the unwrapped model to correctly locate and return the
decoder ModuleList or None.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/utils.py
- modelopt/torch/utils/network.py
```diff
 sequential = kwargs.pop("use_sequential", False)
 if method is not None and "awq" in method:
     # For backward compatibility
     kwargs["algorithm"] = method

 if func is not None:
-    # Call the function with forward_loop as a separate argument
-    func(model, forward_loop=forward_loop, **kwargs)
+    if sequential:
+        # Wrap with sequential processing
+        sequential_calibrate(
+            model,
+            forward_loop=forward_loop,
+            calib_func=func,
+            **kwargs,
+        )
+    else:
+        # Direct calibration (existing behavior)
+        func(model, forward_loop=forward_loop, **kwargs)
 else:
     raise ValueError(f"No calibration function provided for method: {method}")
```
**Validate forward_loop when use_sequential is enabled.**

`sequential_calibrate` assumes a callable `forward_loop`; if it is `None`, the error surfaces later and is harder to diagnose. Add an explicit check with a clear message before the call.
💡 Suggested fix

```diff
 if func is not None:
     if sequential:
+        if forward_loop is None:
+            raise ValueError("forward_loop must be provided when use_sequential=True")
         # Wrap with sequential processing
         sequential_calibrate(
             model,
             forward_loop=forward_loop,
             calib_func=func,
             **kwargs,
         )
```
```python
@torch.no_grad()
def sequential_calibrate(
    model: nn.Module,
    forward_loop: ForwardLoop,
    calib_func: Callable,
    **calib_kwargs,
):
    """Sequential calibration - a sequential layer-by-layer calibration algorithm."""
    transformer_layers = get_decoder_layers(model)
    if transformer_layers is None:
        raise ValueError(
            "Could not find transformer layers in model'. "
            "Sequential calibration requires a model with identifiable transformer layers."
        )

    print_rank_0(f"Sequential calibration: Found {len(transformer_layers)} transformer layers")

    gettr = LayerActivationCollector(model)

    for _, layer in enumerate(transformer_layers):
        # Get updated input activations to the current layer
        inputs = gettr.get_input_activations(layer, forward_loop)

        # Define a forward loop for the current layer
        def _layer_forward_loop(m):
            for args, kwargs_input in inputs:  # noqa: F821
                m(*args, **kwargs_input)

        # Call GPTQ
        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
        del inputs
        torch.cuda.empty_cache()
```
**Fix `sequential_calibrate` invoking the calibrator with incorrect positional args.**

`calib_func(layer, inputs, forward_loop=...)` passes `inputs` as the second positional argument, which collides with `forward_loop` in existing calibrator signatures and will raise a `TypeError` (or treat a list as a callable). Call the calibrator with `forward_loop` only, and pass `inputs` via a named keyword if the calibrator needs them.
🐛 Proposed fix

```diff
-        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
+        calib_func(layer, forward_loop=_layer_forward_loop, **calib_kwargs)
```
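The collision can be reproduced in isolation. The `calib_func` below is a hypothetical calibrator mirroring the signature pattern the review describes; it is not the actual ModelOpt calibrator.

```python
# Hypothetical calibrator whose second parameter is forward_loop.
def calib_func(model, forward_loop=None, **calib_kwargs):
    if forward_loop is not None:
        forward_loop(model)

layer = object()
inputs = [((1,), {}), ((2,), {})]

# Positional call: `inputs` lands in the forward_loop slot, and the keyword
# then supplies forward_loop a second time -> TypeError.
try:
    calib_func(layer, inputs, forward_loop=lambda m: None)
except TypeError as err:
    error = str(err)
print(error)  # "got multiple values for argument 'forward_loop'"

# The keyword-only call from the proposed fix works:
calib_func(layer, forward_loop=lambda m: None)
```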
```python
class _EarlyStopForwardError(Exception):
    """Error to stop the forward pass after collection."""


class LayerActivationCollector:
    """Helper class for collecting layer activations during forward passes.

    This class allows for sequential layer calibration by
    patching layers to capture inputs/outputs during forward passes
    """

    def __init__(self, model: nn.Module):
        self.model = model

    @staticmethod
    def _patch_and_initialize_layer(layer: torch.nn.Module, stop_after_collection: bool = False):
        """Patch a layer to collect inputs during forward passes."""

        def _forward_w_data_collection(self, *args, **kwargs):
            # Note: 'self' refers to the patched layer.
            assert len(args) >= 1, (
                f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
            )
            # Only collect the inputs to the layer
            self.inputs.append((args, kwargs))
            if stop_after_collection:
                raise _EarlyStopForwardError()  # Stop the forward pass after collection

        bind_forward_method(layer, _forward_w_data_collection, "_original_forward")
        layer.inputs = []

    @staticmethod
    def _unpatch_and_cleanup_layer(layer: torch.nn.Module):
        if hasattr(layer, "_original_forward"):
            unpatch_forward_method(layer, "_original_forward")
        if hasattr(layer, "inputs"):
            del layer.inputs

    @torch.no_grad()
    def get_input_activations(self, layer: torch.nn.Module, forward_loop: ForwardLoop) -> list:
        # Wrap model forward to catch _EarlyStopForward per-batch
        def _early_stop_forward(self, *args, **kwargs):
            try:
                return self._original_forward(*args, **kwargs)
            except _EarlyStopForwardError:
                return None  # Stop propagation but allow next batch

        try:
            bind_forward_method(self.model, _early_stop_forward, "_original_forward")
            self._patch_and_initialize_layer(layer, stop_after_collection=True)
            forward_loop(self.model)
            inputs = layer.inputs.copy()
        finally:
            self._unpatch_and_cleanup_layer(layer)
            unpatch_forward_method(self.model, "_original_forward")

        return inputs
```
**Preserve the original forward when not early-stopping.**

`_forward_w_data_collection` never calls the original forward, so `stop_after_collection=False` makes the patched layer return `None` and breaks downstream execution. Either enforce early-stop or delegate to `_original_forward`.
🐛 Proposed fix

```diff
 def _forward_w_data_collection(self, *args, **kwargs):
     # Note: 'self' refers to the patched layer.
     assert len(args) >= 1, (
         f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
     )
     # Only collect the inputs to the layer
     self.inputs.append((args, kwargs))
     if stop_after_collection:
         raise _EarlyStopForwardError()  # Stop the forward pass after collection
+    return self._original_forward(*args, **kwargs)
```
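A stand-alone illustration of the fixed behavior: the patched forward records inputs and, when not early-stopping, still delegates to the saved original forward. `bind_forward_method` from the PR is approximated here with `types.MethodType`; the classes are hypothetical, not ModelOpt APIs.

```python
import types

class _EarlyStop(Exception):
    pass

class Layer:
    def forward(self, x):
        return x + 1

def patch_layer(layer, stop_after_collection=False):
    layer.inputs = []
    layer._original_forward = layer.forward  # save the bound original

    def _forward_w_data_collection(self, *args, **kwargs):
        self.inputs.append((args, kwargs))
        if stop_after_collection:
            raise _EarlyStop()
        # The fix: keep normal execution alive when not early-stopping.
        return self._original_forward(*args, **kwargs)

    layer.forward = types.MethodType(_forward_w_data_collection, layer)

layer = Layer()
patch_layer(layer, stop_after_collection=False)
out = layer.forward(41)
print(out, layer.inputs)  # 42 [((41,), {})]
```

Without the final `return`, `out` would be `None` and any downstream layer consuming it would fail, which is exactly the bug the comment flags.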
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils.py` around lines 816 - 872, The patched
layer forward (_forward_w_data_collection inside _patch_and_initialize_layer)
currently only appends inputs and never calls the original forward, so when
stop_after_collection is False the layer returns None and breaks the model;
modify _forward_w_data_collection to, after appending to self.inputs, call and
return the original forward (e.g. call self._original_forward(*args, **kwargs)
if present) when stop_after_collection is False (and retain the early raise when
True), ensuring you reference bind_forward_method/_original_forward so the
original method is invoked correctly.
```python
def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
    """Get the decoder layers from a model for sequential calibration.

    Args:
        model: The model to extract decoder layers from.
        granularity: The type of layers to extract. Currently only "decoder" is supported.

    Returns:
        A ModuleList of decoder layers, or None if not found.
    """
    if granularity != "decoder":
        raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")

    # HuggingFace transformers pattern: model.model.layers
    if hasattr(model, "model") and hasattr(model.model, "layers"):
        return model.model.layers

    # Megatron/MCore pattern: model.decoder.layers
    if hasattr(model, "decoder") and hasattr(model.decoder, "layers"):
        return model.decoder.layers

    # Direct layers attribute (some models)
    if hasattr(model, "layers") and isinstance(model.layers, nn.ModuleList):
        return model.layers

    # GPT-style: model.transformer.h
    if hasattr(model, "transformer") and hasattr(model.transformer, "h"):
        return model.transformer.h

    # Nemotron Super/Nano
    if hasattr(model, "backbone") and hasattr(model.backbone, "layers"):
        return model.backbone.layers

    print("No decoder layers found for model, returning None")
    return None
```
🧩 Analysis chain (verification scripts run against NVIDIA/Model-Optimizer)

```shell
# Check if unwrap_model exists in the network.py file
head -n 650 modelopt/torch/utils/network.py | tail -n 50
# Search for unwrap_model function definition in the module
rg "def unwrap_model" modelopt/torch/utils/network.py -A 5
# Check the full context of the network.py file to understand the module structure
wc -l modelopt/torch/utils/network.py
# Look for imports and unwrap_model usage patterns
rg "unwrap_model" modelopt/torch/utils/network.py
# Get the full implementation of unwrap_model
rg "def unwrap_model" modelopt/torch/utils/network.py -A 30
# Search for usages of get_decoder_layers in the codebase
rg "get_decoder_layers" --type py
# Check for wrapped model patterns in the codebase
rg "DataParallel|DistributedDataParallel|FSDP|deepcopy_to_device" modelopt/torch/utils/network.py -B 2 -A 2
# Check if there are any tests or examples showing how get_decoder_layers is used
rg "get_decoder_layers" -B 3 -A 3
# Check the context of get_decoder_layers call in model_calib.py
rg "transformer_layers = get_decoder_layers" modelopt/torch/quantization/model_calib.py -B 10 -A 10
# Check if the model is unwrapped elsewhere before being used
rg "unwrap_model" modelopt/torch/quantization/model_calib.py -B 3 -A 3
# Look for tests or examples that might demonstrate wrapped models
find . -name "*test*" -type f | grep -E "(calib|network)" | head -5
# Check the calibration test file to see if wrapped models are tested
cat modelopt/torch/utils/network.py | grep -A 5 "SUPPORTED_WRAPPERS"
# Check if there are any tests that use wrapped models with sequential calibration
rg "sequential_calibrate\|DistributedDataParallel\|DataParallel" tests/unit/torch/quantization/test_calib.py -B 2 -A 2
# Check if DeepSpeed wrappers are in SUPPORTED_WRAPPERS
rg "SUPPORTED_WRAPPERS\|DeepSpeed" modelopt/torch/utils/network.py -B 2 -A 5
```
**Unwrap wrapped models before locating decoder layers.**

`get_decoder_layers` only inspects attributes on the passed module. For DataParallel, DistributedDataParallel, FSDP, or DeepSpeed wrapped models, the decoder blocks sit under `model.module`, so the function returns `None` and sequential calibration fails. Unwrap first using the existing `unwrap_model(model, force_unwrap=True)` available in this module.

Suggested fix

```diff
 def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
     """Get the decoder layers from a model for sequential calibration.
@@ -646,6 +646,8 @@
     if granularity != "decoder":
         raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")
+    # Unwrap common parallel wrappers (DDP/FSDP/DeepSpeed) to access actual layers.
+    model = unwrap_model(model, force_unwrap=True)
```
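The failure mode is easy to see with framework-free stand-ins: wrapper classes playing the role of DataParallel/FSDP hold the real model under `.module`, so attribute probes on the wrapper miss the decoder layers until the model is unwrapped. All class names below are hypothetical, not torch or ModelOpt APIs.

```python
class WrapperLike:
    """Stand-in for DataParallel/FSDP-style wrappers holding the model in .module."""
    def __init__(self, module):
        self.module = module

class Decoder:
    def __init__(self):
        self.layers = ["block0", "block1"]

class LM:
    def __init__(self):
        self.decoder = Decoder()

def unwrap(model):
    # Simplified analogue of unwrap_model(model, force_unwrap=True)
    while hasattr(model, "module"):
        model = model.module
    return model

wrapped = WrapperLike(WrapperLike(LM()))   # e.g. DDP around a sharded model
print(hasattr(wrapped, "decoder"))         # False: the wrapper hides the layers
print(unwrap(wrapped).decoder.layers)      # ['block0', 'block1']
```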
Codecov Report

❌ Patch coverage — additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #924      +/-  ##
=========================================
- Coverage   73.10%   72.96%   -0.14%
=========================================
  Files         205      205
  Lines       22294    22363      +69
=========================================
+ Hits        16297    16317      +20
- Misses       5997     6046      +49
```

☔ View full report in Codecov by Sentry.
**What does this PR do?**

**Type of change:** New feature

**Overview:** Add support for sequential calibration of layers (at decoder-level granularity) in ModelOpt.

**Calibration flow**

**Functions added**

**Usage**

Set `use_sequential=True` in QUANT_CFG's `"algorithm"` section.

**Testing**

**Before your PR is "Ready for review"**

**Additional Information**

**Summary by CodeRabbit**
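The usage note above can be sketched as a config fragment. Everything here except the `use_sequential` flag in the `"algorithm"` section is an illustrative placeholder, not the exact QUANT_CFG schema.

```python
# Hypothetical QUANT_CFG sketch; quantizer keys and method name are placeholders.
quant_cfg = {
    "quant_cfg": {"*weight_quantizer": {"num_bits": 4}},
    # The new flag: opt into decoder-block-level sequential calibration.
    "algorithm": {"method": "awq_lite", "use_sequential": True},
}
print(quant_cfg["algorithm"]["use_sequential"])  # True
```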