Support decoder block-level sequential calibration#924
Signed-off-by: Suguna Velury <[email protected]>
📝 Walkthrough

Added sequential layer-by-layer calibration functionality to the quantization pipeline. This introduces a configuration flag to enable the mode, implements the calibration orchestration logic, defines sequential calibration operations, and provides utilities for layer extraction and activation collection.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Config as QuantizeAlgorithmConfig
    participant Mode as mode.py<br/>(Orchestration)
    participant ModelCalib as sequential_calibrate
    participant Collector as LayerActivationCollector
    participant Network as get_decoder_layers
    participant Model as Model
    User->>Config: Create config with<br/>use_sequential=True
    User->>Mode: Call with config
    Mode->>Mode: Check use_sequential flag
    Mode->>ModelCalib: Call sequential_calibrate()
    ModelCalib->>Network: get_decoder_layers(model)
    Network-->>ModelCalib: Return decoder layers
    loop For each layer
        ModelCalib->>Collector: Initialize collector<br/>for layer
        Collector->>Model: Patch layer forward
        Collector->>Model: Run forward pass
        Collector-->>ModelCalib: Collect layer inputs
        Collector->>Model: Unpatch layer
        ModelCalib->>ModelCalib: Call calib_func<br/>on layer inputs
    end
    ModelCalib-->>Mode: Calibration complete
    Mode-->>User: Return result
```
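The loop in the diagram can be sketched without any framework code: patch one layer to capture its inputs and stop the forward pass early, replay calibration on just those inputs, then unpatch and move to the next layer. All names below (`Layer`, `Model`, `EarlyStop`, `sequential_calibrate`) are illustrative stand-ins, not the actual ModelOpt classes.

```python
# Framework-free sketch of decoder-block-level sequential calibration.
class EarlyStop(Exception):
    """Cuts the forward pass short once the target layer's inputs are captured."""

class Layer:
    def __init__(self, scale):
        self.scale = scale
        self.seen = None
    def forward(self, x):
        return x * self.scale

class Model:
    def __init__(self):
        self.layers = [Layer(2.0), Layer(3.0)]
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

def sequential_calibrate(model, batches, calib_func):
    for layer in model.layers:
        captured = []
        original = layer.forward
        def patched(x, _cap=captured):
            _cap.append(x)          # collect the layer's input ...
            raise EarlyStop()       # ... and skip the rest of the network
        layer.forward = patched
        for batch in batches:
            try:
                model.forward(batch)
            except EarlyStop:
                pass                # continue with the next batch
        layer.forward = original    # unpatch before calibrating
        calib_func(layer, captured) # calibrate on this layer's inputs only

def calib_func(layer, inputs):
    layer.seen = list(inputs)       # a real calibrator would quantize here

m = Model()
sequential_calibrate(m, [1.0, 10.0], calib_func)
print(m.layers[0].seen, m.layers[1].seen)  # [1.0, 10.0] [2.0, 20.0]
```

Note how the second layer sees the already-transformed activations of the first, which is the point of calibrating sequentially rather than from a single full-model pass.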
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/mode.py`:
- Around line 225-243: When use_sequential (sequential) is enabled, validate
that forward_loop is provided and callable before calling sequential_calibrate;
if forward_loop is None or not callable raise a clear ValueError explaining that
sequential calibration requires a callable forward_loop. Update the branch where
sequential is True (around the sequential_calibrate call in mode.py) to perform
this check and raise the explicit error instead of letting sequential_calibrate
fail later.
In `@modelopt/torch/quantization/model_calib.py`:
- Around line 1836-1867: The sequential_calibrate function calls calib_func with
inputs as a second positional argument which collides with calibrator signatures
(causing TypeError); change the call in sequential_calibrate to pass only
forward_loop as the positional arg and supply the activations via a named
keyword (e.g., inputs=inputs) if the calibrator expects them; locate the call to
calib_func in sequential_calibrate (and the local _layer_forward_loop which uses
get_input_activations from LayerActivationCollector) and replace
calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs) with
a keyword-argument style call (for example calib_func(layer,
forward_loop=_layer_forward_loop, inputs=inputs, **calib_kwargs)) so no
positional collision occurs, then keep the existing cleanup (del inputs;
torch.cuda.empty_cache()).
In `@modelopt/torch/quantization/utils.py`:
- Around line 816-872: The patched layer forward (_forward_w_data_collection
inside _patch_and_initialize_layer) currently only appends inputs and never
calls the original forward, so when stop_after_collection is False the layer
returns None and breaks the model; modify _forward_w_data_collection to, after
appending to self.inputs, call and return the original forward (e.g. call
self._original_forward(*args, **kwargs) if present) when stop_after_collection
is False (and retain the early raise when True), ensuring you reference
bind_forward_method/_original_forward so the original method is invoked
correctly.
In `@modelopt/torch/utils/network.py`:
- Around line 639-673: get_decoder_layers currently inspects attributes on the
passed module and misses wrapped models (DataParallel/FSDP/DeepSpeed), so first
call unwrap_model(model, force_unwrap=True) and reassign the result to model at
the start of get_decoder_layers; then proceed to check the usual attributes
(model.model.layers, model.decoder.layers, model.layers, model.transformer.h,
model.backbone.layers) on the unwrapped model to correctly locate and return the
decoder ModuleList or None.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- modelopt/torch/quantization/config.py
- modelopt/torch/quantization/mode.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/utils.py
- modelopt/torch/utils/network.py
```diff
 sequential = kwargs.pop("use_sequential", False)
 if method is not None and "awq" in method:
     # For backward compatibility
     kwargs["algorithm"] = method

 if func is not None:
-    # Call the function with forward_loop as a separate argument
-    func(model, forward_loop=forward_loop, **kwargs)
+    if sequential:
+        # Wrap with sequential processing
+        sequential_calibrate(
+            model,
+            forward_loop=forward_loop,
+            calib_func=func,
+            **kwargs,
+        )
+    else:
+        # Direct calibration (existing behavior)
+        func(model, forward_loop=forward_loop, **kwargs)
 else:
     raise ValueError(f"No calibration function provided for method: {method}")
```
**Validate forward_loop when use_sequential is enabled.**

`sequential_calibrate` assumes a callable `forward_loop`; if it is `None`, the error surfaces later and is harder to diagnose. Add an explicit check with a clear message before the call.
💡 Suggested fix

```diff
 if func is not None:
     if sequential:
+        if forward_loop is None:
+            raise ValueError("forward_loop must be provided when use_sequential=True")
         # Wrap with sequential processing
         sequential_calibrate(
             model,
             forward_loop=forward_loop,
             calib_func=func,
             **kwargs,
         )
```
```python
@torch.no_grad()
def sequential_calibrate(
    model: nn.Module,
    forward_loop: ForwardLoop,
    calib_func: Callable,
    **calib_kwargs,
):
    """Sequential calibration - a sequential layer-by-layer calibration algorithm."""
    transformer_layers = get_decoder_layers(model)
    if transformer_layers is None:
        raise ValueError(
            "Could not find transformer layers in model'. "
            "Sequential calibration requires a model with identifiable transformer layers."
        )

    print_rank_0(f"Sequential calibration: Found {len(transformer_layers)} transformer layers")

    gettr = LayerActivationCollector(model)

    for _, layer in enumerate(transformer_layers):
        # Get updated input activations to the current layer
        inputs = gettr.get_input_activations(layer, forward_loop)

        # Define a forward loop for the current layer
        def _layer_forward_loop(m):
            for args, kwargs_input in inputs:  # noqa: F821
                m(*args, **kwargs_input)

        # Call GPTQ
        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
        del inputs
        torch.cuda.empty_cache()
```
**Fix `sequential_calibrate` invoking the calibrator with incorrect positional args.**

`calib_func(layer, inputs, forward_loop=...)` passes `inputs` as the second positional argument, which collides with `forward_loop` in existing calibrator signatures and will raise a `TypeError` (or treat a list as a callable). Call the calibrator with `forward_loop` only, and pass `inputs` via a named keyword if the calibrator needs them.
🐛 Proposed fix

```diff
-        calib_func(layer, inputs, forward_loop=_layer_forward_loop, **calib_kwargs)
+        calib_func(layer, forward_loop=_layer_forward_loop, **calib_kwargs)
```
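The collision can be reproduced in isolation. The `calib_func` below is a hypothetical calibrator mirroring the signature pattern the review describes; it is not the actual ModelOpt calibrator.

```python
# Hypothetical calibrator whose second parameter is forward_loop.
def calib_func(model, forward_loop=None, **calib_kwargs):
    if forward_loop is not None:
        forward_loop(model)

layer = object()
inputs = [((1,), {}), ((2,), {})]

# Positional call: `inputs` lands in the forward_loop slot, and the keyword
# then supplies forward_loop a second time -> TypeError.
try:
    calib_func(layer, inputs, forward_loop=lambda m: None)
except TypeError as err:
    error = str(err)
print(error)  # "got multiple values for argument 'forward_loop'"

# The keyword-only call from the proposed fix works:
calib_func(layer, forward_loop=lambda m: None)
```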
```python
class _EarlyStopForwardError(Exception):
    """Error to stop the forward pass after collection."""


class LayerActivationCollector:
    """Helper class for collecting layer activations during forward passes.

    This class allows for sequential layer calibration by
    patching layers to capture inputs/outputs during forward passes
    """

    def __init__(self, model: nn.Module):
        self.model = model

    @staticmethod
    def _patch_and_initialize_layer(layer: torch.nn.Module, stop_after_collection: bool = False):
        """Patch a layer to collect inputs during forward passes."""

        def _forward_w_data_collection(self, *args, **kwargs):
            # Note: 'self' refers to the patched layer.
            assert len(args) >= 1, (
                f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
            )
            # Only collect the inputs to the layer
            self.inputs.append((args, kwargs))
            if stop_after_collection:
                raise _EarlyStopForwardError()  # Stop the forward pass after collection

        bind_forward_method(layer, _forward_w_data_collection, "_original_forward")
        layer.inputs = []

    @staticmethod
    def _unpatch_and_cleanup_layer(layer: torch.nn.Module):
        if hasattr(layer, "_original_forward"):
            unpatch_forward_method(layer, "_original_forward")
        if hasattr(layer, "inputs"):
            del layer.inputs

    @torch.no_grad()
    def get_input_activations(self, layer: torch.nn.Module, forward_loop: ForwardLoop) -> list:
        # Wrap model forward to catch _EarlyStopForward per-batch
        def _early_stop_forward(self, *args, **kwargs):
            try:
                return self._original_forward(*args, **kwargs)
            except _EarlyStopForwardError:
                return None  # Stop propagation but allow next batch

        try:
            bind_forward_method(self.model, _early_stop_forward, "_original_forward")
            self._patch_and_initialize_layer(layer, stop_after_collection=True)
            forward_loop(self.model)
            inputs = layer.inputs.copy()
        finally:
            self._unpatch_and_cleanup_layer(layer)
            unpatch_forward_method(self.model, "_original_forward")

        return inputs
```
**Preserve the original forward when not early-stopping.**

`_forward_w_data_collection` never calls the original forward, so `stop_after_collection=False` makes the patched layer return `None` and breaks downstream execution. Either enforce early-stop or delegate to `_original_forward`.
🐛 Proposed fix

```diff
 def _forward_w_data_collection(self, *args, **kwargs):
     # Note: 'self' refers to the patched layer.
     assert len(args) >= 1, (
         f"Expected at least 1 positional arg, got {len(args)} args and {list(kwargs.keys())} kwargs"
     )
     # Only collect the inputs to the layer
     self.inputs.append((args, kwargs))
     if stop_after_collection:
         raise _EarlyStopForwardError()  # Stop the forward pass after collection
+    return self._original_forward(*args, **kwargs)
```
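A stand-alone illustration of the fixed behavior: the patched forward records inputs and, when not early-stopping, still delegates to the saved original forward. `bind_forward_method` from the PR is approximated here with `types.MethodType`; the classes are hypothetical, not ModelOpt APIs.

```python
import types

class _EarlyStop(Exception):
    pass

class Layer:
    def forward(self, x):
        return x + 1

def patch_layer(layer, stop_after_collection=False):
    layer.inputs = []
    layer._original_forward = layer.forward  # save the bound original

    def _forward_w_data_collection(self, *args, **kwargs):
        self.inputs.append((args, kwargs))
        if stop_after_collection:
            raise _EarlyStop()
        # The fix: keep normal execution alive when not early-stopping.
        return self._original_forward(*args, **kwargs)

    layer.forward = types.MethodType(_forward_w_data_collection, layer)

layer = Layer()
patch_layer(layer, stop_after_collection=False)
out = layer.forward(41)
print(out, layer.inputs)  # 42 [((41,), {})]
```

Without the final `return`, `out` would be `None` and any downstream layer consuming it would fail, which is exactly the bug the comment flags.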
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils.py` around lines 816 - 872, The patched
layer forward (_forward_w_data_collection inside _patch_and_initialize_layer)
currently only appends inputs and never calls the original forward, so when
stop_after_collection is False the layer returns None and breaks the model;
modify _forward_w_data_collection to, after appending to self.inputs, call and
return the original forward (e.g. call self._original_forward(*args, **kwargs)
if present) when stop_after_collection is False (and retain the early raise when
True), ensuring you reference bind_forward_method/_original_forward so the
original method is invoked correctly.
```python
def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
    """Get the decoder layers from a model for sequential calibration.

    Args:
        model: The model to extract decoder layers from.
        granularity: The type of layers to extract. Currently only "decoder" is supported.

    Returns:
        A ModuleList of decoder layers, or None if not found.
    """
    if granularity != "decoder":
        raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")

    # HuggingFace transformers pattern: model.model.layers
    if hasattr(model, "model") and hasattr(model.model, "layers"):
        return model.model.layers

    # Megatron/MCore pattern: model.decoder.layers
    if hasattr(model, "decoder") and hasattr(model.decoder, "layers"):
        return model.decoder.layers

    # Direct layers attribute (some models)
    if hasattr(model, "layers") and isinstance(model.layers, nn.ModuleList):
        return model.layers

    # GPT-style: model.transformer.h
    if hasattr(model, "transformer") and hasattr(model.transformer, "h"):
        return model.transformer.h

    # Nemotron Super/Nano
    if hasattr(model, "backbone") and hasattr(model.backbone, "layers"):
        return model.backbone.layers

    print("No decoder layers found for model, returning None")
    return None
```
🧩 Analysis chain (verification scripts run against NVIDIA/Model-Optimizer)

```shell
# Check if unwrap_model exists in the network.py file
head -n 650 modelopt/torch/utils/network.py | tail -n 50
# Search for unwrap_model function definition in the module
rg "def unwrap_model" modelopt/torch/utils/network.py -A 5
# Check the full context of the network.py file to understand the module structure
wc -l modelopt/torch/utils/network.py
# Look for imports and unwrap_model usage patterns
rg "unwrap_model" modelopt/torch/utils/network.py
# Get the full implementation of unwrap_model
rg "def unwrap_model" modelopt/torch/utils/network.py -A 30
# Search for usages of get_decoder_layers in the codebase
rg "get_decoder_layers" --type py
# Check for wrapped model patterns in the codebase
rg "DataParallel|DistributedDataParallel|FSDP|deepcopy_to_device" modelopt/torch/utils/network.py -B 2 -A 2
# Check if there are any tests or examples showing how get_decoder_layers is used
rg "get_decoder_layers" -B 3 -A 3
# Check the context of get_decoder_layers call in model_calib.py
rg "transformer_layers = get_decoder_layers" modelopt/torch/quantization/model_calib.py -B 10 -A 10
# Check if the model is unwrapped elsewhere before being used
rg "unwrap_model" modelopt/torch/quantization/model_calib.py -B 3 -A 3
# Look for tests or examples that might demonstrate wrapped models
find . -name "*test*" -type f | grep -E "(calib|network)" | head -5
# Check the calibration test file to see if wrapped models are tested
cat modelopt/torch/utils/network.py | grep -A 5 "SUPPORTED_WRAPPERS"
# Check if there are any tests that use wrapped models with sequential calibration
rg "sequential_calibrate\|DistributedDataParallel\|DataParallel" tests/unit/torch/quantization/test_calib.py -B 2 -A 2
# Check if DeepSpeed wrappers are in SUPPORTED_WRAPPERS
rg "SUPPORTED_WRAPPERS\|DeepSpeed" modelopt/torch/utils/network.py -B 2 -A 5
```
**Unwrap wrapped models before locating decoder layers.**

`get_decoder_layers` only inspects attributes on the passed module. For DataParallel, DistributedDataParallel, FSDP, or DeepSpeed wrapped models, the decoder blocks sit under `model.module`, so the function returns `None` and sequential calibration fails. Unwrap first using the existing `unwrap_model(model, force_unwrap=True)` available in this module.

Suggested fix

```diff
 def get_decoder_layers(model: nn.Module, granularity: str = "decoder") -> nn.ModuleList | None:
     """Get the decoder layers from a model for sequential calibration.
@@ -646,6 +646,8 @@
     if granularity != "decoder":
         raise ValueError(f"Unsupported granularity: {granularity}. Only 'decoder' is supported.")
+    # Unwrap common parallel wrappers (DDP/FSDP/DeepSpeed) to access actual layers.
+    model = unwrap_model(model, force_unwrap=True)
```
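The failure mode is easy to see with framework-free stand-ins: wrapper classes playing the role of DataParallel/FSDP hold the real model under `.module`, so attribute probes on the wrapper miss the decoder layers until the model is unwrapped. All class names below are hypothetical, not torch or ModelOpt APIs.

```python
class WrapperLike:
    """Stand-in for DataParallel/FSDP-style wrappers holding the model in .module."""
    def __init__(self, module):
        self.module = module

class Decoder:
    def __init__(self):
        self.layers = ["block0", "block1"]

class LM:
    def __init__(self):
        self.decoder = Decoder()

def unwrap(model):
    # Simplified analogue of unwrap_model(model, force_unwrap=True)
    while hasattr(model, "module"):
        model = model.module
    return model

wrapped = WrapperLike(WrapperLike(LM()))   # e.g. DDP around a sharded model
print(hasattr(wrapped, "decoder"))         # False: the wrapper hides the layers
print(unwrap(wrapped).decoder.layers)      # ['block0', 'block1']
```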
Codecov Report

❌ Patch coverage — additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #924      +/-  ##
=========================================
- Coverage   73.10%   72.96%   -0.14%
=========================================
  Files         205      205
  Lines       22294    22363      +69
=========================================
+ Hits        16297    16317      +20
- Misses       5997     6046      +49
```

☔ View full report in Codecov by Sentry.
**What does this PR do?**

**Type of change:** New feature

**Overview:** Add support for sequential calibration of layers (at decoder-level granularity) in ModelOpt.

**Calibration flow**

**Functions added**

**Usage**

Set `use_sequential=True` in QUANT_CFG's `"algorithm"` section.

**Testing**

**Before your PR is "Ready for review"**

**Additional Information**

**Summary by CodeRabbit**
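The usage note above can be sketched as a config fragment. Everything here except the `use_sequential` flag in the `"algorithm"` section is an illustrative placeholder, not the exact QUANT_CFG schema.

```python
# Hypothetical QUANT_CFG sketch; quantizer keys and method name are placeholders.
quant_cfg = {
    "quant_cfg": {"*weight_quantizer": {"num_bits": 4}},
    # The new flag: opt into decoder-block-level sequential calibration.
    "algorithm": {"method": "awq_lite", "use_sequential": True},
}
print(quant_cfg["algorithm"]["use_sequential"])  # True
```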