Releases: SonySemiconductorSolutions/mct-model-optimization

Release 2.5.0

01 Dec 04:58
480108f

What's Changed

XQuant Extension Tool (experimental) (#1502)

  • Calculates the error of each layer by comparing the float model with the quantized model, using both models together with the quantization log, and presents the results in reports. It identifies the causes of the detected errors and recommends appropriate improvement measures for each cause. For more details, see the Troubleshoot Manual.
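XQuant's own interface is not shown in these notes. As a rough illustration of the underlying idea only, the sketch below compares float and quantized outputs layer by layer with a normalized MSE and flags layers whose error exceeds a threshold. The helper name `per_layer_error`, the choice of normalized MSE, and the 0.1 threshold are assumptions for illustration, not XQuant's API or its actual metrics.

```python
import numpy as np


def per_layer_error(float_outs, quant_outs, threshold=0.1):
    """Hypothetical helper (not the XQuant API): normalized MSE per layer.

    float_outs / quant_outs map layer names to activation arrays of equal shape.
    Returns (errors, flagged), where flagged lists layers above the threshold.
    """
    errors = {}
    for name, f in float_outs.items():
        q = quant_outs[name]
        mse = np.mean((f - q) ** 2)
        # Normalize by signal power so layers with different scales are comparable
        power = np.mean(f ** 2) + 1e-12
        errors[name] = float(mse / power)
    flagged = [name for name, err in errors.items() if err > threshold]
    return errors, flagged


# Toy activations: "fc" is badly degraded by quantization, "conv1" barely changes
float_outs = {"conv1": np.array([1.0, 2.0, 3.0]), "fc": np.array([0.5, -0.5])}
quant_outs = {"conv1": np.array([1.0, 2.0, 3.1]), "fc": np.array([0.0, 0.0])}
errors, flagged = per_layer_error(float_outs, quant_outs)
```

Normalizing by the float signal power keeps a small absolute error in a large-magnitude layer from dominating the report; a real troubleshooting tool may use different metrics.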

New CustomLayer: MulticlassNMSOBB (for PyTorch) (#1508)

  • Implemented a new CustomLayer: MulticlassNMSOBB, the Non-Maximum Suppression layer for the Object Detection (Oriented Bounding Box) task.
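The interface of the MulticlassNMSOBB layer itself is not described here. The pure-Python sketch below only illustrates what multiclass NMS for oriented boxes does: boxes are assumed to be `(cx, cy, w, h, angle_in_radians)`, overlap is measured as rotated IoU computed by Sutherland-Hodgman polygon clipping, and suppression runs greedily within each class. The box format, function names, and thresholds are assumptions, not the layer's API.

```python
import math


def obb_corners(box):
    """Corners of an oriented box (cx, cy, w, h, angle_rad) in counter-clockwise order."""
    cx, cy, w, h, a = box
    c, s = math.cos(a), math.sin(a)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def _cross(a, b, p):
    # > 0 when p lies to the left of the directed edge a -> b (inside a CCW polygon)
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _intersect(p, q, a, b):
    # Intersection of segment p-q with the infinite line through a and b
    den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
    t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def _clip(subject, clipper):
    # Sutherland-Hodgman: clip a polygon by a convex counter-clockwise polygon
    out = subject
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if _cross(a, b, cur) >= 0:
                if _cross(a, b, prev) < 0:
                    out.append(_intersect(prev, cur, a, b))
                out.append(cur)
            elif _cross(a, b, prev) >= 0:
                out.append(_intersect(prev, cur, a, b))
            prev = cur
    return out


def _area(poly):
    # Shoelace formula; an empty polygon has zero area
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def rotated_iou(b1, b2):
    inter = _area(_clip(obb_corners(b1), obb_corners(b2)))
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0


def multiclass_nms_obb(boxes, scores, labels, iou_threshold=0.5, score_threshold=0.0):
    """Greedy per-class NMS; returns indices of kept boxes, highest score first."""
    order = sorted((i for i in range(len(boxes)) if scores[i] > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Only boxes of the same class can suppress each other
        if all(labels[j] != labels[i] or rotated_iou(boxes[i], boxes[j]) < iou_threshold
               for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 2, 2, 0.0), (0.1, 0, 2, 2, 0.0), (10, 10, 2, 2, 0.0), (0, 0, 2, 2, 0.0)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 0, 0, 1]
kept = multiclass_nms_obb(boxes, scores, labels, iou_threshold=0.5)  # [0, 2, 3]
```

Box 1 overlaps box 0 heavily within the same class and is suppressed, while box 3 survives despite identical geometry because it belongs to another class.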

Bug Fixes

  • Modified the Stack layer settings in TPC (v1.0) (#1523)

Tutorials

  • Add PyTorch Tutorial for PoseNet: We've added a new PyTorch tutorial for the PoseNet model for the Human Pose Estimation task. Try it! (#1524)

Supported Python versions

  • Python >= 3.10 (#1551)

Release 2.4.5

01 Dec 00:15
eb312d3

What's Changed

Bug Fixes

  • Limit networkx version (<3.6)
    • An additional restriction has been added to the MCT requirements due to a lack of support for certain functions in networkx 3.6 (released on 2025/11/24).

Release 2.4.4

14 Oct 04:55
c51661e

What's Changed

Bug Fixes

  • Limit pydantic version (<2.12.0)
    • An additional restriction has been added to the MCT requirements due to a lack of support for certain functions in pydantic 2.12.0 (released on 2025/10/08).

Release 2.4.3

02 Oct 03:09
48dbbec

What's Changed

Bug Fixes

  • Changed python_requires in setup.py from >=3.6 to >=3.9

Release 2.4.2

02 Oct 01:30

What's Changed

API changes

  • Introduced the output_names argument in pytorch_export_model. This optional parameter specifies a list of output node names for export compatibility and applies only when using PytorchExportSerializationFormat.ONNX.
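  • Usage example: to set the model output names, pass the output_names argument:
```python
import model_compression_toolkit as mct

# quantized_exportable_model, onnx_file_path and representative_data_gen are defined earlier
mct.exporter.pytorch_export_model(
    model=quantized_exportable_model,
    save_model_path=onnx_file_path,
    repr_dataset=representative_data_gen,
    output_names=['model_output'])
```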
  • With this setting, the output of the exported ONNX model (i.e., its output layer) is named model_output instead of output, the default name assigned when output_names is not specified.

Black-Duck Reports 2.4.2

02 Jul 13:00

Pre-release
bd_2.4.2

Generating docs

Release 2.4.1

23 Jun 10:37

What's Changed

Improved activation memory estimation for Mixed Precision

We’ve refined how MCT estimates activation memory, enabling more accurate resource planning for quantized models, which is critical for edge deployment.

  • Quantization-Preserving and Fused Operators: Quantization-preserving and fused operators are now included in the activation memory estimation, which addresses a potential underestimation of memory.
    • Note: Quantization preservation is disabled for layers with multiple inputs.
  • Shift Negative Activation Correction Fix: Fixed an issue that occurred when Shift Negative Correction (SNC) was used during mixed-precision quantization of activations. SNC uses a 16-bit quantization layer, which was mistakenly counted in the activation memory estimation for the mixed-precision solution even though it is part of a fused operation.
  • The search for activation cuts in the Max-Cut computation is now deterministic, ensuring consistent results.

Improved layers' sensitivity evaluation for Mixed Precision

  • Isolated Bitwidth Testing: When evaluating a specific bitwidth candidate, other layers now stay in float precision (previously, they were set to their maximal bitwidth). This isolates the impact of the tested layer, providing clearer insights into its effect on model accuracy.
  • Sensitivity Normalization: The new metric_normalization parameter in MixedPrecisionQuantizationConfig lets you optionally normalize sensitivity metrics by either the maximal or minimal bitwidth candidate of the layer. The default behaviour remains as before (non-normalized).
  • New Exponential Weighting Method: A new weighting method, MpDistanceWeighting.EXP, was added. It is based on the exponential of the negative distances between the quantized and the float models, and the exp_distance_weighting_sigma parameter in MixedPrecisionQuantizationConfig controls the normalization of the distances before the exponential is applied.
  • Custom Sensitivity Metrics: A new custom_metric_fn parameter in MixedPrecisionQuantizationConfig allows you to define your own sensitivity metric function. It takes a model (configured with a candidate bitwidth) and returns a float scalar.
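The notes above do not give the exact EXP weighting formula. A minimal sketch, assuming each distance d is divided by sigma (standing in for exp_distance_weighting_sigma), mapped to exp(-d / sigma), and that the weights are then normalized to sum to one before aggregating the distances; the normalization step and the function names are assumptions, not MCT's implementation.

```python
import numpy as np


def exp_distance_weights(distances, sigma=1.0):
    """Hypothetical sketch: weights proportional to exp(-d / sigma), normalized to sum to 1."""
    d = np.asarray(distances, dtype=float)
    w = np.exp(-d / sigma)
    return w / w.sum()


def weighted_distance(distances, sigma=1.0):
    # Aggregate the distances into a single sensitivity value using the weights above
    d = np.asarray(distances, dtype=float)
    return float(np.sum(exp_distance_weights(d, sigma) * d))


w = exp_distance_weights([0.0, 1.0, 2.0], sigma=1.0)
```

Under this formula, a smaller sigma concentrates the weight on the smallest distances, while a very large sigma approaches a plain average of the distances.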

New Version of TPC Schema - v2

We’ve upgraded the Target Platform Capabilities (TPC) schema to v2, expanding support for new operations and model configurations.

  • New OperatorSetNames: EXP, COS, SIN: These additions enable quantization of models with exponential, cosine, and sine operations.
  • Quantization-Preserving Layers: A new boolean flag, insert_preserving_quantizers in TargetPlatformCapabilities, lets you add quantization-preserving activation holder layers to the final quantized model. Note that this is supported only in PyTorch.

Introduced Activation Threshold Search Using Hessian-Weighted MSE (HMSE)

  • A new method leverages a weighted activation histogram, with weights derived from Hessian values, to improve activation threshold search. This can be configured via QuantizationErrorMethod.
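As a rough sketch of the idea (not MCT's implementation), the snippet below searches symmetric activation thresholds over a histogram and scores each candidate by a weighted MSE: each bin's squared quantization error is multiplied by its count and by a per-bin weight that stands in for the Hessian-derived weights. How MCT derives those weights is not described here, so the `weights` input, the candidate grid, and the signed uniform quantizer are assumptions.

```python
import numpy as np


def fake_quant(x, threshold, n_bits=8):
    # Symmetric signed uniform quantizer, clipped to the integer range for n_bits
    scale = threshold / 2 ** (n_bits - 1)
    q = np.clip(np.round(x / scale), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q * scale


def weighted_mse_threshold(bin_centers, counts, weights, candidates, n_bits=8):
    """Return the candidate threshold minimizing sum(count * weight * (x - Q(x))**2)."""
    best_t, best_err = None, np.inf
    for t in candidates:
        err = np.sum(counts * weights * (bin_centers - fake_quant(bin_centers, t, n_bits)) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Toy histogram: most mass near zero plus one outlier bin at 4.0
x = np.array([0.0, 0.2, 0.5, 4.0])
counts = np.array([100, 100, 100, 1])
plain = weighted_mse_threshold(x, counts, np.ones(4), [1.0, 4.0], n_bits=2)  # 1.0
# Up-weighting the outlier bin (standing in for a large Hessian value) changes the choice
weighted = weighted_mse_threshold(x, counts, np.array([1.0, 1.0, 1.0, 10.0]), [1.0, 4.0], n_bits=2)  # 4.0
```

With plain MSE the rare outlier is clipped; once its bin carries a large weight, the search prefers a wider range that represents it better.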

Weights Quantization Configuration Updates

We’ve added flexibility to weights quantization to give you more control over compression.

  • Positional Weights Quantization: Added support for a quantization configuration for positional weights, i.e., weights that are used as inputs to functional layers.
  • Manual Bitwidth Option: You can now manually specify the bitwidth of weights, overriding the automatic settings for precise control.

Support for edge-mdt-cl Custom Layers

  • Added integration with the edge-mdt-cl package, enabling custom layers optimized for edge deployment. Check the edge-mdt-cl repo for layer documentation.

Extending Supported Versions

  • Added support for PyTorch 2.6 and for NumPy 2.

Breaking changes

  • Old Framework Versions No Longer Supported: Support for TensorFlow 2.12 and 2.13 and for PyTorch 2.2 has been discontinued.
  • Layers from sony-custom-layers No Longer Supported:
    The sony-custom-layers package is deprecated and has been replaced by edge-mdt-cl. Update your code to use edge-mdt-cl layers, and refer to the new package’s documentation for more information.

Additional Changes and Bug Fixes

  • Improved Error Reporting:
    We’ve enhanced error messages to help you diagnose and resolve issues faster, especially in complex PyTorch models.
    • Disconnected Input Nodes: Better detection and clearer reporting for PyTorch models with disconnected input nodes, which often occur when the forward pass includes optional or unused inputs (#1360).
    • PyTorch .to Misuse: Improved error messages for incorrect use of the .to method in PyTorch, providing more context to simplify debugging (#1382).
  • Fix for Reused Nodes: Addressed a bug where reused nodes were incorrectly added before original nodes, which could disrupt model structure (#1418).
  • ONNX Export Enhancements:
    • Weights Sharing Support: Added support for weights sharing in exported models, reducing model size and eliminating redundancy for more efficient storage and inference (#1402).
    • ONNX Opset 20: Now uses ONNX opset 20 for PyTorch versions > 2.4 for better compatibility with ONNX tools and runtimes.
    • Custom Output Names: Enabled the ability to specify output names, giving you more control for integrating exported models with other pipelines.
    • Positional Weights Quantization in ONNX Fake-Quant Mode: We’ve fixed an issue that prevented the export of ONNX models in fake-quantized mode when the model included positional weights in functional layers.
    • Multi-Input Fix: Fixed an export issue for fake-quantized models with multiple inputs.
  • Dynamic Output Size Support: Added support for dynamic output sizes in nn.ConvTranspose2d (#1381).
  • Debug Option to Bypass MCT Facade: Introduced a debug mode to bypass the MCT facade, allowing you to quickly determine whether an issue originates from your model or from MCT itself (#1410).
  • Upgrade to Pydantic 2: Upgraded to Pydantic 2 for improved data validation (#1426).

Tutorials

  • Add PyTorch Tutorial for Activation Z-Score Threshold: We’ve added a new PyTorch tutorial to guide you through using activation z-score thresholds for the quantization of PyTorch models. Try it on Google Colab!
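As a minimal sketch of the idea behind a z-score threshold (not the tutorial's code): activation values whose z-score exceeds a chosen limit are treated as outliers and excluded before the quantization threshold is computed. Using the maximum absolute remaining value as the threshold is an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np


def zscore_filtered_threshold(activations, z_threshold=3.0):
    """Max |x| over the values whose z-score does not exceed z_threshold."""
    x = np.asarray(activations, dtype=float).ravel()
    std = x.std()
    if std == 0:
        return float(np.abs(x).max())
    z = np.abs(x - x.mean()) / std
    # Values with a large z-score are treated as outliers and ignored
    return float(np.abs(x[z <= z_threshold]).max())


acts = np.concatenate([np.linspace(-1.0, 1.0, 101), [50.0]])  # one large outlier
clipped = zscore_filtered_threshold(acts, z_threshold=3.0)     # outlier ignored -> 1.0
unclipped = zscore_filtered_threshold(acts, z_threshold=np.inf)  # -> 50.0
```

Without outlier removal, a single extreme value would stretch the quantization range fifty-fold and waste most of the available levels.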

Black-Duck Reports 2.4.1

18 Jun 15:58

Pre-release
bd_2.4.1

Update version to 2.4.1

Black-Duck Reports 2.4.0

22 Jan 10:17
a6593bd

Pre-release
bd_2.4.0

Fix bug in export using FQ ONNX in replacing activation holder with m…

Release 2.3.0

12 Feb 11:31
33c45ff

What's Changed

Major Changes

Target Platform Capabilities (TPC) Changes

TPC Schema

  • Introduced a new schema mechanism (version v1) that defines the language for building a target platform capabilities description.
    • The schema defines the TargetPlatformCapabilities class, which can be built to describe the platform capabilities.
    • The OperatorSetNames enum provides a closed set of operator set names that allows setting quantization configuration options for commonly used operators.
    • Custom operator set names are also supported.
    • All schema classes use pydantic BaseModel for enhanced validation and schema flexibility.
      • MCT now has a new dependency: "pydantic < 2.0".
  • In addition, a new versioning system using minor and patch versions was introduced.

Naming Refactor

  • Along with the new schema mechanism, several classes were renamed:
    • TargetPlatformModel → TargetPlatformCapabilities
    • TargetPlatformCapabilities → FrameworkQuantizationCapabilities
    • OperatorSetConcat → OperatorSetGroup

Attach TPC to Framework

  • A new module named AttachTpcToFramework handles the conversion from a framework-independent TargetPlatformCapabilities description to a framework-specific FrameworkQuantizationCapabilities that maps each framework's operator to its possible quantization configurations.
  • Available for TensorFlow and PyTorch via AttachTpcToKeras and AttachTpcToPytorch, respectively.

API changes

  • All MCT APIs now expect a target_platform_capabilities object (TargetPlatformCapabilities), which contains the framework-independent platform capabilities description.
  • This differs from the previous behaviour, in which the APIs expected an initialized framework-specific object.
  • Note: the default behavior of MCT's APIs has not changed. Calling an API function without passing a TPC object, or passing an object obtained with get_target_platform_capabilities(<FW_NAME>, DEFAULT_TP_MODEL), uses the same default TPC as in the previous release.
    • However, users who accessed TPC-related classes outside the published API may encounter breaking changes due to class renaming and file hierarchy changes.

Tighter activation memory estimation via Max-Cut (Experimental)

  • Replaced Max-Tensor with Max-Cut as the activation memory estimation method in the mixed precision algorithm.
  • The Max-Cut metric considers the execution schedule of the model's operators for a more precise estimation of activation memory (#1295).
  • Note: this is an estimate of the actual memory usage during runtime; the actual runtime memory may differ.
  • 16-bit Activation Quantization (experimental)
    • The new activation memory estimation allows flexible usage of the mixed precision algorithm to enable 16-bit activation quantization (dependent on a TPC that supports 16-bit quantization for different operators).
    • 16-bit quantization can be enabled either via the manual bit-width selection API or automatically, by running mixed precision with an appropriate activation or total memory constraint.
    • Note that when running mixed precision with activation memory constraint to enable 16-bit allocation, shift negative correction should be disabled.
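As a simplified illustration of why a schedule-aware estimate is tighter than looking at single tensors (this is not MCT's Max-Cut algorithm, which operates on the model graph), the sketch below walks one fixed execution order: a tensor is live from the op that produces it until its last consumer has run, and the estimate is the largest total size of tensors live at any step. The schedule format, tensor sizes, and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def peak_live_memory(schedule: List[Tuple[str, List[str]]], sizes: Dict[str, int]) -> int:
    """Peak total size of live activation tensors along a fixed execution order.

    schedule: ordered (op_name, input_names) pairs; each op produces one tensor named op_name.
    sizes: tensor name -> size in bytes.
    """
    # Step at which each tensor is consumed for the last time
    last_use: Dict[str, int] = {}
    for step, (_, inputs) in enumerate(schedule):
        for name in inputs:
            last_use[name] = step

    live, peak = set(), 0
    for step, (op, inputs) in enumerate(schedule):
        live.add(op)
        # The op's inputs and its output must be resident at the same time
        peak = max(peak, sum(sizes[t] for t in live))
        # Release tensors whose last consumer has just run
        live -= {t for t in inputs if last_use[t] == step}
    return peak


# A small residual block: the input must stay alive until the final add
schedule = [("input", []), ("conv1", ["input"]), ("conv2", ["conv1"]), ("add", ["input", "conv2"])]
sizes = {"input": 100, "conv1": 200, "conv2": 100, "add": 100}
peak = peak_live_memory(schedule, sizes)  # 400 bytes
max_tensor = max(sizes.values())          # 200 bytes
```

Here the largest single tensor (200 bytes) understates the memory needed while conv2 runs, because the residual input and conv1's output must be held at the same time.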

Improved GPTQ algorithm via Sample Layer Attention (SLA):

  • Enabled SLA by default in both Keras and PyTorch (#1287, #1260)
  • Added gradual activation quantization support for enhanced results when quantizing activations (#1244, #1237)
  • Implemented Rademacher distribution for Hessian estimation (#1250)
  • For more details, please visit our paper.
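For context, a common way to use Rademacher vectors in Hessian estimation is Hutchinson's trace estimator: for a random vector v with independent ±1 entries, E[vᵀHv] = tr(H). The sketch below applies it to an explicit matrix; it is an assumption that this matches what MCT computes, and in practice the products would come from the model through automatic differentiation rather than from a stored matrix.

```python
import numpy as np


def hutchinson_trace(matvec, dim, n_samples=1000, seed=0):
    """Estimate tr(H) from products H @ v with Rademacher probe vectors v."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability
        v = rng.choice([-1.0, 1.0], size=dim)
        total += float(v @ matvec(v))
    return total / n_samples


H = np.array([[2.0, 1.0], [1.0, 3.0]])
estimate = hutchinson_trace(lambda v: H @ v, dim=2)  # close to np.trace(H) == 5.0
```

For a diagonal matrix the estimate is exact for every probe, since vᵢ² = 1; off-diagonal entries contribute zero-mean noise that averages out over samples.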

Resource Utilization (RU) calculation:

  • Use the Max-Cut activation method for activation and total resource utilization computation.
  • Compute the total target from the weights and activation utilization instead of treating it as a separate metric.
  • Weights memory computation now includes all quantized weights in the model, instead of considering only kernel attributes. This may change the results of existing execution of mixed precision scenarios.
  • Note that the ResourceUtilization API did not change.

Minor Changes

  • Added Activation Bias Correction feature to potentially enhance quantization results of vision transformers (#1256)
  • Added substitution to decompose MatMul operation into baseline components in PyTorch (#1313)
  • Added a substitution to decompose the scaled dot-product attention operator in PyTorch (#1229)
  • Converted core configuration classes to dataclasses for simpler usage and strict behavior verification (CoreConfig, QuantizationConfig, etc.) (#1203)
  • Trainable Infrastructure changes:
    • Moved STE/LSQ activation quantizers from QAT to trainable infrastructure.
    • Renamed Trainable QAT quantizer to Weight Trainable quantizer (#1240)
  • Added support for PyTorch 2.4, PyTorch 2.5, and Python 3.12
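To illustrate what decomposing scaled dot-product attention means in the list above, the NumPy sketch below writes it as the chain of basic operations (matmul, scale, softmax, matmul) that such a substitution could expose to quantization. It follows the standard formula softmax(QKᵀ/√d)V without masking or dropout; the exact set of operations MCT emits is not documented here.

```python
import numpy as np


def sdpa_decomposed(q, k, v):
    """softmax(q @ k^T / sqrt(d)) @ v written as a chain of basic operations."""
    d = q.shape[-1]
    scores = np.matmul(q, np.swapaxes(k, -1, -2))             # 1. Q K^T
    scores = scores / np.sqrt(d)                              # 2. scale by 1/sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)      # stabilize the exponent
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)   # 3. softmax over keys
    return np.matmul(weights, v)                              # 4. weighted sum of values


rng = np.random.default_rng(0)
q = rng.normal(size=(2, 3, 4))  # (batch, queries, dim)
k = rng.normal(size=(2, 5, 4))  # (batch, keys, dim)
v = rng.normal(size=(2, 5, 4))  # (batch, keys, dim)
out = sdpa_decomposed(q, k, v)  # shape (2, 3, 4)
```

Splitting the fused operator this way gives each intermediate tensor (scores, attention weights) its own place in the graph, where it can receive a quantization configuration.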

Bug Fixes

  • Fixed activation gradient backpropagation in GPTQ for PyTorch models. It now uses STE Activation Trainable quantizers with frozen quantization parameters instead of Activation Inferable quantizers, which did not propagate gradients (#1197)
  • Fixed ONNX export when PyTorch models have multiple inputs/outputs (#1223)
  • Fixed the issue of duplicating reused layers in PyTorch models (#1217)
  • Fixed HMSE being overridden by MSE after resource utilization computation (#1253)
  • Fixed error handling for duplicate QCOs (#1282, #1149)
  • Fixed tf.nn.{conv2d,convolution} substitution to handle attributes with default values that were not passed explicitly (#1275)
  • Fixed errors in handling PyTorch graphs by managing nodes with missing outputs and ensuring robust extraction of output shapes (#1186)

New Contributors

Welcome @ambitious-octopus and @itai-berman, who made their first contributions (#1186, #1266)!