<tr><td><code>axis</code></td><td>::mlir::IntegerAttr</td><td>64-bit signed integer attribute</td></tr>
<tr><td><code>block_size</code></td><td>::mlir::IntegerAttr</td><td>64-bit signed integer attribute</td></tr>
</table>
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `x` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of 32-bit signless integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values
| `x_scale` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values
| `x_zero_point` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of 32-bit signless integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values or none type
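The operands above feed the inverse of quantization. As a minimal pure-Python sketch (the helper name and sample values are illustrative, not from this patch; the formula `y = (x - x_zero_point) * x_scale` is the standard ONNX DequantizeLinear semantics):

```python
def dequantize_linear(x, x_scale, x_zero_point):
    """Per-tensor DequantizeLinear sketch: y = (x - x_zero_point) * x_scale."""
    return (x - x_zero_point) * x_scale

# Made-up uint8 example: scale 0.5, zero point 128.
print([dequantize_linear(v, 0.5, 128) for v in [0, 128, 255]])  # → [-64.0, 0.0, 63.5]
```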
The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.
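A minimal pure-Python sketch of this formula for per-tensor uint8 quantization (the helper name and sample values are made up; Python's built-in `round()` already implements the round-half-to-even behavior the spec requires for `x / y_scale`):

```python
def quantize_linear(x, y_scale, y_zero_point, lo=0, hi=255):
    """Per-tensor QuantizeLinear sketch: y = saturate(round(x / y_scale) + y_zero_point)."""
    y = round(x / y_scale) + y_zero_point  # round() is round-half-to-even in Python 3
    return min(max(y, lo), hi)             # saturate to the target type's range

# Made-up uint8 example: scale 0.5, zero point 128.
print([quantize_linear(v, 0.5, 128) for v in [-100.0, 0.25, 0.75, 1000.0]])  # → [0, 128, 130, 255]
```

Note that `0.25 / 0.5 = 0.5` rounds to `0`, not `1`, because ties go to the nearest even integer.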
| `x` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values or tensor of 32-bit signless integer values
| `y_scale` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values or tensor of 32-bit signless integer values
| `y_zero_point` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values or none type
#### Results:

| Result | Description |
| :----: | ----------- |
| `y` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values
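The op's three quantization granularities (per-tensor, per-axis, blocked) differ only in the shape of `y_scale`. A small sketch of the expected scale shape (the helper name and shapes are illustrative, not part of the op definition):

```python
import math

def expected_scale_shape(x_shape, axis=None, block_size=None):
    """Illustrative: y_scale shape for per-tensor, per-axis, and blocked quantization."""
    if axis is None:
        return ()                    # per-tensor: y_scale is a scalar
    if block_size is None:
        return (x_shape[axis],)      # per-axis: 1-D tensor of length Di
    shape = list(x_shape)            # blocked: Di is replaced by ceil(Di / B)
    shape[axis] = math.ceil(shape[axis] / block_size)
    return tuple(shape)

print(expected_scale_shape((2, 6, 4)))                        # → ()
print(expected_scale_shape((2, 6, 4), axis=1))                # → (6,)
print(expected_scale_shape((2, 6, 4), axis=1, block_size=4))  # → (2, 2, 4)
```

In every case `y_zero_point`, when present, must have the same shape as `y_scale`.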
The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.

Saturation is done according to:
- uint16: [0, 65535]
- int16: [-32768, 32767]
- uint8: [0, 255]
- int8: [-128, 127]
- uint4: [0, 15]
- int4: [-8, 7]

For `(x / y_scale)`, it rounds to the nearest even. Refer to https://en.wikipedia.org/wiki/Rounding for details.

`y_zero_point` and `y` must have the same type. `y_zero_point` is usually not used for quantization to float8 types, but the quantization
formula remains the same for consistency, and the type of the attribute `y_zero_point` still determines the quantization type.

There are three supported quantization granularities, determined by the shape of `y_scale`.
In all cases, `y_zero_point` must have the same shape as `y_scale`.
- Per-tensor (per-layer) quantization: `y_scale` is a scalar.
- Per-axis quantization: The scale must be a 1-D tensor, with the length of the quantization axis. For an input shape
  `(D0, ..., Di, ..., Dn)` and `axis=i`, `y_scale` is a 1-D tensor of length `Di`.
- Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
  blocking is performed. Given `x` shape `(D0, ..., Di, ..., Dn)`, `axis=i`, and block size `B`: `y_scale` shape is
  `(D0, ..., ceil(Di/B), ..., Dn)`.
}];
let arguments = (ins AnyTypeOf<[TensorOf<[F32]>, TensorOf<[F16]>, TensorOf<[BF16]>, TensorOf<[I32]>]>:$x,