<tr><td><code>axis</code></td><td>::mlir::IntegerAttr</td><td>64-bit signed integer attribute</td></tr>
<tr><td><code>block_size</code></td><td>::mlir::IntegerAttr</td><td>64-bit signed integer attribute</td></tr>
</table>
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `x` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of 32-bit signless integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values
| `x_scale` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values
| `x_zero_point` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of 32-bit signless integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values or none type
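The operands above feed the inverse of quantization. As a minimal pure-Python sketch (the helper name and sample values are illustrative, not from this patch; the formula `y = (x - x_zero_point) * x_scale` is the standard ONNX DequantizeLinear semantics):

```python
def dequantize_linear(x, x_scale, x_zero_point):
    """Per-tensor DequantizeLinear sketch: y = (x - x_zero_point) * x_scale."""
    return (x - x_zero_point) * x_scale

# Made-up uint8 example: scale 0.5, zero point 128.
print([dequantize_linear(v, 0.5, 128) for v in [0, 128, 255]])  # → [-64.0, 0.0, 63.5]
```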
The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.
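A minimal pure-Python sketch of this formula for per-tensor uint8 quantization (the helper name and sample values are made up; Python's built-in `round()` already implements the round-half-to-even behavior the spec requires for `x / y_scale`):

```python
def quantize_linear(x, y_scale, y_zero_point, lo=0, hi=255):
    """Per-tensor QuantizeLinear sketch: y = saturate(round(x / y_scale) + y_zero_point)."""
    y = round(x / y_scale) + y_zero_point  # round() is round-half-to-even in Python 3
    return min(max(y, lo), hi)             # saturate to the target type's range

# Made-up uint8 example: scale 0.5, zero point 128.
print([quantize_linear(v, 0.5, 128) for v in [-100.0, 0.25, 0.75, 1000.0]])  # → [0, 128, 130, 255]
```

Note that `0.25 / 0.5 = 0.5` rounds to `0`, not `1`, because ties go to the nearest even integer.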
| `x` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values or tensor of 32-bit signless integer values
| `y_scale` | tensor of 32-bit float values or tensor of 16-bit float values or tensor of bfloat16 type values or tensor of 32-bit signless integer values
| `y_zero_point` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values or none type
#### Results:

| Result | Description |
| :----: | ----------- |
| `y` | tensor of 8-bit signless integer values or tensor of 8-bit unsigned integer values or tensor of 16-bit signless integer values or tensor of 16-bit unsigned integer values or tensor of f8E4M3FN type values or tensor of f8E4M3FNUZ type values or tensor of f8E5M2 type values or tensor of f8E5M2FNUZ type values or tensor of 4-bit unsigned integer values or tensor of 4-bit signless integer values
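The op's three quantization granularities (per-tensor, per-axis, blocked) differ only in the shape of `y_scale`. A small sketch of the expected scale shape (the helper name and shapes are illustrative, not part of the op definition):

```python
import math

def expected_scale_shape(x_shape, axis=None, block_size=None):
    """Illustrative: y_scale shape for per-tensor, per-axis, and blocked quantization."""
    if axis is None:
        return ()                    # per-tensor: y_scale is a scalar
    if block_size is None:
        return (x_shape[axis],)      # per-axis: 1-D tensor of length Di
    shape = list(x_shape)            # blocked: Di is replaced by ceil(Di / B)
    shape[axis] = math.ceil(shape[axis] / block_size)
    return tuple(shape)

print(expected_scale_shape((2, 6, 4)))                        # → ()
print(expected_scale_shape((2, 6, 4), axis=1))                # → (6,)
print(expected_scale_shape((2, 6, 4), axis=1, block_size=4))  # → (2, 2, 4)
```

In every case `y_zero_point`, when present, must have the same shape as `y_scale`.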
The linear quantization operator consumes a high-precision tensor, a scale, and a zero point to compute the
low-precision/quantized tensor. The scale factor and zero point must have the same shape, determining the quantization
granularity. The quantization formula is `y = saturate((x / y_scale) + y_zero_point)`.

Saturation is done according to:
- uint16: [0, 65535]
- int16: [-32768, 32767]
- uint8: [0, 255]
- int8: [-128, 127]
- uint4: [0, 15]
- int4: [-8, 7]

For `(x / y_scale)`, it rounds to the nearest even. Refer to https://en.wikipedia.org/wiki/Rounding for details.

`y_zero_point` and `y` must have the same type. `y_zero_point` is usually not used for quantization to float8 types, but the quantization
formula remains the same for consistency, and the type of the attribute `y_zero_point` still determines the quantization type.

There are three supported quantization granularities, determined by the shape of `y_scale`.
In all cases, `y_zero_point` must have the same shape as `y_scale`.
- Per-tensor (per-layer) quantization: `y_scale` is a scalar.
- Per-axis quantization: The scale must be a 1-D tensor, with the length of the quantization axis. For an input shape
  `(D0, ..., Di, ..., Dn)` and `axis=i`, `y_scale` is a 1-D tensor of length `Di`.
- Blocked quantization: The scale's shape is identical to the input's shape, except for one dimension, in which
  blocking is performed. Given `x` shape `(D0, ..., Di, ..., Dn)`, `axis=i`, and block size `B`: `y_scale` shape is
  `(D0, ..., ceil(Di/B), ..., Dn)`.
}];
let arguments = (ins AnyTypeOf<[TensorOf<[F32]>, TensorOf<[F16]>, TensorOf<[BF16]>, TensorOf<[I32]>]>:$x,