Releases · ROCm/hipBLASLt

hipBLASLt 1.1.0 for ROCm 7.1.0

Added

Fused Clamp GEMM for HIPBLASLT_EPILOGUE_CLAMP_EXT and HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT. This feature requires the minimum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT) and maximum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT) clamp values to be set.
Support for ReLU/Clamp activation functions with auxiliary output, which captures intermediate results, for the f16 and bf16 data types on gfx942. This feature is enabled for HIPBLASLT_EPILOGUE_RELU_AUX, HIPBLASLT_EPILOGUE_RELU_AUX_BIAS, HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT, and HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT.
Support for HIPBLAS_COMPUTE_32F_FAST_16BF for the FP32 data type on gfx950 only.
Added the C++ extension APIs setMaxWorkspaceBytes and getMaxWorkspaceBytes.
Added the ability to print logs (using HIPBLASLT_LOG_MASK=32) for Grouped GEMM.
Support for swizzleA using the hipblaslt-ext C++ API.
Support for hipBLASLt extop for gfx11xx and gfx12xx.
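When validating the clamp epilogues, it can help to compare device output against a small host reference. The following is a minimal sketch of what HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT is assumed to compute, D = clamp(A·B + bias, min, max), with min and max being the values set through HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT and HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT. The helper clampBiasGemmRef is hypothetical, and row-major storage, alpha = 1, beta = 0, one bias entry per row of D, and an aux buffer holding the pre-activation value are all illustrative assumptions rather than hipBLASLt's documented behavior or aux format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for a clamp(+bias) epilogue (requires C++17 for std::clamp).
// Assumptions, not taken from hipBLASLt: row-major storage, alpha = 1, beta = 0,
// one bias entry per row of D (empty vector = no bias), and an optional aux
// buffer that receives the pre-activation value A*B + bias.
std::vector<float> clampBiasGemmRef(const std::vector<float>& A,     // M x K
                                    const std::vector<float>& B,     // K x N
                                    const std::vector<float>& bias,  // M entries or empty
                                    std::size_t M, std::size_t N, std::size_t K,
                                    float lo, float hi,              // ACT_ARG0 (min), ACT_ARG1 (max)
                                    std::vector<float>* aux = nullptr) {
    assert(A.size() == M * K && B.size() == K * N);
    assert(bias.empty() || bias.size() == M);
    assert(lo <= hi);  // std::clamp is undefined if hi < lo
    std::vector<float> D(M * N);
    if (aux) aux->assign(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            if (!bias.empty()) acc += bias[i];         // bias broadcast along the row
            if (aux) (*aux)[i * N + j] = acc;          // assumed intermediate (pre-clamp) result
            D[i * N + j] = std::clamp(acc, lo, hi);    // clamp epilogue
        }
    }
    return D;
}
```

Passing lo = 0.0f and hi = std::numeric_limits<float>::infinity() reduces the sketch to a ReLU reference, which is one way to sanity-check the RELU_AUX variants; for f16/bf16 device results, compare with a tolerance rather than exact equality.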

Changed

hipblasLtMatmul() now returns an error when the workspace size is insufficient, rather than causing a segmentation fault.

Resolved issues

Fixed incorrect results when using ldd and ldc with some solutions.

10 Oct 12:12

rocm-ci

rocm-7.0.2

81ed29e

hipblaslt 1.0.0 for ROCm 7.0.2

hipBLASLt code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.

17 Sep 16:37

rocm-ci

rocm-7.0.1

e2e3528

hipblaslt 1.0.0 for ROCm 7.0.1

hipBLASLt code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

16 Sep 06:31

rocm-ci

rocm-7.0.0

e2e3528

hipBLASLt 1.0.0 for ROCm 7.0.0

Added

Stream-K GEMM support has been enabled for the FP32, FP16, BF16, FP8, and BF8 data types on the MI300A APU. To activate this feature, set the TENSILE_SOLUTION_SELECTION_METHOD environment variable to 2 (for example, export TENSILE_SOLUTION_SELECTION_METHOD=2).
Fused Swish/SiLU GEMM in hipBLASLt (enabled by HIPBLASLT_EPILOGUE_SWISH_EXT and HIPBLASLT_EPILOGUE_SWISH_BIAS_EXT).
Added support for HIPBLASLT_EPILOGUE_GELU_AUX_BIAS for gfx942.
Added HIPBLASLT_TUNING_USER_MAX_WORKSPACE to constrain the maximum workspace size for user offline tuning.
Added HIPBLASLT_ORDER_COL16_4R16 and HIPBLASLT_ORDER_COL16_4R8 to hipblasLtOrder_t to support FP16/BF16 swizzle GEMM and FP8/BF8 swizzle GEMM, respectively.
Added TF32 emulation on gfx950.
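The Swish (SiLU) activation is swish(x) = x · sigmoid(x) = x / (1 + e^(-x)). As a minimal host-side sketch, the following applies it to an already computed M x N product, with an optional bias, to mirror what HIPBLASLT_EPILOGUE_SWISH_EXT and HIPBLASLT_EPILOGUE_SWISH_BIAS_EXT are assumed to produce. The helper swishBiasEpilogueRef is hypothetical, and row-major storage with one bias entry per row is an assumption, not hipBLASLt's documented layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// SiLU/Swish activation: x * sigmoid(x) = x / (1 + exp(-x)).
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// Hypothetical host reference for the SWISH / SWISH_BIAS epilogues.
// acc is an already computed row-major M x N product (alpha = 1, beta = 0 assumed).
// bias holds one entry per row (assumption); pass an empty vector for the no-bias variant.
std::vector<float> swishBiasEpilogueRef(const std::vector<float>& acc,
                                        const std::vector<float>& bias,
                                        std::size_t M, std::size_t N) {
    assert(acc.size() == M * N);
    assert(bias.empty() || bias.size() == M);
    std::vector<float> D(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            const float x = acc[i * N + j] + (bias.empty() ? 0.0f : bias[i]);
            D[i * N + j] = swish(x);
        }
    }
    return D;
}
```

Writing the activation as x / (1 + e^(-x)) keeps it well behaved for large negative inputs: e^(-x) overflows to infinity and the result tends to zero instead of producing NaN.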

Changed

HIPBLASLT_MATMUL_DESC_A_SCALE_POINTER_VEC_EXT and HIPBLASLT_MATMUL_DESC_B_SCALE_POINTER_VEC_EXT are removed. Use the HIPBLASLT_MATMUL_DESC_A_SCALE_MODE and HIPBLASLT_MATMUL_DESC_B_SCALE_MODE attributes to select scalar (HIPBLASLT_MATMUL_MATRIX_SCALE_SCALAR_32F) or vector (HIPBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F) scaling.
The non-V2 APIs (GemmPreference, GemmProblemType, GemmEpilogue, GemmTuning, GemmInputs) in the C++ header are now the same as the V2 APIs (GemmPreferenceV2, GemmProblemTypeV2, GemmEpilogueV2, GemmTuningV2, GemmInputsV2). The original non-V2 APIs are removed.
The hipblasltExtAMaxWithScale API is removed.
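To reason about the two scale modes, a small host reference can help. The sketch below assumes that SCALAR_32F applies a single FP32 factor to all of A (or B), and that OUTER_VEC_32F, as its name suggests, supplies one factor per row of A (M values) and one per column of B (N values), so each output element is scaled by scaleA[i] * scaleB[j]. These semantics, the hypothetical helper scaledGemmRef, row-major storage, and alpha = 1, beta = 0 are assumptions, not statements of hipBLASLt's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference for A/B scaling modes.
// scaleA: 1 value (assumed scalar mode) or M values (assumed outer-vector mode).
// scaleB: 1 value (assumed scalar mode) or N values (assumed outer-vector mode).
// Assumed semantics: D[i][j] = scaleA[i] * scaleB[j] * sum_k A[i][k] * B[k][j],
// with row-major storage and alpha = 1, beta = 0.
std::vector<float> scaledGemmRef(const std::vector<float>& A,       // M x K
                                 const std::vector<float>& B,       // K x N
                                 const std::vector<float>& scaleA,
                                 const std::vector<float>& scaleB,
                                 std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    assert(scaleA.size() == 1 || scaleA.size() == M);
    assert(scaleB.size() == 1 || scaleB.size() == N);
    std::vector<float> D(M * N);
    for (std::size_t i = 0; i < M; ++i) {
        const float sa = scaleA.size() == 1 ? scaleA[0] : scaleA[i];  // per-row factor of A
        for (std::size_t j = 0; j < N; ++j) {
            const float sb = scaleB.size() == 1 ? scaleB[0] : scaleB[j];  // per-column factor of B
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            D[i * N + j] = sa * sb * acc;
        }
    }
    return D;
}
```

Applying the factors after accumulation, rather than to each A and B element, gives the same result mathematically because each factor is constant along the reduction dimension K.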

Optimized

Improved performance for 8-bit (FP8/BF8/I8) NN/NT cases by adding s_delay_alu to reduce stalls from dependent ALU operations on gfx12+.
Improved performance for 8-bit and 16-bit (FP16/BF16) TN cases by enabling software dependency check (Expert Scheduling Mode) under certain restrictions to reduce redundant hardware dependency checks on gfx12+.
Improved performance for 8-bit, 16-bit, and 32-bit batched GEMM with a better heuristic search algorithm for gfx942.

Upcoming changes

The V2 APIs (GemmPreferenceV2, GemmProblemTypeV2, GemmEpilogueV2, GemmTuningV2, GemmInputsV2) are deprecated.

24 Sep 14:01

rocm-ci

rocm-6.4.4

443a256

hipBLASLt 0.12.1 for ROCm 6.4.4

hipBLASLt code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

07 Aug 14:20

rocm-ci

rocm-6.4.3

443a256

hipBLASLt 0.12.1 for ROCm 6.4.3

hipBLASLt code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

21 Jul 16:54

rocm-ci

rocm-6.4.2

65cf318

hipBLASLt 0.12.1 for ROCm 6.4.2

Added

Support for gfx1151

20 May 13:15

rocm-ci

rocm-6.4.1

4d62e13

hipBLASLt 0.12.1 for ROCm 6.4.1

Resolved issues

Fixed an accuracy issue that occurred for some solutions using an FP32 or TF32 data type with a TT transpose.
