Releases: ROCm/hipBLASLt

hipBLASLt 1.2.0 for ROCm 7.2.0

21 Jan 18:58

Added

  • Support for the BF16 data type on gfx90a.

hipBLASLt 1.1.0 for ROCm 7.1.1

26 Nov 18:46

hipBLASLt code for ROCm 7.1.1 did not change. The library was rebuilt for the updated ROCm 7.1.1 stack.

hipBLASLt 1.1.0 for ROCm 7.1.0

30 Oct 05:51

Added

  • Fused Clamp GEMM for HIPBLASLT_EPILOGUE_CLAMP_EXT and HIPBLASLT_EPILOGUE_CLAMP_BIAS_EXT. This feature requires both the clamp minimum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT) and the clamp maximum (HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT) to be set.
  • Support for ReLU/Clamp activation functions with auxiliary output for the FP16 and BF16 data types on gfx942, to capture intermediate results. This feature is enabled for HIPBLASLT_EPILOGUE_RELU_AUX, HIPBLASLT_EPILOGUE_RELU_AUX_BIAS, HIPBLASLT_EPILOGUE_CLAMP_AUX_EXT, and HIPBLASLT_EPILOGUE_CLAMP_AUX_BIAS_EXT.
  • Support for HIPBLAS_COMPUTE_32F_FAST_16BF with the FP32 data type (gfx950 only).
  • The C++ extension APIs setMaxWorkspaceBytes and getMaxWorkspaceBytes.
  • Log printing for Grouped GEMM (enabled with HIPBLASLT_LOG_MASK=32).
  • Support for swizzleA through the hipblaslt-ext C++ API.
  • Support for hipBLASLt extop on gfx11xx and gfx12xx.
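What the fused clamp epilogue is expected to compute can be pictured with a small CPU reference. This is a minimal sketch, not hipBLASLt code: it assumes column-major storage, assumes the epilogue computes clamp(alpha * A * B + beta * C [+ bias], min, max), and assumes the CLAMP_BIAS variant adds a per-row bias of length m. The parameters `lo` and `hi` stand in for the values set through HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG0_EXT and HIPBLASLT_MATMUL_DESC_EPILOGUE_ACT_ARG1_EXT; the function name gemm_clamp_reference is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for a GEMM followed by a clamp epilogue:
//   D = clamp(alpha * (A * B) + beta * C [+ bias], lo, hi)
// All matrices are column-major (assumption, matching BLAS conventions).
// `lo`/`hi` play the role of the ACT_ARG0/ACT_ARG1 epilogue arguments.
std::vector<float> gemm_clamp_reference(const std::vector<float>& A,     // m x k
                                        const std::vector<float>& B,     // k x n
                                        const std::vector<float>& C,     // m x n
                                        const std::vector<float>* bias,  // length m, or nullptr
                                        std::size_t m, std::size_t n, std::size_t k,
                                        float alpha, float beta, float lo, float hi) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    assert(bias == nullptr || bias->size() == m);
    assert(lo <= hi);
    std::vector<float> D(m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            float v = alpha * acc + beta * C[i + j * m];
            if (bias != nullptr) {
                v += (*bias)[i];  // assumed per-row bias for the _BIAS variant
            }
            // Clamp into [lo, hi].
            D[i + j * m] = std::min(std::max(v, lo), hi);
        }
    }
    return D;
}
```

A reference like this can be compared element-wise against the device output when checking that the minimum and maximum attributes were set as intended.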

Changed

  • hipblasLtMatmul() now returns an error when the workspace size is insufficient, rather than causing a segmentation fault.

Resolved issues

  • Fixed incorrect results produced by some solutions when using ldd and ldc.

hipBLASLt 1.0.0 for ROCm 7.0.2

10 Oct 12:12

hipBLASLt code for ROCm 7.0.2 did not change. The library was rebuilt for the updated ROCm 7.0.2 stack.

hipBLASLt 1.0.0 for ROCm 7.0.1

17 Sep 16:37

hipBLASLt code for ROCm 7.0.1 did not change. The library was rebuilt for the updated ROCm 7.0.1 stack.

hipBLASLt 1.0.0 for ROCm 7.0.0

16 Sep 06:31

Added

  • Stream-K GEMM support for the FP32, FP16, BF16, FP8, and BF8 data types on the MI300A APU. To activate this feature, set the TENSILE_SOLUTION_SELECTION_METHOD environment variable to 2 (for example, export TENSILE_SOLUTION_SELECTION_METHOD=2).
  • Fused Swish/SiLU GEMM, enabled by HIPBLASLT_EPILOGUE_SWISH_EXT and HIPBLASLT_EPILOGUE_SWISH_BIAS_EXT.
  • Support for HIPBLASLT_EPILOGUE_GELU_AUX_BIAS on gfx942.
  • HIPBLASLT_TUNING_USER_MAX_WORKSPACE, which constrains the maximum workspace size for user offline tuning.
  • HIPBLASLT_ORDER_COL16_4R16 and HIPBLASLT_ORDER_COL16_4R8 in hipblasLtOrder_t, which support FP16/BF16 swizzle GEMM and FP8/BF8 swizzle GEMM, respectively.
  • TF32 emulation on gfx950.
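As a rough illustration of the fused Swish/SiLU epilogue, the element-wise activation can be written as x * sigmoid(x). This sketch assumes the Swish parameter is fixed at 1 (so Swish equals SiLU, as the entry's naming suggests) and that the _BIAS variant adds a per-row bias before the activation. `silu` and `apply_swish_epilogue` are hypothetical helpers that operate on an already-computed column-major GEMM result; they are not part of the hipBLASLt API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Swish with its parameter fixed at 1 (assumption), i.e. SiLU:
//   silu(x) = x * sigmoid(x) = x / (1 + exp(-x))
float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Hypothetical epilogue: applies SiLU in place to a column-major m x n
// GEMM result, optionally adding a per-row bias first (assumed layout
// for the SWISH_BIAS variant).
void apply_swish_epilogue(std::vector<float>& D, std::size_t m, std::size_t n,
                          const std::vector<float>* bias) {
    assert(D.size() == m * n);
    assert(bias == nullptr || bias->size() == m);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float v = D[i + j * m];
            if (bias != nullptr) {
                v += (*bias)[i];
            }
            D[i + j * m] = silu(v);
        }
    }
}
```

Fusing this step into the GEMM avoids a separate pass over D, which is the usual motivation for epilogue fusion.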

Changed

  • HIPBLASLT_MATMUL_DESC_A_SCALE_POINTER_VEC_EXT and HIPBLASLT_MATMUL_DESC_B_SCALE_POINTER_VEC_EXT are removed. Instead, use the HIPBLASLT_MATMUL_DESC_A_SCALE_MODE and HIPBLASLT_MATMUL_DESC_B_SCALE_MODE attributes to select scalar (HIPBLASLT_MATMUL_MATRIX_SCALE_SCALAR_32F) or vector (HIPBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F) scaling.
  • The non-V2 APIs (GemmPreference, GemmProblemType, GemmEpilogue, GemmTuning, GemmInputs) in the C++ header are now identical to the V2 APIs (GemmPreferenceV2, GemmProblemTypeV2, GemmEpilogueV2, GemmTuningV2, GemmInputsV2). The original non-V2 APIs are removed.
  • The hipblasltExtAMaxWithScale API is removed.
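The difference between the two scale modes can be sketched on the CPU. This is an assumption-based illustration, not the library's implementation: `ScaleMode` is a hypothetical stand-in for HIPBLASLT_MATMUL_MATRIX_SCALE_SCALAR_32F and HIPBLASLT_MATMUL_MATRIX_SCALE_OUTER_VEC_32F, and outer-vector scaling is assumed to mean one factor per row of A (length m) and one per column of B (length n). `scale_at` and `scaled_gemm_reference` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two hipBLASLt scale modes.
enum class ScaleMode { Scalar32F, OuterVec32F };

// Scale factor for index `idx` along the outer dimension (row of A or
// column of B). Scalar mode broadcasts the single stored value.
float scale_at(const std::vector<float>& s, ScaleMode mode, std::size_t idx) {
    return mode == ScaleMode::Scalar32F ? s[0] : s[idx];
}

// CPU reference: D(i, j) = scaleA(i) * scaleB(j) * sum_p A(i, p) * B(p, j).
// Column-major storage; the outer-vector interpretation is an assumption.
std::vector<float> scaled_gemm_reference(
    const std::vector<float>& A, const std::vector<float>& sA, ScaleMode modeA,
    const std::vector<float>& B, const std::vector<float>& sB, ScaleMode modeB,
    std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n);
    assert(sA.size() == (modeA == ScaleMode::Scalar32F ? std::size_t{1} : m));
    assert(sB.size() == (modeB == ScaleMode::Scalar32F ? std::size_t{1} : n));
    std::vector<float> D(m * n);
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[i + p * m] * B[p + j * k];
            }
            D[i + j * m] = scale_at(sA, modeA, i) * scale_at(sB, modeB, j) * acc;
        }
    }
    return D;
}
```

Under this reading, code that previously passed a vector through the removed *_SCALE_POINTER_VEC_EXT attributes would switch the corresponding scale mode to the outer-vector setting rather than using a separate attribute.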

Optimized

  • Improved performance for 8-bit (FP8/BF8/I8) NN/NT cases by adding s_delay_alu to reduce stalls from dependent ALU operations on gfx12+.
  • Improved performance for 8-bit and 16-bit (FP16/BF16) TN cases by enabling software dependency check (Expert Scheduling Mode) under certain restrictions to reduce redundant hardware dependency checks on gfx12+.
  • Improved performance for 8-bit, 16-bit, and 32-bit batched GEMM with a better heuristic search algorithm for gfx942.

Upcoming changes

  • V2 APIs (GemmPreferenceV2, GemmProblemTypeV2, GemmEpilogueV2, GemmTuningV2, GemmInputsV2) are deprecated.

hipBLASLt 0.12.1 for ROCm 6.4.4

24 Sep 14:01
443a256

hipBLASLt code for ROCm 6.4.4 did not change. The library was rebuilt for the updated ROCm 6.4.4 stack.

hipBLASLt 0.12.1 for ROCm 6.4.3

07 Aug 14:20
443a256

hipBLASLt code for ROCm 6.4.3 did not change. The library was rebuilt for the updated ROCm 6.4.3 stack.

hipBLASLt 0.12.1 for ROCm 6.4.2

21 Jul 16:54
65cf318

Added

  • Support for gfx1151.

hipBLASLt 0.12.1 for ROCm 6.4.1

20 May 13:15
4d62e13

Resolved issues

  • Fixed an accuracy issue that occurred for some solutions using an FP32 or TF32 data type with a TT transpose.