Releases: ROCm/rocBLAS
rocBLAS 4.4.1 for ROCm 6.4.2
Resolved issues
- Zero imaginary portion of diagonal of C matrix for cherk/zherk for gfx90a/gfx942 with problem sizes k > 500
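For context, a Hermitian rank-k update C := alpha*A*A^H + beta*C produces a Hermitian matrix, so the diagonal of C is real and the BLAS definition of herk sets its imaginary part to zero on exit. The following is a minimal pure-Python sketch of that reference behavior, not rocBLAS code; `herk_reference` is a hypothetical helper name used only for illustration.

```python
def herk_reference(alpha, a, beta, c):
    """Reference semantics of a Hermitian rank-k update (illustrative only).

    Computes C := alpha * A * A^H + beta * C for a full n x n result.
    a is an n x k list of rows of complex numbers, c is n x n,
    and alpha and beta are real scalars.
    """
    n, k = len(a), len(a[0])
    out = [[0j] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # (A * A^H)[i][j] = sum over p of A[i][p] * conj(A[j][p])
            acc = sum(a[i][p] * a[j][p].conjugate() for p in range(k))
            out[i][j] = alpha * acc + beta * c[i][j]
        # The diagonal of a Hermitian matrix is real, so any imaginary
        # part carried in from the input C is discarded.
        out[i][i] = complex(out[i][i].real, 0.0)
    return out
```

A check for the behavior described in this fix is therefore that every `C[i][i].imag` is exactly zero after the call, even when the input C has a nonzero imaginary diagonal.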
rocBLAS 4.4.0 for ROCm 6.4.1
rocBLAS code for ROCm 6.4.1 did not change. The library was rebuilt for the updated ROCm 6.4.1 stack.
rocBLAS 4.4.0 for ROCm 6.4.0
Added
- rocTX support in rocBLAS (not available on Windows or in the static library version on Linux)
- On gfx12, all functions now support full rocblas_int dynamic range for batch_count
- --ninja build option
- Support for GPU_TARGETS cmake variable
Changed
- rocblas-test client removes the stress tests unless YAML-based testing or gtest_filter adds them
- rocblas clients OpenMP default threading is reduced to be less than the logical core count
- gemm_ex testing and timing reuses device memory
- gemm_ex timing initializes matrices on device
Optimized
- Significantly reduced workspace memory requirements for Level 1 ILP64: iamax and iamin
- Reduced workspace memory requirements for Level 1 ILP64: dot, asum, nrm2
- Improved the performance of Level 2 gemv for the problem sizes (TransA == N && m > 2*n) and (TransA == T)
- Improved the performance of Level 3 syrk and herk for the problem size (k > 500 && n < 4000)
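The problem-size conditions quoted in the gemv and syrk/herk items can be restated as simple predicates, which may help when deciding which benchmark cases to re-run after upgrading. This sketch only encodes the conditions as written in these notes; it is not the library's kernel selection logic, and both function names are hypothetical.

```python
def gemv_in_improved_range(trans_a, m, n):
    """True if a gemv problem matches a case listed in the 4.4.0 notes.

    trans_a is "N" or "T"; m and n are the matrix dimensions.
    """
    return (trans_a == "N" and m > 2 * n) or trans_a == "T"


def syrk_herk_in_improved_range(n, k):
    """True if a syrk/herk problem matches the case listed in the 4.4.0 notes."""
    return k > 500 and n < 4000
```

For example, a non-transposed gemv with m = 4096 and n = 1024 satisfies m > 2*n, while a square 1024 x 1024 non-transposed problem does not.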
Resolved issues
- gfx12: ger, geam, geam_ex, dgmm, trmm, symm, hemm, ILP64 gemm, and larger data support
- Added a gfortran package dependency for Azure Linux OS
- Outdated SLES OS package dependencies (cxxtools and joblib) in install.sh -d
- Code object stripping for RPM packages
Upcoming changes
- Deprecated the cmake variable AMDGPU_TARGETS. Use GPU_TARGETS instead.
rocBLAS 4.3.0 for ROCm 6.3.3
rocBLAS code for ROCm 6.3.3 did not change. The library was rebuilt for the updated ROCm 6.3.3 stack.
rocBLAS 4.3.0 for ROCm 6.3.2
rocBLAS code for ROCm 6.3.2 did not change. The library was rebuilt for the updated ROCm 6.3.2 stack.
rocBLAS 4.3.0 for ROCm 6.3.1
rocBLAS code for ROCm 6.3.1 did not change. The library was rebuilt for the updated ROCm 6.3.1 stack.
rocBLAS 4.3.0 for ROCm 6.3.0
Added
- Level 3 and EX functions have an additional ILP64 API for both C and FORTRAN (_64 name suffix) with int64_t function arguments
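As a rough guide to when the _64 entry points matter, the sketch below checks whether any size or leading-dimension argument exceeds the signed 32-bit range. It assumes the non-_64 functions take 32-bit integer arguments for these parameters; the helper names are hypothetical and nothing here calls rocBLAS.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_in_int32(value):
    """True if value can be passed as a signed 32-bit integer argument."""
    return -(2**31) <= value <= INT32_MAX


def needs_ilp64_api(*int_args):
    """True if any integer argument (m, n, k, lda, ...) needs int64_t."""
    return not all(fits_in_int32(v) for v in int_args)
```

A gemm with m = n = k = 1024 and matching leading dimensions fits the 32-bit interface, whereas a leading dimension of 2**31 would require the _64 variant under this assumption.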
Changed
- amdclang is used as the default compiler instead of hipcc
- Internal performance scripts use amd-smi instead of the deprecated rocm-smi
Optimized
- Improved performance of Level 2 gbmv
- Improved performance of Level 2 gemv for float and double precisions for problem sizes (TransA == N && m==n && m % 128 == 0) measured on a gfx942 GPU
Resolved issues
- Fixed stbsv_strided_batched_64 Fortran binding
Upcoming changes
- rocblas_Xgemm_kernel_name APIs are deprecated
rocBLAS 4.2.4 for ROCm 6.2.4
Additions
- GFX1151 Support
rocBLAS 4.2.1 for ROCm 6.2.2
rocBLAS code for ROCm 6.2.2 did not change. The library was rebuilt for the updated ROCm 6.2.2 stack.
rocBLAS 4.2.1 for ROCm 6.2.1
Removals
- Remove Device_Memory_Allocation.pdf link in documentation
Fixes
- Fixed error/warn message during rocblas_set_stream() call