Releases: ROCm/hipCUB
hipCUB 3.1.0 for ROCm 6.1.5
hipCUB code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.
hipCUB 3.1.0 for ROCm 6.1.2
hipCUB code for ROCm 6.1.2 did not change. The library was rebuilt for the updated ROCm 6.1.2 stack.
hipCUB 3.1.0 for ROCm 6.1.1
hipCUB code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.
hipCUB 3.1.0 for ROCm 6.1.0
Changed
- CUB backend references CUB and Thrust version 2.1.0.
- Updated HIPCUB_HOST_WARP_THREADS macro definition to match host_warp_size changes from rocPRIM 3.0.
- Implemented __int128_t and __uint128_t support for radix_sort.
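To illustrate what sorting 128-bit keys involves, here is a minimal host-only sketch of a least-significant-digit radix sort over unsigned 128-bit keys. It is not how hipCUB or rocPRIM implement radix_sort on the device. The function name radix_sort_u128 is hypothetical, and the code assumes a compiler that provides the __int128 extension (GCC or Clang).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumption: the compiler supports the __int128 extension (GCC/Clang).
using u128 = unsigned __int128;

// Hypothetical host-side helper, for illustration only.
// LSD radix sort: 16 stable counting-sort passes of 8 bits each.
inline void radix_sort_u128(std::vector<u128>& keys) {
    std::vector<u128> tmp(keys.size());
    for (int shift = 0; shift < 128; shift += 8) {
        // Histogram of the current 8-bit digit, stored one slot to the right.
        std::array<std::size_t, 257> count{};
        for (u128 k : keys) {
            ++count[static_cast<unsigned>((k >> shift) & 0xFF) + 1];
        }
        // Prefix sum: count[d] becomes the first output slot for digit d.
        for (int d = 0; d < 256; ++d) {
            count[d + 1] += count[d];
        }
        // Stable scatter into the temporary buffer.
        for (u128 k : keys) {
            tmp[count[static_cast<unsigned>((k >> shift) & 0xFF)]++] = k;
        }
        keys.swap(tmp);
    }
}
```

Signed keys are commonly handled in radix sorts by flipping the sign bit before sorting and flipping it back afterwards, so that the unsigned digit order matches the signed order.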
Fixed
- Fixed build issues with rmake.py on Windows when using VS 2017 15.8 or later due to a breaking fix with extended aligned storage.
Added
- Added interface DeviceMemcpy::Batched for batched memcpy from rocPRIM and CUB.
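As a rough picture of what a batched memcpy does, the following host-only sketch copies a list of independent buffers, each with its own source, destination, and byte count. It only mirrors the semantics on the CPU. It is not the device implementation, and batched_memcpy_reference is a hypothetical name, not part of hipCUB.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side reference, for illustration only.
// Copies sizes[i] bytes from srcs[i] to dsts[i] for every buffer i.
// Assumes all three vectors have the same length and that the
// source and destination ranges do not overlap.
inline void batched_memcpy_reference(const std::vector<const void*>& srcs,
                                     const std::vector<void*>& dsts,
                                     const std::vector<std::size_t>& sizes) {
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        std::memcpy(dsts[i], srcs[i], sizes[i]);
    }
}
```

On the device, the benefit of a batched interface is that many small copies can be issued in one call instead of one launch per buffer.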
hipCUB 3.0.0 for ROCm 6.0.2
hipCUB code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.
hipCUB 3.0.0 for ROCm 6.0.0
Changed
- Removed DOWNLOAD_ROCPRIM. Forcing rocPRIM to download can be done with DEPENDENCIES_FORCE_DOWNLOAD.
hipCUB 2.13.1 for ROCm 5.7.1
hipCUB code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.
hipCUB 2.13.1 for ROCm 5.7.0
Changed
- CUB backend references CUB and Thrust version 2.0.1.
- Fixed DeviceSegmentedReduce::ArgMin and DeviceSegmentedReduce::ArgMax by returning the segment-relative index instead of the absolute one.
- Fixed DeviceSegmentedReduce::ArgMin for inputs where the segment minimum is smaller than the value returned for empty segments. An equivalent fix is applied to DeviceSegmentedReduce::ArgMax.
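The difference between a segment-relative and an absolute index can be seen in a small host-only sketch of a segmented ArgMin. This is a CPU reference for the expected result, not the hipCUB implementation. IndexValue and segmented_argmin_reference are hypothetical names, and the pair written for an empty segment is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for an (index, value) result pair.
struct IndexValue {
    int index;
    int value;
};

// Hypothetical host-side reference, for illustration only.
// Segment s covers data[offsets[s]] .. data[offsets[s + 1] - 1].
inline std::vector<IndexValue> segmented_argmin_reference(
    const std::vector<int>& data, const std::vector<int>& offsets) {
    std::vector<IndexValue> out;
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        const int begin = offsets[s];
        const int end = offsets[s + 1];
        if (begin >= end) {
            // Assumed placeholder result for an empty segment.
            out.push_back({1, std::numeric_limits<int>::max()});
            continue;
        }
        // Seed from the first element so the result does not depend
        // on how the empty-segment placeholder compares to the data.
        IndexValue best{0, data[begin]};
        for (int i = begin + 1; i < end; ++i) {
            if (data[i] < best.value) {
                best = {i - begin, data[i]};  // index relative to the segment start
            }
        }
        out.push_back(best);
    }
    return out;
}
```

For data {5, 2, 9, 7, 4, 1} with offsets {0, 3, 3, 6}, the last segment's minimum sits at absolute position 5, but the reported index is 2, its position within that segment.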
Known Issues
- debug_synchronous no longer works on CUDA platform. CUB_DEBUG_SYNC should be used to enable those checks.
- DeviceReduce::Sum does not compile on CUDA platform for mixed extended-floating-point/floating-point InputT and OutputT types.
- DeviceHistogram::HistogramEven fails on CUDA platform for [LevelT, SampleIteratorT] = [int, int].
- DeviceHistogram::MultiHistogramEven fails on CUDA platform for [LevelT, SampleIteratorT] = [int, int/unsigned short/float/double] and [LevelT, SampleIteratorT] = [float, double].
hipCUB 2.13.1 for ROCm 5.6.1
hipCUB code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.
hipCUB 2.13.1 for ROCm 5.6.0
hipCUB code for ROCm 5.6.0 did not change. The library was rebuilt for the updated ROCm 5.6.0 stack.