Releases: ROCm/hipCUB

hipCUB 3.1.0 for ROCm 6.1.5

12 Mar 18:30
44aa2e0

hipCUB code for ROCm 6.1.5 did not change. The library was rebuilt for the updated ROCm 6.1.5 stack.

hipCUB 3.1.0 for ROCm 6.1.2

04 Jun 16:53
44aa2e0

hipCUB code for ROCm 6.1.2 did not change. The library was rebuilt for the updated ROCm 6.1.2 stack.

hipCUB 3.1.0 for ROCm 6.1.1

08 May 17:59
44aa2e0

hipCUB code for ROCm 6.1.1 did not change. The library was rebuilt for the updated ROCm 6.1.1 stack.

hipCUB 3.1.0 for ROCm 6.1.0

16 Apr 19:09
44aa2e0

Changed

  • CUB backend references CUB and Thrust version 2.1.0.
  • Updated the HIPCUB_HOST_WARP_THREADS macro definition to match the host_warp_size changes from rocPRIM 3.0.
  • Implemented __int128_t and __uint128_t support for radix_sort.
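
To illustrate why 128-bit keys fit a radix sort naturally, here is a minimal host-side sketch of a least-significant-digit radix sort over unsigned 128-bit keys. It is not the hipCUB or rocPRIM implementation. The function name radix_sort_u128_reference and the 8-bit digit width are assumptions made for this example, and unsigned __int128 is a GCC/Clang extension.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// unsigned __int128 is a compiler extension (GCC/Clang), not standard C++.
using u128 = unsigned __int128;

// Hypothetical CPU reference: stable LSD radix sort, 8 bits per pass,
// 16 passes to cover all 128 key bits.
void radix_sort_u128_reference(std::vector<u128>& keys)
{
    std::vector<u128> buffer(keys.size());
    for (int shift = 0; shift < 128; shift += 8) {
        // count[d + 1] holds how many keys have digit d in this pass.
        std::array<std::size_t, 257> count{};
        for (u128 k : keys)
            ++count[static_cast<std::size_t>((k >> shift) & 0xFF) + 1];
        // Prefix sum: count[d] becomes the first output slot for digit d.
        for (std::size_t d = 0; d < 256; ++d)
            count[d + 1] += count[d];
        // Scatter keys in input order, which keeps the pass stable.
        for (u128 k : keys)
            buffer[count[static_cast<std::size_t>((k >> shift) & 0xFF)]++] = k;
        keys.swap(buffer);
    }
}
```

Signed __int128_t keys would additionally need the sign bit flipped before and after sorting so that negative values order first; the sketch leaves that out.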

Fixed

  • Fixed build issues with rmake.py on Windows when using VS 2017 15.8 or later, caused by a breaking change in that toolchain's handling of extended aligned storage.

Added

  • Added interface DeviceMemcpy::Batched for batched memcpy from rocPRIM and CUB.
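
As a rough picture of what a batched memcpy does, the following host-side sketch copies a set of independent buffers, each with its own size, in one call. It only models the semantics on the CPU and is not the hipCUB device interface. The name batched_memcpy_reference and its vector-based signature are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-side reference: for every buffer i, copy sizes[i] bytes
// from srcs[i] to dsts[i]. Source and destination ranges must not overlap.
void batched_memcpy_reference(const std::vector<const void*>& srcs,
                              const std::vector<void*>& dsts,
                              const std::vector<std::size_t>& sizes)
{
    for (std::size_t i = 0; i < sizes.size(); ++i)
        std::memcpy(dsts[i], srcs[i], sizes[i]);
}
```

A device-side version would presumably run these copies in parallel rather than in a loop, which is where batching many small copies into a single call can pay off.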

hipCUB 3.0.0 for ROCm 6.0.2

31 Jan 20:12
761fccb

hipCUB code for ROCm 6.0.2 did not change. The library was rebuilt for the updated ROCm 6.0.2 stack.

hipCUB 3.0.0 for ROCm 6.0.0

15 Dec 18:30
761fccb

Changed

  • Removed DOWNLOAD_ROCPRIM. To force rocPRIM to be downloaded, use DEPENDENCIES_FORCE_DOWNLOAD instead.

hipCUB 2.13.1 for ROCm 5.7.1

13 Oct 18:57

hipCUB code for ROCm 5.7.1 did not change. The library was rebuilt for the updated ROCm 5.7.1 stack.

hipCUB 2.13.1 for ROCm 5.7.0

15 Sep 17:29

Changed

  • CUB backend references CUB and Thrust version 2.0.1.
  • Fixed DeviceSegmentedReduce::ArgMin and DeviceSegmentedReduce::ArgMax by returning the segment-relative index instead of the absolute one.
  • Fixed DeviceSegmentedReduce::ArgMin for inputs where the segment minimum is smaller than the value returned for empty segments. An equivalent fix is applied to DeviceSegmentedReduce::ArgMax.
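
The difference between a segment-relative and an absolute index can be shown with a small CPU reference. This is a sketch of the intended semantics, not the hipCUB implementation. The function name segmented_argmin_reference is hypothetical, and the (1, max) pair used for empty segments is an assumption for this example.

```cpp
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical CPU reference. offsets has num_segments + 1 entries;
// segment s covers values[offsets[s]] up to, not including, values[offsets[s + 1]].
// Returns one (segment-relative index, minimum value) pair per segment.
std::vector<std::pair<int, float>> segmented_argmin_reference(
    const std::vector<float>& values, const std::vector<int>& offsets)
{
    std::vector<std::pair<int, float>> result;
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        const int begin = offsets[s];
        const int end = offsets[s + 1];
        if (begin == end) {
            // Assumed placeholder for an empty segment.
            result.emplace_back(1, std::numeric_limits<float>::max());
            continue;
        }
        // Seed from the first element so the empty-segment placeholder
        // can never influence a non-empty segment.
        int best = begin;
        for (int i = begin + 1; i < end; ++i)
            if (values[i] < values[best])
                best = i;
        // Report the index relative to the start of the segment.
        result.emplace_back(best - begin, values[best]);
    }
    return result;
}
```

For values {5, 3, 8, 1, 7} and offsets {0, 3, 3, 5}, the last segment holds {1, 7}: its minimum sits at absolute index 3 but is reported as segment-relative index 0.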

Known Issues

  • debug_synchronous no longer works on the CUDA platform. Use CUB_DEBUG_SYNC to enable those checks instead.
  • DeviceReduce::Sum does not compile on the CUDA platform for mixed extended-floating-point/floating-point InputT and OutputT types.
  • DeviceHistogram::HistogramEven fails on the CUDA platform for [LevelT, SampleIteratorT] = [int, int].
  • DeviceHistogram::MultiHistogramEven fails on the CUDA platform for [LevelT, SampleIteratorT] = [int, int/unsigned short/float/double] and [LevelT, SampleIteratorT] = [float, double].

hipCUB 2.13.1 for ROCm 5.6.1

29 Aug 20:11
dd26e47

hipCUB code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.

hipCUB 2.13.1 for ROCm 5.6.0

28 Jun 23:19
dd26e47

hipCUB code for ROCm 5.6.0 did not change. The library was rebuilt for the updated ROCm 5.6.0 stack.