Commit d5e7609

Authored by stanleytsang-amd, Naraenda, Snektron, nolmoonen, and Gergely Meszaros
Develop stream 2023 06 19 updated (#383)
* Resolve "Implement curandGetDirectionVectors32 and curandGetDirectionVectors64"
* add missing ctest test files to ci artifacts
* remove hiprand
* review comments round 1
* Fix rocRAND CUDA build instructions
* Document how to build rocRAND with a non-standard ROCm installation path
* Add documentation on passing the module path when building for NVIDIA
* Apply rule of 5 in benchmarks
* Make mt19937_engine movable
* Apply rule of 5 to rocrand.hpp
  This could be a breaking change for users, but any code that no longer compiles was probably buggy anyway.
* Apply rule of 5 / 0 to rocrand internal types
* Abort if deallocation fails in random number engines
  Abort instead of silently continuing when `hipFree` fails in rng destructors. `hipFree` should only fail if
  1. there is something wrong with the runtime in which case continuing doesn't really make sense or
  2. There is a bug in rocRAND trying to double free or free something that was not allocated with `hipMalloc`.
  This is usually a sign that something went pretty wrong, and probably other weird failures would follow if we tried to continue anyway. This makes a test for move construction / assignment in the next commit sound, so that it does not need to depend on implicitly terminating when device side memory is used after freeing it.
* Various fixes in rocrand.hpp
  - Add missing order setter for rocrand_cpp::threefry4x64_20_engine
  - Fix typo in alias name for the mrg31k3p engine in rocrand_cpp
  - Fix default ordering for rocrand_cpp::lfsr113_engine
* Test move operations for rocrand_cpp generators
* Add copyright check script based on hipCUB
* Implement device-side initialization of MT19937
* Optimize jump ahead for MT19937
  * Replace % n with a simpler wrapping code;
  * Change the innermost loop so i < n must be checked only for the last iteration, others can be unrolled without if;
  * Fix block size so blockDim.x can be replaced with a constant.
* Move MT19937's jump-ahead state from shared memory to registers
* Update years in copyright
* Add a test for MT19937's jump-ahead on device
  It compares the new device implementation with the old host one. mt19937_engine is used only for the test so it has been moved from the library to tests.
* Move precomputed jumps to a separate compilation unit
* Add polynomials for 256 ^ 2 generators
  There are 510 (== (256 - 1) * 2) polynomials in the table instead of 13 polynomials of power-of-two jumps. Each generator requires up to 2 iterations of Horner algorithm. Added a compressed file clist_mt19937.txt created using jump_ahead_1.02
* add workaround for sdma oversubscription
* Add continuity tests for MT19937
* Add tests for heads and tails
* Optimize MT19937 and add continuity support
  * Dynamic indexing of state's mt[] is removed;
  * Long sequences are generated mostly with fast unrolled loops;
  * Short sequences can be generated by a special kernel when possible;
* Optimize MT19937 for small sizes
  mt19937_octo_engine is stored as uints. The kernel for short sequences loads (coalesced access) values much faster because they are stored consequentially now.
* Improve comments with implementation details
* Add tests for distribution changing
* Combine engine loading/saving routines into accessor class
* remove reliance on manually-specified WIN32 define
* Remove references to HCC
* copyright fixes

---------

Co-authored-by: Nara Prasetya <[email protected]>
Co-authored-by: Robin Voetter <[email protected]>
Co-authored-by: Nol Moonen <[email protected]>
Co-authored-by: Gergely Meszaros <[email protected]>
Co-authored-by: Anton Gorenko <[email protected]>
Co-authored-by: Bálint Soproni <[email protected]>
Co-authored-by: Beatriz Navidad Vilches <[email protected]>
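
The reasoning behind "Abort if deallocation fails in random number engines" can be illustrated with a minimal sketch. This is not the rocRAND implementation: `fake_malloc`, `fake_free`, and `device_buffer` are hypothetical stand-ins for `hipMalloc`, `hipFree`, and an engine type, chosen so the example builds without HIP. It assumes the owner is move-only, that a moved-from object holds a null pointer, and that a failed free is treated as unrecoverable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <utility>

// Hypothetical stand-ins for hipMalloc / hipFree (assumption: 0 means success).
inline int fake_malloc(void** ptr, std::size_t bytes)
{
    *ptr = std::malloc(bytes);
    return *ptr != nullptr ? 0 : 1;
}

inline int fake_free(void* ptr)
{
    std::free(ptr);
    return 0;
}

// Hypothetical move-only owner of a buffer; not a rocRAND type.
class device_buffer
{
public:
    explicit device_buffer(std::size_t bytes)
    {
        if(fake_malloc(&m_ptr, bytes) != 0)
        {
            std::fputs("allocation failed\n", stderr);
            std::abort();
        }
    }

    // Copying would leave two owners of one pointer, so it is disabled.
    device_buffer(const device_buffer&)            = delete;
    device_buffer& operator=(const device_buffer&) = delete;

    // Moving transfers ownership and leaves the source empty.
    device_buffer(device_buffer&& other) noexcept
        : m_ptr(std::exchange(other.m_ptr, nullptr))
    {}

    device_buffer& operator=(device_buffer&& other) noexcept
    {
        if(this != &other)
        {
            release();
            m_ptr = std::exchange(other.m_ptr, nullptr);
        }
        return *this;
    }

    ~device_buffer() { release(); }

    bool owns() const noexcept { return m_ptr != nullptr; }

private:
    // Frees the buffer once; aborts instead of silently continuing on failure.
    void release() noexcept
    {
        if(m_ptr == nullptr)
        {
            return; // moved-from or already released
        }
        if(fake_free(m_ptr) != 0)
        {
            std::fputs("deallocation failed, aborting\n", stderr);
            std::abort();
        }
        m_ptr = nullptr;
    }

    void* m_ptr = nullptr;
};

// Small usage check: after a move, only the destination owns the buffer.
inline void demo()
{
    device_buffer a(64);
    device_buffer b(std::move(a));
    assert(!a.owns() && b.owns());
}
```

Because a moved-from `device_buffer` holds a null pointer, its destructor never calls the free function, so a test of the move operations does not have to rely on a double free being detected by the runtime.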
1 parent 2ddb533 commit d5e7609

69 files changed: +42545 -1359 lines changed

Diff for: .clang-format

+1-1
@@ -41,7 +41,7 @@ AllowShortLoopsOnASingleLine: false
  AlwaysBreakAfterReturnType: None
  AlwaysBreakBeforeMultilineStrings: false
  AlwaysBreakTemplateDeclarations: Yes
- AttributeMacros: ['HIPRANDAPI', 'QUALIFIERS', 'FQUALIFIERS']
+ AttributeMacros: ['QUALIFIERS', 'FQUALIFIERS']
  BinPackArguments: false
  BinPackParameters: false
  BitFieldColonSpacing: Both

Diff for: .githooks/pre-commit

+19
@@ -3,8 +3,27 @@
  # Redirect output to stderr.
  exec 1>&2

+ check_failed=false
+
  # Do the code format check
  if ! "$(git rev-parse --show-toplevel)/scripts/code-format/check-format.sh" HEAD --cached 1>&2; then
+ printf "\n\033[31mFailed\033[0m: code format check.\n"
+ check_failed=true
+ fi
+
+ # Do the copyright check
+ # update & apply copyright when hook config is set, otherwise just verify
+ opts="-qc"
+ if [ "$(git config --get --type bool --default false hooks.updateCopyright)" = "true" ]; then
+ opts="-qca"
+ fi
+
+ if ! "$(git rev-parse --show-toplevel)/scripts/copyright-date/check-copyright.sh" "$opts" 1>&2; then
+ printf "\n\033[31mFailed\033[0m: copyright date check.\n"
+ check_failed=true
+ fi
+
+ if $check_failed; then
  printf "
  Pre-commit check failed, please fix the reported errors.
  Note: Use '\033[33mgit commit --no-verify\033[0m' to bypass checks.\n"

Diff for: .gitlab-ci.yml

+20-5
@@ -1,6 +1,6 @@
  # MIT License
  #
- # Copyright (c) 2017-2022 Advanced Micro Devices, Inc. All rights reserved.
+ # Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the "Software"), to deal
@@ -60,6 +60,20 @@ clang-format:
  - git config --global --add safe.directory $CI_PROJECT_DIR
  - scripts/code-format/check-format.sh $CI_MERGE_REQUEST_DIFF_BASE_SHA --binary "$CLANG_FORMAT"

+ copyright-date:
+ extends:
+ - .deps:rocm
+ stage: lint
+ needs: []
+ tags:
+ - rocm-build
+ rules:
+ - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
+ script:
+ - cd $CI_PROJECT_DIR
+ - git config --global --add safe.directory $CI_PROJECT_DIR
+ - scripts/copyright-date/check-copyright.sh -v -d $CI_MERGE_REQUEST_DIFF_BASE_SHA
+
  .rocm:cmake-minimum:
  variables:
  COMPILER: hipcc
@@ -198,7 +212,6 @@ clang-format:
  -D BUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
  -D BUILD_SHARED_LIBS=${BUILD_SHARED_LIBS}
  -D USE_HIP_CPU=ON
- -D BUILD_HIPRAND=OFF
  -D CMAKE_INSTALL_PREFIX=${ROCRAND_DIR}
  - cmake --build $CI_PROJECT_DIR/build
  - cmake --build $CI_PROJECT_DIR/build --target package
@@ -210,6 +223,7 @@ clang-format:
  - $CI_PROJECT_DIR/build/library/
  - $CI_PROJECT_DIR/build/test/test_*
  - $CI_PROJECT_DIR/build/test/CTestTestfile.cmake
+ - $CI_PROJECT_DIR/build/test/linkage/CTestTestfile.cmake
  - $CI_PROJECT_DIR/build/benchmark/benchmark_*
  - $CI_PROJECT_DIR/build/deps/googlebenchmark/
  - $CI_PROJECT_DIR/build/deps/hip-cpu/
@@ -331,7 +345,10 @@ build-cpu:cmake-latest:
  stage: test
  script:
  - cd $CI_PROJECT_DIR/build
- - ctest --output-on-failure
+ # Parallel execution (with other AMDGPU processes) can oversubscribe the SDMA queue.
+ # This causes the hipMemcpy to fail, which is not reported as an error by HIP.
+ # As a temporary workaround, disable the SDMA for test stability.
+ - HSA_ENABLE_SDMA=0 ctest --output-on-failure

  test:rocm:
  needs:
@@ -462,7 +479,6 @@ test:nvcc-python:
  -B $CI_PROJECT_DIR/build_package_test
  -D CMAKE_CXX_COMPILER=${COMPILER}
  -D USE_HIP_CPU=ON
- -D BUILD_HIPRAND=OFF
  -D TBB_DIR=$CI_PROJECT_DIR/build/tbb-src/cmake
  -D hip_cpu_rt_DIR=$CI_PROJECT_DIR/build/deps/hip-cpu/share/hip_cpu_rt/cmake
  -D rocrand_DIR=${ROCRAND_DIR}
@@ -532,7 +548,6 @@ test:hip-cpu-package:
  -D BUILD_FILE_REORG_BACKWARD_COMPATIBILITY=OFF
  -D CMAKE_CXX_COMPILER=${COMPILER}
  -D USE_HIP_CPU=ON
- -D BUILD_HIPRAND=OFF
  -D DEPENDENCIES_FORCE_DOWNLOAD=ON
  -D CMAKE_INSTALL_PREFIX=${ROCRAND_DIR}
  - $SUDO_CMD cmake --build $CI_PROJECT_DIR/build_only_install --target install

Diff for: .gitmodules

-4
This file was deleted.

Diff for: CHANGELOG.md

+30
@@ -1,12 +1,42 @@
  # Change Log for rocRAND

  Full documentation for rocRAND is available at [https://rocrand.readthedocs.io/en/latest/](https://rocrand.readthedocs.io/en/latest/)
+
+ ## (Unreleased) rocRAND-x.x.x for ROCm 6.0.0
+ ### Changed
+ - Removed hipRAND submodule from rocRAND. hipRAND is now only available as a separate package.
+ - Generator classes from `rocrand.hpp` are no longer copyable, in previous versions these copies
+ would copy internal references to the generators and would lead to double free or memory leak errors.
+ These types should be moved instead of copied, and move constructors and operators are now defined
+ for them.
+ - Improved MT19937 initialization and generation performance.
+ - Removed references to and workarounds for deprecated hcc
+
+ ### Fixed
+ - `mt19937_engine` from `rocrand.hpp` is now move-constructible and move-assignable. Previously the
+ move constructor and move assignment operator was deleted for this class.
+ - Various fixes for the C++ wrapper header rocrand.hpp
+ - fixed the name of `mrg31k3p` it is now correctly spelled (was incorrectly named`mrg31k3a` in
+ previous versions).
+ - added missing `order` setter method for `threefry4x64`
+ - fixed the default ordering parameter for `lfsr113`
+
  ## (Unreleased) rocRAND-2.10.17 for ROCm 5.5.0
  ### Added
  - MT19937 pseudo random number generator based on M. Matsumoto and T. Nishimura, 1998, Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator.
  - New benchmark for the device API using Google Benchmark, `benchmark_rocrand_device_api`, replacing `benchmark_rocrand_kernel`. `benchmark_rocrand_kernel` is deprecated and will be removed in a future version. Likewise, `benchmark_curand_host_api` is added to replace `benchmark_curand_generate` and `benchmark_curand_device_api` is added to replace `benchmark_curand_kernel`.
  - experimental HIP-CPU feature
  - ThreeFry pseudorandom number generator based on Salmon et al., 2011, "Parallel random numbers: as easy as 1, 2, 3".
+ - Accessor methods for sobol 32 and 64 direction vectors and constants:
+ - Enum `rocrand_direction_vector_set` to select the direction vector set.
+ - `rocrand_get_direction_vectors32(...)` supersedes:
+ - `rocrand_h_sobol32_direction_vectors`
+ - `rocrand_h_scrambled_sobol32_direction_vectors`
+ - `rocrand_get_direction_vectors64(...)` supersedes:
+ - `rocrand_h_sobol64_direction_vectors`
+ - `rocrand_h_scrambled_sobol64_direction_vectors`
+ - `rocrand_get_scramble_constants32(...)` supersedes `h_scrambled_sobol32_constants`
+ - `rocrand_get_scramble_constants64(...)` supersedes `h_scrambled_sobol64_constants`
  ### Changed
  - Python 2.7 is no longer officially supported.
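
The changelog entry about generator classes being movable but no longer copyable can be sketched from the caller's side. The code below does not use the real `rocrand_cpp` classes: `toy_generator`, `live_handles`, and `make_generators` are hypothetical names, and a heap-allocated `int` stands in for the internal generator handle. It only shows, under those assumptions, how a move-only type is stored in a container and handed over with `std::move` while each handle is released exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Counts handles that were created and not yet released (illustration only).
inline int& live_handles()
{
    static int count = 0;
    return count;
}

// Hypothetical stand-in for a generator wrapper that owns one handle.
class toy_generator
{
public:
    toy_generator() : m_handle(new int(0)) { ++live_handles(); }

    // Copies would share the handle and release it twice, so they are deleted.
    toy_generator(const toy_generator&)            = delete;
    toy_generator& operator=(const toy_generator&) = delete;

    // Moves transfer the handle; the source is left without one.
    toy_generator(toy_generator&& other) noexcept
        : m_handle(std::exchange(other.m_handle, nullptr))
    {}

    toy_generator& operator=(toy_generator&& other) noexcept
    {
        std::swap(m_handle, other.m_handle); // old handle is released by `other`
        return *this;
    }

    ~toy_generator()
    {
        if(m_handle != nullptr)
        {
            delete m_handle;
            --live_handles();
        }
    }

    bool valid() const noexcept { return m_handle != nullptr; }

private:
    int* m_handle;
};

static_assert(!std::is_copy_constructible<toy_generator>::value,
              "generators in this sketch must not be copyable");
static_assert(std::is_nothrow_move_constructible<toy_generator>::value,
              "generators in this sketch must be movable");

// Builds n generators; push_back moves each temporary into the vector.
inline std::vector<toy_generator> make_generators(std::size_t n)
{
    std::vector<toy_generator> result;
    result.reserve(n);
    for(std::size_t i = 0; i < n; ++i)
    {
        result.push_back(toy_generator{});
    }
    return result;
}
```

Code that previously copied a generator, for example by passing it by value, would under this design either pass a reference or write `std::move(gen)` at the call site, which makes the ownership transfer explicit.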

Diff for: CMakeLists.txt

+3-22
@@ -1,6 +1,6 @@
  # MIT License
  #
- # Copyright (c) 2018-2022 Advanced Micro Devices, Inc. All rights reserved.
+ # Copyright (c) 2018-2023 Advanced Micro Devices, Inc. All rights reserved.
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the "Software"), to deal
@@ -104,7 +104,7 @@ if (AMDGPU_TARGETS)
  set( AMDGPU_TARGETS "${gpus}" CACHE STRING "AMD GPU targets to compile for" FORCE )
  endif()

- # Verify that hcc compiler is used on ROCM platform
+ # Verify that hipcc compiler is used on ROCM platform
  # TODO: Fix VerifyCompiler for Windows
  if(NOT USE_HIP_CPU)
  if(NOT WIN32)
@@ -173,9 +173,7 @@ print_configuration_summary()
  # Tools
  add_subdirectory(tools)

- # rocRAND and hipRAND libraries
- # This adds library targets: rocrand, hiprand,
- # also includes Fortran wrapper
+ # rocRAND library, adds library target "rocrand" and includes Fortran wrapper
  add_subdirectory(library)

  if(BUILD_TEST OR BUILD_BENCHMARK)
@@ -229,23 +227,6 @@ endif()

  set(ROCRAND_CONFIG_DIR "\${CPACK_PACKAGING_INSTALL_PREFIX}/${CMAKE_INSTALL_LIBDIR}" CACHE PATH "Path placed into ldconfig file")

- if(WIN32)
- option(BUILD_HIPRAND "Build hipRAND as part of rocRAND" OFF)
- else()
- option(BUILD_HIPRAND "Build hipRAND as part of rocRAND" ON)
- endif()
- if(BUILD_HIPRAND)
- if(HIP_COMPILER STREQUAL "nvcc")
- set(BUILD_WITH_LIB "CUDA")
- endif()
- if(EXISTS ${CMAKE_SOURCE_DIR}/hipRAND/CMakeLists.txt)
- set(HIPRAND_SUBMODULE ON CACHE BOOL "Build hipRAND as a submodule of rocRAND." FORCE)
- add_subdirectory(${CMAKE_SOURCE_DIR}/hipRAND ${CMAKE_BINARY_DIR}/hipRAND)
- else()
- message(FATAL_ERROR "BUILD_HIPRAND was set, but the hipRAND submodule could not be found. Use git submodule update --init to clone the hipRAND submodule, or set BUILD_HIPRAND to OFF (--no-hiprand flag for ./install)")
- endif()
- endif()
-
  rocm_create_package(
  NAME ${package_name}
  DESCRIPTION "Radeon Open Compute RAND library"

Diff for: README.md

+13-11
@@ -9,7 +9,7 @@ programming language and optimised for AMD's latest discrete GPUs. It is designe
  of AMD's Radeon Open Compute [ROCm](https://rocm.github.io/) runtime, but it also works on
  CUDA enabled GPUs.

- Prior to ROCm version 5.0, this project included the [hipRAND](https://github.com/ROCmSoftwarePlatform/hipRAND.git) wrapper. As of version 5.0, this has been split into a separate library.
+ Prior to ROCm version 5.0, this project included the [hipRAND](https://github.com/ROCmSoftwarePlatform/hipRAND.git) wrapper. As of version 5.0, this has been split into a separate library. As of version 6.0, hipRAND can no longer be built from rocRAND.

  ## Supported Random Number Generators

@@ -89,23 +89,26 @@ cd rocRAND; mkdir build; cd build

  # Configure rocRAND, setup options for your system
  # Build options: BUILD_TEST (off by default), BUILD_BENCHMARK (off by default), BUILD_SHARED_LIBS (on by default)
+ # Additionally, the ROCm installation prefix should be passed using CMAKE_PREFIX_PATH or by setting the ROCM_PATH environment variable.
  #
  # ! IMPORTANT !
  # Set C++ compiler to HIP-clang. You can do it by adding 'CXX=<path-to-compiler>'
  # before 'cmake' or setting cmake option 'CMAKE_CXX_COMPILER' to path to the compiler.
  #
  # The python interface do not work with static library.
  #
- [CXX=hipcc] cmake -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
+ [CXX=hipcc] cmake -DBUILD_BENCHMARK=ON ../. -DCMAKE_PREFIX_PATH=/opt/rocm # or cmake-gui ../.

- # To configure rocRAND for Nvidia platforms, 'CXX=<path-to-nvcc>', `CXX=nvcc` or omitting the flag
- # entirely before 'cmake' is sufficient
- [CXX=nvcc] cmake -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
+ # To configure rocRAND for NVIDIA platforms, the CXX compiler must be set to a host compiler. The CUDA compiler can
+ # be set explicitly using `-DCMAKE_CUDA_COMPILER=<path-to-nvcc>`.
+ # Additionally, the path to FindHIP.cmake should be passed via CMAKE_MODULE_PATH. By default, this is module is
+ # installed in /opt/rocm/hip/cmake.
+ cmake -DBUILD_BENCHMARK=ON ../. -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_MODULE_PATH=/opt/rocm/hip/cmake # or cmake-gui ../.
  # or
- cmake -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
+ [CXX=g++] cmake -DBUILD_BENCHMARK=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_MODULE_PATH=/opt/rocm/hip/cmake ../. # or cmake-gui ../.

- # To configure rocRAND for HIP-CPU (experimental), the USE_HIP_CPU flag is required and BUILD_HIPRAND should be turned off
- [CXX=g++] cmake -DUSE_HIP_CPU=ON -DBUILD_HIPRAND=OFF -DBUILD_BENCHMARK=ON ../. # or cmake-gui ../.
+ # To configure rocRAND for HIP-CPU (experimental), the USE_HIP_CPU flag is required
+ [CXX=g++] cmake -DUSE_HIP_CPU=ON -DBUILD_BENCHMARK=ON -DCMAKE_PREFIX_PATH=/opt/rocm ../. # or cmake-gui ../.

  # Build
  make -j4
@@ -231,10 +234,9 @@ cd rocRAND; cd build

  ## Wrappers

- * C++ wrappers for host API of rocRAND and hipRAND are in files [`rocrand.hpp`](./library/include/rocrand/rocrand.hpp)
- and [`hiprand.hpp`](./library/include/rocrand/hiprand.hpp).
+ * C++ wrappers for host API of rocRAND are in [`rocrand.hpp`](./library/include/rocrand/rocrand.hpp).
  * [Fortran wrappers](./library/src/fortran/).
- * [Python wrappers](./python/): [rocRAND](./python/rocrand) and [hipRAND](./python/hiprand).
+ * [Python wrappers](./python/): [rocRAND](./python/rocrand).

  ## Support

Diff for: benchmark/benchmark_curand_kernel.cpp

+31-1
@@ -1,4 +1,4 @@
- // Copyright (c) 2017-2022 Advanced Micro Devices, Inc. All rights reserved.
+ // Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
  //
  // Permission is hereby granted, free of charge, to any person obtaining a copy
  // of this software and associated documentation files (the "Software"), to deal
@@ -117,6 +117,11 @@ struct runner
  CUDA_CALL(cudaDeviceSynchronize());
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
@@ -190,6 +195,11 @@ struct runner<curandStateMtgp32_t>
  CURAND_CALL(curandMakeMTGP32KernelState(states, mtgp32dc_params_fast_11213, d_param, states_size, seed));
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
@@ -297,6 +307,11 @@ struct runner<curandStateSobol32_t>
  CUDA_CALL(cudaFree(directions));
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
@@ -369,6 +384,11 @@ struct runner<curandStateScrambledSobol32_t>
  CUDA_CALL(cudaFree(scramble_constants));
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
@@ -425,6 +445,11 @@ struct runner<curandStateSobol64_t>
  CUDA_CALL(cudaFree(directions));
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
@@ -496,6 +521,11 @@ struct runner<curandStateScrambledSobol64_t>
  CUDA_CALL(cudaFree(scramble_constants));
  }

+ runner(const runner&) = delete;
+ runner(runner&&) = delete;
+ runner& operator=(const runner&) = delete;
+ runner& operator=(runner&&) = delete;
+
  ~runner()
  {
  CUDA_CALL(cudaFree(states));
