Develop stream 2024-07-01 #512
@samjwu The clang-format CI step doesn't seem to handle external PRs properly; I think this might also be happening in other repositories.

@stanleytsang-amd Let me know if I should squash the commits into their respective merge commits.
Run git fetch origin develop
Force-pushed from 4ccc1c5 to b6c8731.

Rebased to get the clang-format CI fixes.
rocm-docs-core distributes headers and stylesheets for doxygen for embedding its HTML output into sphinx. These mostly fix dark-theme and other minor visual issues when doxygen output is used this way.
docs(api reference): rocm-docs-core headers and stylesheets in doxyfile See merge request amd/libraries/rocRAND!326
…ream' Improve accuracy of Poisson histogram test Closes ROCm#240 See merge request amd/libraries/rocRAND!327
…ns.hpp and src/rng/device_engines.hpp
…tream' Resolve "Remove deprecated internal headers" Closes ROCm#341 See merge request amd/libraries/rocRAND!330
AMDGPU_TARGETS doesn't pick up updates correctly (needs cache clean) whereas GPU_TARGETS does. Every other doc and CI too refers to GPU_TARGETS.
Resolve "Some host generators might not support large sizes due to min / max" See merge request amd/libraries/rocRAND!329
Recent changes required for HIP graph support added a new path with approximation of Poisson with normal distribution when lambda is large. However, the decision whether to use the alias/CDF methods or the approximation is made in the kernel for every generated value even though lambda is the same. This change moves it to host side: depending on lambda the kernel is launched with one of two distributions (poisson_distribution or poisson_distribution_huge).
Resolve "Document HIP Graph support" Closes ROCm#360 See merge request amd/libraries/rocRAND!335
…p_stream' Fix performance regression of Poisson distribution introduced by HIP graph support Closes ROCm#366 See merge request amd/libraries/rocRAND!336
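The host-side selection described in the commit message above can be illustrated with a minimal sketch in plain C++ (no HIP). This is not the rocRAND implementation: `exact_sampler`, `normal_approx_sampler`, `fill`, `generate_poisson` and the value of `assumed_threshold` are hypothetical names and numbers chosen for illustration. The point it shows is only that the branch on lambda is taken once, before the loop that stands in for the kernel, rather than once per generated value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Stand-in for the exact (alias/CDF) path; here it simply wraps std::poisson_distribution.
struct exact_sampler
{
    std::poisson_distribution<unsigned int> dist;
    explicit exact_sampler(double lambda) : dist(lambda) {}

    template<class Engine>
    unsigned int operator()(Engine& engine)
    {
        return dist(engine);
    }
};

// Stand-in for the large-lambda path: normal approximation with mean lambda
// and standard deviation sqrt(lambda), rounded and clamped at zero.
struct normal_approx_sampler
{
    std::normal_distribution<double> dist;
    explicit normal_approx_sampler(double lambda) : dist(lambda, std::sqrt(lambda)) {}

    template<class Engine>
    unsigned int operator()(Engine& engine)
    {
        const double v = std::round(dist(engine));
        return v < 0.0 ? 0u : static_cast<unsigned int>(v);
    }
};

// Stand-in for the kernel: the sampler type is a template parameter,
// so the loop body contains no per-value branch on lambda.
template<class Sampler, class Engine>
void fill(std::vector<unsigned int>& out, Sampler sampler, Engine& engine)
{
    for(auto& v : out)
    {
        v = sampler(engine);
    }
}

// Assumption: an arbitrary cut-over point, not the value used by rocRAND.
constexpr double assumed_threshold = 4000.0;

inline bool uses_normal_approximation(double lambda)
{
    return lambda > assumed_threshold;
}

// Host-side dispatch: decide once per call which instantiation to run.
template<class Engine>
void generate_poisson(std::vector<unsigned int>& out, double lambda, Engine& engine)
{
    assert(lambda > 0.0);
    if(uses_normal_approximation(lambda))
    {
        fill(out, normal_approx_sampler(lambda), engine);
    }
    else
    {
        fill(out, exact_sampler(lambda), engine);
    }
}
```

Calling `generate_poisson(values, 5.0, engine)` with a `std::mt19937` engine takes the exact path, while a lambda above the assumed threshold takes the approximation path; in both cases the choice costs one comparison per call.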
hipcc from ROCm 6.2 does not add `-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false` by default.
Improve performance on ROCm 6.2 See merge request amd/libraries/rocRAND!337
…as the former is unintuitive
ci(.gitlab-ci.yml): replace 'ROCM_PATH' variable with 'env:HIP_PATH' as the former is unintuitive See merge request amd/libraries/rocRAND!339
Remove unused FindTestU01.cmake See merge request amd/libraries/rocRAND!342
…te' into 'develop_stream' Resolve "Add check for nullptr data when calling host generate." Closes ROCm#369 See merge request amd/libraries/rocRAND!341
operator >> has higher precedence than operator &. This bug causes very low quality in crush tests.
…ream' Fix threefry2x64 and threefry4x64 Closes ROCm#371 See merge request amd/libraries/rocRAND!343
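The precedence issue named in the threefry fix can be shown with a small, hypothetical C++ sketch. It is not the rocRAND source: `rotl64` and `rotl64_buggy` are illustrative names, and the buggy variant is written with explicit parentheses that mirror how the compiler would parse an unparenthesized `x >> (64u - n) & 63u`, since `>>` binds tighter than `&`.

```cpp
#include <cassert>
#include <cstdint>

// Intended 64-bit left rotation: the mask applies to the shift count.
inline std::uint64_t rotl64(std::uint64_t x, unsigned int n)
{
    return (x << (n & 63u)) | (x >> ((64u - n) & 63u));
}

// Hypothetical buggy variant. Writing `x >> (64u - n) & 63u` without the
// inner parentheses is parsed as `(x >> (64u - n)) & 63u`, which is spelled
// out here: the mask hits the shifted value, keeping only its low 6 bits.
inline std::uint64_t rotl64_buggy(std::uint64_t x, unsigned int n)
{
    assert(n > 0u && n < 64u); // keep the unmasked shift count in range
    return (x << (n & 63u)) | ((x >> (64u - n)) & 63u);
}
```

For example, rotating `0xFF00000000000000` left by 8 should wrap the top byte around to give `0xFF`; the buggy variant returns `0x3F` because two of the wrapped bits are masked away. Losing rotated bits like this weakens the mixing in each round, which is consistent with the poor crush-test quality the commit message reports.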
Force-pushed from b6c8731 to d4f972c.

Rebased and added threefry fixes by @ex-rzr
umfranzw left a comment
I've finished working my way through this one, and I think it looks good. All our CI checks are also passing now.
@stanleytsang-amd if you have no objections, I'm ok with this one going in.
stanleytsang-amd left a comment
LGTM once the packaging version is updated.
* Removed accidentally included #include "hip/amd_detail/host_defines.h"
* chore(gitignore): ignore python venvs
* docs(api reference): rocm-docs-core headers and stylesheets in doxyfile
  rocm-docs-core distributes headers and stylesheets for doxygen for embedding its HTML output into sphinx. These mostly fix dark-theme and other minor visual issues when doxygen output is used this way.
* improve accuracy of poisson histogram test
* fix format and copyright dates
* feat(test): Added CMake option RUN_EXTRA_TESTS
* Removed deprecated internal headers, src/rng/distribution/distributions.hpp and src/rng/device_engines.hpp
* Using .lint:clang-format
* feat(test): Added large size tests for host generators
* fix(generator): Fixed the usage of min in host generators
* docs(dyn_ordering): Use GPU_TARGETS instead of AMDGPU_TARGETS
  AMDGPU_TARGETS doesn't pick up updates correctly (needs cache clean) whereas GPU_TARGETS does. Every other doc and CI too refers to GPU_TARGETS.
* Use alias method in rocrand_discrete for MTGP32, LFSR113 and ThreeFry
  discrete_alias is faster than discrete_cdf. Though discrete_cdf can be used with PRNGs, it is supposed to be used with QRNGs (Sobol generators) as it maintains quasi-randomness.
* refactor mt19937 to support host version as well
* update test
* move jump_ahead_thread_count back to template param
* implement memcpy in host and device systems
* fix compilation issues and segfault
* Removed 'apt-get install flang'
* fix jump_ahead on host
* create host implementation of some functions refactor generate_long_mt19937 to work on host as well
* fix remaining inconsistencies in host mt19937 generator
* fix format end compile errors
* fix missing gen_next_n calls fix clang-format issues
* fix format issues and missing __host__s
* fix messed up host/device allocations
* fix merge conflicts fix format
* fix format issues and compile error
* fix format issues
* fix format issue
* disable most mt19937 host tests for normal run (enabled for slow test run)
* fix review comments
* remove synchronization from host_system::launch function add synchronization to host_system alloc, free and memcpy
* Implement asynchronous initialization of poisson distribution
* use ROCRAND_HIP_FATAL_ASSERT for hipDeviceSynchronize call
* generate_poisson test with many lambdas
* Testing poisson with hipGraphs
* Test [blocking] host_generator with non-blocking stream
* Fixing poisson distribution selection in benchmark_tuning
* Updated changelog
* fix(docs): Added links to unaccessible doc pages
* fix(docs): Removed duplicated CUDA Compatibility section from Programmer's guide
* Added hipGraphs doc and sample
* Fix performance regression of Poisson distribution
  Recent changes required for HIP graph support added a new path with approximation of Poisson with normal distribution when lambda is large. However, the decision whether to use the alias/CDF methods or the approximation is made in the kernel for every generated value even though lambda is the same. This change moves it to host side: depending on lambda the kernel is launched with one of two distributions (poisson_distribution or poisson_distribution_huge).
* clang-format: Break after attributes
* Add missing __forceinline__ to improve performance on ROCm 6.2
  hipcc from ROCm 6.2 does not add `-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false` by default.
* Remove meaningless code in xorwow introduced during rebase/merge
* style: update formatting
* ci(.gitlab-ci.yml): replace 'ROCM_PATH' variable with 'env:HIP_PATH' as the former is unintuitive
* ci(.gitlab-ci.yml): do not force download deps on windows
* ci(.gitlab-ci.yml): pass amdclang filepath properly to windows package build test
* Remove unused FindTestU01.cmake
* Added checks for nullptr data with tests
* Fix bit rotation for threefry2x64 and threefry4x64
  operator >> has higher precedence than operator &. This bug causes very low quality in crush tests.
* chore: bump version

---------

Co-authored-by: Lőrinc Serfőző <[email protected]>
Co-authored-by: Gergely Meszaros <[email protected]>
Co-authored-by: Nol Moonen <[email protected]>
Co-authored-by: Bence Parajdi <[email protected]>
Co-authored-by: Nick Breed <[email protected]>
Co-authored-by: Anton Gorenko <[email protected]>

[ROCm/rocRAND commit: a296589]
This PR brings various updates, intended for ROCm 6.3.
It contains the following merge commits:
`GPU_TARGETS` instead of `AMDGPU_TARGETS`

This PR does not contain fixes for the recent performance regressions; we'll either add them here or create a new PR for those.