Skip to content

Develop stream 2024-07-01#512

Merged
stanleytsang-amd merged 69 commits intoROCm:developfrom
StreamHPC:develop_stream_20240701
Aug 8, 2024
Merged

Develop stream 2024-07-01#512
stanleytsang-amd merged 69 commits intoROCm:developfrom
StreamHPC:develop_stream_20240701

Conversation

@Naraenda
Copy link
Member

@Naraenda Naraenda commented Jul 1, 2024

This PR brings various updates, intended for ROCm 6.3.

It contains the following merge commits:

  • 0350e0f perf: improve performance on rocm 6.2.
  • ade3f6b fix: performance regression introduced with hip graph support.
  • c90e74c docs: add documentation for hip graph support
  • fa13dd7 feat: hip graph support for poisson distribution
  • 4224428 feat: host implementation for mt19937
  • 615fc5a perf: use alias method instead of binary search in cdf
  • 8f7e2af docs: reference GPU_TARGETS instead of AMDGPU_TARGET
  • 6807269 fix: fix host generators not always supporting large sizes
  • 50dab4f refactor: remove deprecated internal headers
  • 80c16d4 test: improve accuracy of poisson histogram test
  • 1a44b48 docs: fix styling

This PR does not contain fixes for the recent performance regressions, we'll either add them here or create a new PR for those.

@Naraenda Naraenda self-assigned this Jul 1, 2024
@Naraenda
Copy link
Member Author

Naraenda commented Jul 1, 2024

@samjwu The clang format CI step seems to not handle external PRs properly, I think this might also be happening in other repositories.

@Naraenda Naraenda marked this pull request as ready for review July 1, 2024 15:10
@Naraenda
Copy link
Member Author

Naraenda commented Jul 3, 2024

@stanleytsang-amd Let me know if I should squash the commits into their respective merge commits.

@stanleytsang-amd
Copy link
Collaborator

stanleytsang-amd commented Jul 4, 2024

@samjwu The clang format CI step seems to not handle external PRs properly, I think this might also be happening in other repositories.
@samjwu Can you take a look at the clang-format CI?

Run git fetch origin develop
From https://github.com/ROCm/rocRAND

  • branch develop -> FETCH_HEAD
  • [new branch] develop -> origin/develop
    fatal: couldn't find remote ref develop_stream_20240701
    Error: Process completed with exit code 128.

@Naraenda
Copy link
Member Author

Naraenda commented Jul 9, 2024

Rebased to get the clang-format ci fixes.

mfep and others added 15 commits July 24, 2024 07:55
rocm-docs-core distributes headers and stylesheets for doxygen for
embedding its HTML output into sphinx. These mostly fix dark-theme
and other minor visual issues when doxygen output is used this way.
docs(api reference): rocm-docs-core headers and stylesheets in doxyfile

See merge request amd/libraries/rocRAND!326
…ream'

Improve accuracy of Poisson histogram test

Closes ROCm#240

See merge request amd/libraries/rocRAND!327
…tream'

Resolve "Remove deprecated internal headers"

Closes ROCm#341

See merge request amd/libraries/rocRAND!330
AMDGPU_TARGETS doesn't pick up updates correctly (needs cache clean) whereas GPU_TARGETS does. Every other doc and CI too refers to GPU_TARGETS.
Resolve "Some host generators might not support large sizes due to min / max"

See merge request amd/libraries/rocRAND!329
mfep and others added 19 commits July 24, 2024 07:56
Recent changes required for HIP graph support added a new path with
approximation of Poisson with normal distribution when lambda is large.
However, the decision whether to use the alias/CDF methods or the
approximation is made in the kernel for every generated value even
though lambda is the same.
This change moves it to host side: depending on lambda the kernel is
launched with one of two distributions (poisson_distribution or
poisson_distribution_huge).
Resolve "Document HIP Graph support"

Closes ROCm#360

See merge request amd/libraries/rocRAND!335
…p_stream'

Fix performance regression of Poisson distribution introduced by HIP graph support

Closes ROCm#366

See merge request amd/libraries/rocRAND!336
hipcc from ROCm 6.2 does not add `-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false`
by default.
Improve performance on ROCm 6.2

See merge request amd/libraries/rocRAND!337
ci(.gitlab-ci.yml): replace 'ROCM_PATH' variable with 'env:HIP_PATH' as the former is unintuitive

See merge request amd/libraries/rocRAND!339
Remove unused FindTestU01.cmake

See merge request amd/libraries/rocRAND!342
…te' into 'develop_stream'

Resolve "Add check for nullptr data when calling host generate."

Closes ROCm#369

See merge request amd/libraries/rocRAND!341
operator >> has higher precedence than operator &.
This bug causes very low quality in crush tests.
…ream'

Fix threefry2x64 and threefry4x64

Closes ROCm#371

See merge request amd/libraries/rocRAND!343
@Naraenda Naraenda force-pushed the develop_stream_20240701 branch from b6c8731 to d4f972c Compare July 24, 2024 09:30
@Naraenda
Copy link
Member Author

Rebased and added threefry fixes by @ex-rzr

Copy link
Collaborator

@umfranzw umfranzw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've finished working my way through this one, and I think it looks good. All our CI checks are also passing now.
@stanleytsang-amd if you have no objections, I'm ok with this one going in.

Copy link
Collaborator

@stanleytsang-amd stanleytsang-amd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the packaging version is updated.

@stanleytsang-amd stanleytsang-amd self-requested a review August 8, 2024 16:18
@stanleytsang-amd stanleytsang-amd merged commit a296589 into ROCm:develop Aug 8, 2024
ammallya pushed a commit that referenced this pull request Oct 28, 2025
* Removed accidentally included #include "hip/amd_detail/host_defines.h"

* chore(gitignore): ignore python venvs

* docs(api reference): rocm-docs-core headers and stylesheets in doxyfile

rocm-docs-core distributes headers and stylesheets for doxygen for
embedding its HTML output into sphinx. These mostly fix dark-theme
and other minor visual issues when doxygen output is used this way.

* improve accuracy of poisson histogram test

* fix format and copyright dates

* feat(test): Added CMake option RUN_EXTRA_TESTS

* Removed deprecated internal headers, src/rng/distribution/distributions.hpp and src/rng/device_engines.hpp

* Using .lint:clang-format

* feat(test): Added large size tests for host generators

* fix(generator): Fixed the usage of min in host generators

* docs(dyn_ordering): Use GPU_TARGETS instead of AMDGPU_TARGETS

AMDGPU_TARGETS doesn't pick up updates correctly (needs cache clean) whereas GPU_TARGETS does. Every other doc and CI too refers to GPU_TARGETS.

* Use alias method in rocrand_discrete for MTGP32, LFSR113 and ThreeFry

discrete_alias is faster than discrete_cdf. Though discrete_cdf can be used with PRNGs,
it is supposed to be used with QRNGs (Sobol generators) as it maintains quasi-randomness.

* refactor mt19937 to support host version as well

* update test

* move jump_ahead_thread_count back to template param

* implement memcpy in host and device systems

* fix compilation issues and segfault

* Removed 'apt-get install flang'

* fix jump_ahead on host

* create host implementation of some functions
refactor generate_long_mt19937 to work on host as well

* fix remaining inconsistencies in host mt19937 generator

* fix format end compile errors

* fix missing gen_next_n calls
fix clang-format issues

* fix format issues and missing __host__s

* fix messed up host/device allocations

* fix merge conflicts
fix format

* fix format issues and compile error

* fix format issues

* fix format issue

* disable most mt19937 host tests for normal run (enabled for slow test run)

* fix review comments

* remove synchronization from host_system::launch function
add synchronization to host_system alloc, free and memcpy

* Implement asynchronous initialization of poisson distribution

* use ROCRAND_HIP_FATAL_ASSERT for hipDeviceSynchronize call

* generate_poisson test with many lambdas

* Testing poisson with hipGraphs

* Test [blocking] host_generator with non-blocking stream

* Fixing poisson distribution selection in benchmark_tuning

* Updated changelog

* fix(docs): Added links to unaccessible doc pages

* fix(docs): Removed duplicated CUDA Compatibility section from Programmer's guide

* Added hipGraphs doc and sample

* Fix performance regression of Poisson distribution

Recent changes required for HIP graph support added a new path with
approximation of Poisson with normal distribution when lambda is large.
However, the decision whether to use the alias/CDF methods or the
approximation is made in the kernel for every generated value even
though lambda is the same.
This change moves it to host side: depending on lambda the kernel is
launched with one of two distributions (poisson_distribution or
poisson_distribution_huge).

* clang-format: Break after attributes

* Add missing __forceinline__ to improve performance on ROCm 6.2

hipcc from ROCm 6.2 does not add `-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false`
by default.

* Remove meaningless code in xorwow introduced during rebase/merge

* style: update formatting

* ci(.gitlab-ci.yml): replace 'ROCM_PATH' variable with 'env:HIP_PATH' as the former is unintuitive

* ci(.gitlab-ci.yml): do not force download deps on windows

* ci(.gitlab-ci.yml): pass amdclang filepath properly to windows package build test

* Remove unused FindTestU01.cmake

* Added checks for nullptr data with tests

* Fix bit rotation for threefry2x64 and threefry4x64

operator >> has higher precedence than operator &.
This bug causes very low quality in crush tests.

* chore: bump version

---------

Co-authored-by: Lőrinc Serfőző <[email protected]>
Co-authored-by: Gergely Meszaros <[email protected]>
Co-authored-by: Nol Moonen <[email protected]>
Co-authored-by: Bence Parajdi <[email protected]>
Co-authored-by: Nick Breed <[email protected]>
Co-authored-by: Anton Gorenko <[email protected]>

[ROCm/rocRAND commit: a296589]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants