[windows] Patch TENSILE_LIBRARY_DIR in rocBLAS and hipBLASLt. #783

Draft
wants to merge 4 commits into
base: main
from

Conversation

ScottTodd
Member

Tentative fix for #681. Splitting the dist directory structure across operating systems leads to further complexity in all downstream projects and tooling. If the HIP SDK or some other project still needs the old layout, we could add a CMake option to allow users to switch between the two styles during some transition period.

Also enabled hipBLASLt tests on our Windows CI, so progress on #544.

Comment on lines -37 to -38
# Currently, this test is not working with Windows. This test will be enabled once this library has been enabled
if: ${{ inputs.platform == 'linux' }}
Member Author

Tests pass on Windows gfx1151: https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536258?pr=783

[----------] Global test environment tear-down
[==========] 5766 tests from 2 test suites ran. (470772 ms total)
[  PASSED  ] 5766 tests.
hipBLASLt version: 100000
hipBLASLt git version: c630f202
command line: B:\actions-runner\_work\TheRock\TheRock\build\bin\hipblaslt-test.exe --gtest_filter=*pre_checkin* 

Odd checkout failures on rocBLAS and rocPRIM though:
https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536566?pr=783

 Deleting the contents of 'B:\actions-runner\_work\TheRock\TheRock'
Error: File was unable to be removed Error: EPERM: operation not permitted, unlink 'B:\actions-runner\_work\TheRock\TheRock\build\bin\amd_comgr0605.dll'

Member Author

Failed again on a retry: https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43740951661?pr=783#step:2:28

@amd-justchen can you debug on that runner (windows-strix-halo-gpu-rocm-1)?

Contributor

Sure! I'll take a look! I'm just about done monitoring the CI CPU builders. That error looks like the same one as before. I'll be able to remote into the machine to troubleshoot.

Member Author

I suspect the build is only just now producing those files, so we should see if any permission/owner bits are set differently. Might need to change how those projects are built or make our CI workflows try harder to clean up the files during startup/shutdown.
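One way to check the permission bits is a small script run on the runner. This is only a sketch: the function names are made up for illustration, and it reports nothing but the mode bits Python exposes (on Windows that is effectively the read-only attribute, and the owner id is always 0). A file that is still loaded by a running process can also fail to unlink, and this would not detect that case.

```python
import stat
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Collect the mode bits that commonly explain a failed unlink."""
    info = path.stat()
    return {
        "path": str(path),
        "mode": stat.filemode(info.st_mode),
        "read_only": not (info.st_mode & stat.S_IWUSR),
        "uid": info.st_uid,  # always 0 on Windows
    }


def find_read_only(root: Path, patterns=("*.dll", "*.exe")) -> list[dict]:
    """Return descriptions of read-only files under root that match patterns."""
    found = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if path.is_file():
                entry = describe_file(path)
                if entry["read_only"]:
                    found.append(entry)
    return found
```

For example, `find_read_only(Path(r"B:\actions-runner\_work\TheRock\TheRock\build\bin"))` would list any read-only DLLs or executables left behind by the build.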

A common post-job hook for a self-hosted runner deletes the source directory. We used these on IREE: https://github.com/iree-org/iree/tree/41dcee93c7157955d94973addf6770cecf926849/build_tools/github_actions/runner/config/hooks. See pre_job.sh, cleanup_workdir.sh, and https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/running-scripts-before-or-after-a-job.
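A rough sketch of what a "try harder" cleanup could look like, written in Python so it runs the same way on Windows and Linux. This is not the IREE hook and not something TheRock currently has; `clean_workdir` and its parameters are hypothetical. It clears the read-only bit when a delete fails and retries a few times, on the assumption that a process holding a DLL open may exit shortly after the job ends.

```python
import os
import shutil
import stat
import sys
import time
from pathlib import Path

# shutil.rmtree renamed its error-handler argument in Python 3.12.
_HANDLER_KW = "onexc" if sys.version_info >= (3, 12) else "onerror"


def _make_writable_and_retry(func, path, _exc):
    """rmtree error handler: clear the read-only bit, then retry the call."""
    os.chmod(path, stat.S_IWRITE)
    func(path)


def _remove(child: Path) -> None:
    """Remove one directory tree or file, clearing read-only bits if needed."""
    if child.is_dir() and not child.is_symlink():
        shutil.rmtree(child, **{_HANDLER_KW: _make_writable_and_retry})
    else:
        try:
            child.unlink()
        except PermissionError:
            os.chmod(child, stat.S_IWRITE)
            child.unlink()


def clean_workdir(workdir: Path, attempts: int = 3, delay_s: float = 2.0) -> bool:
    """Delete everything inside workdir; return True once it is empty."""
    for attempt in range(attempts):
        for child in list(workdir.iterdir()):
            try:
                _remove(child)
            except OSError as err:
                print(f"attempt {attempt + 1}: could not remove {child}: {err}")
        if not any(workdir.iterdir()):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give lingering processes time to exit
    return False
```

A pre- or post-job hook script could call `clean_workdir(Path(r"B:\actions-runner\_work\TheRock\TheRock"))` and fail loudly when it returns `False`, which would at least surface the locked file before the next checkout trips over it.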

@amd-justchen
Contributor

https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536566?pr=783#step:2:32

Just noting this error on the self-hosted GPU test runner:

Error: File was unable to be removed Error: EPERM: operation not permitted, unlink 'B:\actions-runner\_work\TheRock\TheRock\build\bin\rocblas.dll'
