[windows] Patch TENSILE_LIBRARY_DIR in rocBLAS and hipBLASLt. #783
# Currently, this test is not working with Windows. This test will be enabled once this library has been enabled
if: ${{ inputs.platform == 'linux' }}
Tests pass on Windows gfx1151: https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536258?pr=783
[----------] Global test environment tear-down
[==========] 5766 tests from 2 test suites ran. (470772 ms total)
[ PASSED ] 5766 tests.
hipBLASLt version: 100000
hipBLASLt git version: c630f202
command line: B:\actions-runner\_work\TheRock\TheRock\build\bin\hipblaslt-test.exe --gtest_filter=*pre_checkin*
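For anyone reproducing this run locally, the `--gtest_filter=*pre_checkin*` argument above restricts hipblaslt-test.exe to the tests whose full names contain `pre_checkin`. Below is a minimal Python sketch of how such a filter selects tests, assuming GoogleTest's documented filter syntax (`:`-separated positive patterns, an optional `-` followed by negative patterns, and `*`/`?` wildcards). The helper names and the example test names are hypothetical, and this is an approximation, not GoogleTest's own matcher.

```python
import re


def _gtest_glob_to_regex(pattern: str) -> str:
    """Translate a gtest wildcard pattern ('*' and '?') into a regex string."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")  # any sequence of characters, including empty
        elif ch == "?":
            out.append(".")  # exactly one character
        else:
            out.append(re.escape(ch))  # everything else matches literally
    return "".join(out)


def matches_gtest_filter(full_name: str, gtest_filter: str) -> bool:
    """Approximate --gtest_filter='POS1:POS2-NEG1:NEG2' against a full test name."""
    positive, _, negative = gtest_filter.partition("-")
    # An empty positive part behaves like '*' (run everything not excluded).
    pos = [p for p in positive.split(":") if p] or ["*"]
    neg = [p for p in negative.split(":") if p]

    def any_match(patterns):
        return any(re.fullmatch(_gtest_glob_to_regex(p), full_name) for p in patterns)

    return any_match(pos) and not any_match(neg)


if __name__ == "__main__":
    # Hypothetical parameterized test names, for illustration only.
    for name in ("_/matmul_test.matmul/pre_checkin_gemm_0",
                 "_/matmul_test.matmul/nightly_gemm_0"):
        print(name, matches_gtest_filter(name, "*pre_checkin*"))
```

Because the pattern is anchored on both ends, the leading and trailing `*` are what allow `pre_checkin` to appear anywhere in the name.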
Odd checkout failures on rocBLAS and rocPRIM though:
https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536566?pr=783
Deleting the contents of 'B:\actions-runner\_work\TheRock\TheRock'
Error: File was unable to be removed Error: EPERM: operation not permitted, unlink 'B:\actions-runner\_work\TheRock\TheRock\build\bin\amd_comgr0605.dll'
Failed again on a retry: https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43740951661?pr=783#step:2:28
@amd-justchen, can you debug on that runner (windows-strix-halo-gpu-rocm-1)?
Sure, I'll take a look! I'm just about done monitoring the CI CPU builders. That error looks like the same one as before. I'll be able to remote in to the machine to troubleshoot.
I suspect the build is only just now producing those files, so we should check whether any permission/owner bits are set differently on them. We might need to change how those projects are built, or make our CI workflows try harder to clean up the files during job startup/shutdown.
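One quick way to compare permission bits on a fresh build is to scan the output tree for read-only files. This is a minimal Python sketch; `find_readonly_files` is a hypothetical helper, not something in TheRock. It only checks the owner-write bit, which Python on Windows derives from the read-only file attribute, so a file that is merely locked by a still-running process will not show up here.

```python
import os
import stat
import tempfile


def find_readonly_files(root: str) -> list[str]:
    """Return sorted paths under root whose mode lacks the owner-write bit.

    On Windows, Python clears the write bits in st_mode for files with the
    read-only attribute, so those are reported too. Files locked by another
    process are NOT detected by this check.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.stat(path).st_mode & stat.S_IWRITE:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        readonly = os.path.join(root, "rocblas.dll")  # hypothetical file name
        writable = os.path.join(root, "notes.txt")
        for path in (readonly, writable):
            with open(path, "w") as fh:
                fh.write("x")
        os.chmod(readonly, stat.S_IREAD)
        print(find_readonly_files(root))
        # Restore the write bit so TemporaryDirectory can clean up on Windows.
        os.chmod(readonly, stat.S_IREAD | stat.S_IWRITE)
```

Pointed at something like `build\bin`, an empty result would suggest the EPERM comes from a lock rather than from attribute bits.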
A common post-job hook for a self-hosted runner deletes the source directory. We used these on IREE: https://github.com/iree-org/iree/tree/41dcee93c7157955d94973addf6770cecf926849/build_tools/github_actions/runner/config/hooks. See pre_job.sh, cleanup_workdir.sh, and https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/running-scripts-before-or-after-a-job.
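A hedged sketch of the kind of cleanup step such a hook could perform on the Windows runner: delete the workdir recursively, clear the read-only attribute when Windows refuses to unlink a file, and retry a few times in case a lingering process still holds a DLL open. The helper name, retry count, and delay are assumptions. Hook scripts on Windows runners are PowerShell, so this Python function would need a small wrapper registered through `ACTIONS_RUNNER_HOOK_JOB_STARTED`; that wiring is also an assumption, not something set up in TheRock today.

```python
import os
import shutil
import stat
import sys
import tempfile
import time


def _clear_readonly_and_retry(func, path, _exc):
    # Windows refuses to delete read-only files: add the write bit, retry once.
    os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE)
    func(path)


def remove_tree_with_retries(path: str, attempts: int = 5, delay_s: float = 1.0) -> bool:
    """Recursively delete path, retrying on OSError (e.g. a file still locked).

    Returns True if path no longer exists afterwards.
    """
    for attempt in range(1, attempts + 1):
        if not os.path.exists(path):
            return True
        try:
            # rmtree's error-callback parameter was renamed in Python 3.12.
            if sys.version_info >= (3, 12):
                shutil.rmtree(path, onexc=_clear_readonly_and_retry)
            else:
                shutil.rmtree(path, onerror=_clear_readonly_and_retry)
        except OSError:
            # Something is still in use; give the owning process time to exit.
            if attempt < attempts:
                time.sleep(delay_s)
    return not os.path.exists(path)


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    dll = os.path.join(scratch, "build", "bin", "rocblas.dll")  # hypothetical layout
    os.makedirs(os.path.dirname(dll))
    with open(dll, "w") as fh:
        fh.write("x")
    os.chmod(dll, stat.S_IREAD)  # read-only, the case Windows refuses to unlink
    print(remove_tree_with_retries(scratch, attempts=3, delay_s=0.1))
```

Retrying only helps if the process holding the file eventually exits; a test process that stays alive would still need to be stopped before cleanup can succeed.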
Just noting this error on the self-hosted GPU test runner (https://github.com/ROCm/TheRock/actions/runs/15495327375/job/43649536566?pr=783#step:2:32):
Error: File was unable to be removed Error: EPERM: operation not permitted, unlink 'B:\actions-runner\_work\TheRock\TheRock\build\bin\rocblas.dll'
Tentative fix for #681. Splitting the dist directory structure across operating systems leads to further complexity in all downstream projects and tooling. If the HIP SDK or some other project still needs the old layout, we could add a CMake option to allow users to switch between the two styles during some transition period.
This also enables hipBLASLt tests on our Windows CI, making progress on #544.
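To make the two directory layouts in the description concrete, here is a hedged Python sketch of the kind of switch mentioned there. If such an option were added, it would live in CMake; this only models the idea. The layout names, the assumed `lib/<library>/library` (unified) and `bin/<library>/library` (older Windows) locations, and `tensile_library_dir` itself are illustrative assumptions, not what this PR implements.

```python
from pathlib import PurePosixPath

# Assumed layouts (hypothetical): the unified layout keeps Tensile data under
# lib/ on every OS; the older Windows-specific layout kept it under bin/.
_LAYOUT_SUBDIR = {"unified": "lib", "legacy-windows": "bin"}


def tensile_library_dir(dist_root: str, library: str, layout: str = "unified") -> PurePosixPath:
    """Return the assumed Tensile data directory for `library` in a dist tree."""
    if layout not in _LAYOUT_SUBDIR:
        raise ValueError(f"unknown layout {layout!r}; expected one of {sorted(_LAYOUT_SUBDIR)}")
    return PurePosixPath(dist_root) / _LAYOUT_SUBDIR[layout] / library / "library"


if __name__ == "__main__":
    for lib in ("rocblas", "hipblaslt"):
        for layout in ("unified", "legacy-windows"):
            print(lib, layout, tensile_library_dir("dist", lib, layout))
```

Keeping a single default layout means downstream tooling only has to compute one path per library, which is the complexity the description is trying to avoid.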