Contributor

@djramic djramic commented Sep 17, 2025

Motivation

Preparing CI Dockerfiles for ROCm 7.0

Technical Details

  • ROCm version changed from 6.4.2 to 7.0
  • Update dependency: use graphics instead of amdgpu (libdrm-amdgpu-dev is no longer packaged under amdgpu)

Test Plan

Manually tested

@djramic djramic requested a review from causten as a code owner September 17, 2025 17:09
@djramic djramic requested review from stefankoncarevic and umangyadav and removed request for causten September 17, 2025 17:09

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1991      +/-   ##
===========================================
+ Coverage    79.50%   79.76%   +0.26%     
===========================================
  Files          100      102       +2     
  Lines        31016    32361    +1345     
  Branches      4819     5104     +285     
===========================================
+ Hits         24659    25811    +1152     
- Misses        4245     4340      +95     
- Partials      2112     2210      +98     
Flag      Coverage Δ
mfma      ?
navi4x    79.76% <ø> (?)

Flags with carried forward coverage won't be shown.
see 54 files with indirect coverage changes


 # Use the rocm/mlir image as base
-FROM rocm/mlir:rocm6.4-latest
+FROM rocm/mlir:rocm7.0-latest
Contributor

@pabloantoniom pabloantoniom Sep 18, 2025

May I ask why this change is needed? Buildbot is actually stuck in 6.3, it's not even in 6.4. More background here: https://github.com/ROCm/rocMLIR-internal/issues/1979#issuecomment-3270870992

Contributor

I guess he replaced every rocm6.4 found in the project. Let's add a comment here saying this should stay on 6.3, to avoid this happening in the future.

Contributor

@pabloantoniom pabloantoniom Sep 18, 2025

So in conclusion I think this should be 6.3 and, at some point, should be updated to 7.0, for which we will need a newer machine to run the buildbot, see https://github.com/ROCm/rocMLIR-internal/issues/1981, so adding a comment here makes sense to me 👍

Contributor

@pabloantoniom pabloantoniom left a comment

I think we should discuss whether changes to the buildbot Dockerfile are needed before merging this.

@@ -1,9 +1,9 @@
FROM ubuntu:22.04
Contributor

do we want to use 24.04?

Member

Yes

Member

I am not sure, though, whether that would lead to a different Python version and trigger more changes.

@dhernandez0
Contributor

can we update dockerImage() and similar functions in Jenkins files?

@dhernandez0
Contributor

dhernandez0 commented Sep 18, 2025

also, I think perfRunner.py and tuningRunner.py might have some issues. rocprofv3 needs "-f csv" at least on MI350, so I suspect it's needed for rocm7.
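
A minimal sketch of how a runner script might add that flag when assembling the profiler command. The helper name `build_rocprofv3_command`, its parameters, the `--kernel-trace` option in the usage line, and the `--` separator before the application are all assumptions for illustration; this is not the actual code in perfRunner.py or tuningRunner.py. Only the "-f csv" option comes from the comment above.

```python
def build_rocprofv3_command(app_cmd, profiler_args=(), force_csv=True):
    """Assemble an argv list for a rocprofv3 run.

    Hypothetical helper, not taken from perfRunner.py or tuningRunner.py.
    app_cmd:       the benchmark command to profile, as a list of strings
    profiler_args: extra rocprofv3 options chosen by the caller (assumed)
    force_csv:     append "-f csv" so the output is requested as CSV
    """
    cmd = ["rocprofv3", *profiler_args]
    if force_csv:
        # The option suggested in the review comment for MI350 / ROCm 7.
        cmd += ["-f", "csv"]
    # Assumption: the profiled application follows a "--" separator.
    cmd.append("--")
    cmd += list(app_cmd)
    return cmd


# Usage: print the command instead of running it, so no GPU is needed.
print(" ".join(build_rocprofv3_command(["./my_benchmark"], ["--kernel-trace"])))
```

Keeping the flag behind a parameter such as `force_csv` would let the same script keep working on older ROCm versions if they turn out not to need it.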

@djramic
Contributor Author

djramic commented Sep 18, 2025

> can we update dockerImage() and similar functions in Jenkins files?

I'll cover this in the next PR, once the Docker image has been created and pushed.

@dhernandez0
Contributor

I think we should run nightly or weekly CI?

@djramic
Contributor Author

djramic commented Sep 19, 2025

> I think we should run nightly or weekly CI?

The build image job gets triggered after merging a PR that modifies the Dockerfile, so this one needs to be merged first. I'll add these checks in a new PR with Jenkinsfile changes.

@umangyadav umangyadav merged commit e174a10 into develop Sep 22, 2025
16 checks passed
@umangyadav umangyadav deleted the rocm7_dockerfile branch September 22, 2025 14:55
5 participants