Skip to content

Improve baseline comparison logic #817

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

vedithal-amd
Copy link
Contributor

@vedithal-amd vedithal-amd commented Jul 17, 2025

rocprofiler-compute Pull Request

Related Issue

  • Closes #

What type of PR is this? (check all that apply)

  • Bug Fix
  • Cherry Pick
  • Continuous Integration
  • Documentation Update
  • Feature
  • Optimization
  • Refactor
  • Other (please specify)

Technical Details

  • Do not force unsupported metrics to be specified in older gpu
    architectures as None

  • Remove metrics which are explicitly set to None

  • Update CHANGELOG

  • Fix analysis configuration to fix baseline comparisons across all gpu
    architectures

    • Add missing 1812 section for gfx908
    • Add missing 1812 section for gfx90a

We cannot guarantee common Metric ID index for metrics comparison across different GPU models.
The Metric ID is not very useful anyways and this is how it looks with and without Metric ID

With Metric ID

{5222FF04-74D9-45A6-B1A8-DF5923FB65EB}

Without Metric ID
{E793DB9F-457F-41A7-B6DE-8077D280AB32}

Have you added or updated tests to validate functionality?

  • Yes
  • No - does not apply to this PR
    Validated baseline comparison working across all gpu architectures

Added / Updated documentation?

  • Yes
  • No - does not apply to this PR

Have you updated CHANGELOG?

  • Yes
  • No - does not apply to this PR

Copy link
Contributor

@feizheng10 feizheng10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please separate gfx906 deprecation out of this pr.
  2. Keep block IDs is important. When we are talking about "baseline comparison", the 1st run is used as baseline reference, which implies all the info from the 1st run are the baseline including block IDs.

@feizheng10
Copy link
Contributor

And just for internal maintain/design purpose, we have meta info in the comments(temporally) for the reasons why we don't have the metrics available. It is better to provide solution when removing those.

@vedithal-amd
Copy link
Contributor Author

vedithal-amd commented Jul 17, 2025

1st run is used as baseline reference, which implies all the info from the 1st run are the baseline including block IDs.

My concern is, does it make sense to compare a metric when its only available in one run?
Lets say 1st run has Metric1 = 10, Metric2 = 10 and 2nd run does not have Metric1 but Metric2 = 5.
In above case, does it make sense to only compare Metric2 which is common?
Why will the user want to see Metric1 in baseline comparison when it is not there in run 2? Why cant the user just do analysis of run 1 to see Metric1?

Please separate gfx906 deprecation out of this pr.

OK

And just for internal maintain/design purpose, we have meta info in the comments(temporally) for the reasons why we don't have the metrics available. It is better to provide solution when removing those.

We can backport the metrics from MI350 config to older GPUs when someone ask for it in a different PR.
There is no point in having None metrics when you can look at the metric information in newer MI350 GPU config.

@vedithal-amd
Copy link
Contributor Author

vedithal-amd commented Jul 17, 2025

Please separate gfx906 deprecation out of this pr.

This will be tracked in #819

* Do not force unsupported metrics to be specified in older gpu
  architectures as None

* Remove metrics which are explicitly set to None

* Update CHANGELOG

* Fix analysis configuration to fix baseline comparisons across all gpu
  architectures
    * Add missing 1812 section for gfx908
    * Add missing 1812 section for gfx90a
@vedithal-amd
Copy link
Contributor Author

Keep block IDs is important.

There is no such thing as block IDs, I think you are referring to Metric ID and i dont think that adds any value to the analysis report. Metric ID will still be shown in non baseline comparison mode. Only in baseline comparison mode its not able to show Metric ID due to metrics being different.

@vedithal-amd vedithal-amd requested a review from feizheng10 July 17, 2025 19:56
@vedithal-amd
Copy link
Contributor Author

As discussed on call, we can add Metric ID from first workload during baseline comparison.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants