Skip to content

Show VCN and JPEG busy values where VCN/JPEG activity is not supported. #232

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: amd-staging
Choose a base branch
from

Conversation

sputhala-amd
Copy link
Collaborator

@sputhala-amd sputhala-amd commented Jun 6, 2025

On AMD-SMI, in rocm 7.0, vcn_activity and jpeg_activity will not be reported where XCP (partition) stats vcn_busy and jpeg_busy are available.

This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.

Link to AMD-SMI changelog: here

For issue: SWDEV-536439

@sputhala-amd sputhala-amd requested review from jrmadsen and a team as code owners June 6, 2025 12:08
@sputhala-amd
Copy link
Collaborator Author

Note: Code changes in gpu.cpp and gpu.hpp is duplicated in PR: 226. We need to rebase one on top of another based on which PR gets merged first.

Comment on lines 254 to 299
bool vcn_supported = false;
bool jpeg_supported = false;
bool v_busy_supported = false;
bool j_busy_supported = false;
ret = amdsmi_get_gpu_metrics_info(processor, &gpu_metrics);
if(ret == AMDSMI_STATUS_SUCCESS)
{
for(const auto& vcn_activity : gpu_metrics.vcn_activity)
{
if(vcn_activity != UINT16_MAX)
{
vcn_supported = true;
break;
}
}
for(const auto& jpeg_activity : gpu_metrics.jpeg_activity)
{
if(jpeg_activity != UINT16_MAX)
{
jpeg_supported = true;
break;
}
}
for(const auto& xcp : gpu_metrics.xcp_stats)
{
for(const auto& vcn_busy : xcp.vcn_busy)
{
if(vcn_busy != UINT16_MAX)
{
v_busy_supported = true;
break;
}
}
for(const auto& jpeg_busy : xcp.jpeg_busy)
{
if(jpeg_busy != UINT16_MAX)
{
j_busy_supported = true;
break;
}
}
}
}
processors::vcn_activity_supported.push_back(vcn_supported);
processors::jpeg_activity_supported.push_back(jpeg_supported);
processors::vcn_busy_supported.push_back(v_busy_supported);
processors::jpeg_busy_supported.push_back(j_busy_supported);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool vcn_supported = false;
bool jpeg_supported = false;
bool v_busy_supported = false;
bool j_busy_supported = false;
ret = amdsmi_get_gpu_metrics_info(processor, &gpu_metrics);
if(ret == AMDSMI_STATUS_SUCCESS)
{
for(const auto& vcn_activity : gpu_metrics.vcn_activity)
{
if(vcn_activity != UINT16_MAX)
{
vcn_supported = true;
break;
}
}
for(const auto& jpeg_activity : gpu_metrics.jpeg_activity)
{
if(jpeg_activity != UINT16_MAX)
{
jpeg_supported = true;
break;
}
}
for(const auto& xcp : gpu_metrics.xcp_stats)
{
for(const auto& vcn_busy : xcp.vcn_busy)
{
if(vcn_busy != UINT16_MAX)
{
v_busy_supported = true;
break;
}
}
for(const auto& jpeg_busy : xcp.jpeg_busy)
{
if(jpeg_busy != UINT16_MAX)
{
j_busy_supported = true;
break;
}
}
}
}
processors::vcn_activity_supported.push_back(vcn_supported);
processors::jpeg_activity_supported.push_back(jpeg_supported);
processors::vcn_busy_supported.push_back(v_busy_supported);
processors::jpeg_busy_supported.push_back(j_busy_supported);
bool vcn_supported = false, jpeg_supported = false;
bool v_busy_supported = false, j_busy_supported = false;
// AMD SMI will not report VCN_activity and JPEG_activity, if VCN_busy or JPEG_busy fields are available.
if(amdsmi_get_gpu_metrics_info(processor, &gpu_metrics) == AMDSMI_STATUS_SUCCESS)
{
// Helper lambda to check if any value in the array is valid
auto has_valid = [](const auto& arr) {
return std::any_of(std::begin(arr), std::end(arr),
[](auto val) { return val != UINT16_MAX; });
};
vcn_supported = has_valid(gpu_metrics.vcn_activity);
jpeg_supported = has_valid(gpu_metrics.jpeg_activity);
// Check if VCN and JPEG busy metrics are available
for(const auto& xcp : gpu_metrics.xcp_stats)
{
if(!v_busy_supported && has_valid(xcp.vcn_busy)) v_busy_supported = true;
if(!j_busy_supported && has_valid(xcp.jpeg_busy)) j_busy_supported = true;
if(v_busy_supported && j_busy_supported) break;
}
}

Just an idea for a refactor. Totally optional.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this. Its more optimal and clear.

On AMD-SMI in rocm 7.0 vcn_activity and jpeg_activity will not be reported
where XCP(partition) stats vcn_busy and jpeg_busy are available.

This causes the activity tracking to fail. The fix is to read the busy values
when activity values are not supported.
@sputhala-amd sputhala-amd force-pushed the xcpValues branch 2 times, most recently from 0db6283 to 917d384 Compare June 10, 2025 12:29
Signed-off-by: David Galiffi <[email protected]>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, vcn_activity and jpeg_activity both come from the amdsmi_get_gpu_metrics_info. Therefore, _gpu_metrics will have the data for both metrics, whether the setting is enabled or not. Why not simplify the branching logic here (while sampling) and save both into m_xcp_metrics. Leave the logic to show the data (or not) until the post-processing, since we have to do it there anyway.

Optionally, if you want to add to the scope of your change, we can change our code to call amdsmi just once for the gpu_metrics_info.

    ROCPROFSYS_AMDSMI_GET(get_settings(m_dev_id).vcn_activity || get_settings(m_dev_id).jpeg_activity,
                          amdsmi_get_gpu_metrics_info, sample_handle, &_gpu_metrics);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good approach. I will make the modifications.

* Fix the AMD-SMI double initialization
* Optimize VCN/JPEG sampling to store both values always
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants