-
Notifications
You must be signed in to change notification settings - Fork 17
Show VCN and JPEG busy values where VCN/JPEG activity is not supported. #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: amd-staging
Are you sure you want to change the base?
Conversation
Note: Code changes in gpu.cpp and gpu.hpp is duplicated in PR: 226. We need to rebase one on top of another based on which PR gets merged first. |
source/lib/core/gpu.cpp
Outdated
bool vcn_supported = false; | ||
bool jpeg_supported = false; | ||
bool v_busy_supported = false; | ||
bool j_busy_supported = false; | ||
ret = amdsmi_get_gpu_metrics_info(processor, &gpu_metrics); | ||
if(ret == AMDSMI_STATUS_SUCCESS) | ||
{ | ||
for(const auto& vcn_activity : gpu_metrics.vcn_activity) | ||
{ | ||
if(vcn_activity != UINT16_MAX) | ||
{ | ||
vcn_supported = true; | ||
break; | ||
} | ||
} | ||
for(const auto& jpeg_activity : gpu_metrics.jpeg_activity) | ||
{ | ||
if(jpeg_activity != UINT16_MAX) | ||
{ | ||
jpeg_supported = true; | ||
break; | ||
} | ||
} | ||
for(const auto& xcp : gpu_metrics.xcp_stats) | ||
{ | ||
for(const auto& vcn_busy : xcp.vcn_busy) | ||
{ | ||
if(vcn_busy != UINT16_MAX) | ||
{ | ||
v_busy_supported = true; | ||
break; | ||
} | ||
} | ||
for(const auto& jpeg_busy : xcp.jpeg_busy) | ||
{ | ||
if(jpeg_busy != UINT16_MAX) | ||
{ | ||
j_busy_supported = true; | ||
break; | ||
} | ||
} | ||
} | ||
} | ||
processors::vcn_activity_supported.push_back(vcn_supported); | ||
processors::jpeg_activity_supported.push_back(jpeg_supported); | ||
processors::vcn_busy_supported.push_back(v_busy_supported); | ||
processors::jpeg_busy_supported.push_back(j_busy_supported); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool vcn_supported = false; | |
bool jpeg_supported = false; | |
bool v_busy_supported = false; | |
bool j_busy_supported = false; | |
ret = amdsmi_get_gpu_metrics_info(processor, &gpu_metrics); | |
if(ret == AMDSMI_STATUS_SUCCESS) | |
{ | |
for(const auto& vcn_activity : gpu_metrics.vcn_activity) | |
{ | |
if(vcn_activity != UINT16_MAX) | |
{ | |
vcn_supported = true; | |
break; | |
} | |
} | |
for(const auto& jpeg_activity : gpu_metrics.jpeg_activity) | |
{ | |
if(jpeg_activity != UINT16_MAX) | |
{ | |
jpeg_supported = true; | |
break; | |
} | |
} | |
for(const auto& xcp : gpu_metrics.xcp_stats) | |
{ | |
for(const auto& vcn_busy : xcp.vcn_busy) | |
{ | |
if(vcn_busy != UINT16_MAX) | |
{ | |
v_busy_supported = true; | |
break; | |
} | |
} | |
for(const auto& jpeg_busy : xcp.jpeg_busy) | |
{ | |
if(jpeg_busy != UINT16_MAX) | |
{ | |
j_busy_supported = true; | |
break; | |
} | |
} | |
} | |
} | |
processors::vcn_activity_supported.push_back(vcn_supported); | |
processors::jpeg_activity_supported.push_back(jpeg_supported); | |
processors::vcn_busy_supported.push_back(v_busy_supported); | |
processors::jpeg_busy_supported.push_back(j_busy_supported); | |
bool vcn_supported = false, jpeg_supported = false; | |
bool v_busy_supported = false, j_busy_supported = false; | |
// AMD SMI will not report VCN_activity and JPEG_activity, if VCN_busy or JPEG_busy fields are available. | |
if(amdsmi_get_gpu_metrics_info(processor, &gpu_metrics) == AMDSMI_STATUS_SUCCESS) | |
{ | |
// Helper lambda to check if any value in the array is valid | |
auto has_valid = [](const auto& arr) { | |
return std::any_of(std::begin(arr), std::end(arr), | |
[](auto val) { return val != UINT16_MAX; }); | |
}; | |
vcn_supported = has_valid(gpu_metrics.vcn_activity); | |
jpeg_supported = has_valid(gpu_metrics.jpeg_activity); | |
// Check if VCN and JPEG busy metrics are available | |
for(const auto& xcp : gpu_metrics.xcp_stats) | |
{ | |
if(!v_busy_supported && has_valid(xcp.vcn_busy)) v_busy_supported = true; | |
if(!j_busy_supported && has_valid(xcp.jpeg_busy)) j_busy_supported = true; | |
if(v_busy_supported && j_busy_supported) break; | |
} | |
} |
Just an idea for a refactor. Totally optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add this. Its more optimal and clear.
On AMD-SMI in rocm 7.0 vcn_activity and jpeg_activity will not be reported where XCP(partition) stats vcn_busy and jpeg_busy are available. This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.
0db6283
to
917d384
Compare
Signed-off-by: David Galiffi <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, vcn_activity
and jpeg_activity
both come from the amdsmi_get_gpu_metrics_info
. Therefore, _gpu_metrics
will have the data for both metrics, whether the setting is enabled or not. Why not simplify the branching logic here (while sampling) and save both into m_xcp_metrics
. Leave the logic to show the data (or not) until the post-processing, since we have to do it there anyway.
Optionally, if you want to add to the scope of your change, we can change our code to call amdsmi just once for the gpu_metrics_info.
ROCPROFSYS_AMDSMI_GET(get_settings(m_dev_id).vcn_activity || get_settings(m_dev_id).jpeg_activity,
amdsmi_get_gpu_metrics_info, sample_handle, &_gpu_metrics);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a good approach. I will make the modifications.
* Fix the AMD-SMI double initialization * Optimize VCN/JPEG sampling to store both values always
On AMD-SMI, in rocm 7.0,
vcn_activity
andjpeg_activity
will not be reported where XCP (partition) statsvcn_busy
andjpeg_busy
are available.This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.
Link to AMD-SMI changelog: here
For issue: SWDEV-536439