Skip to content

[SWDEV-535603]Fix: Add rocjpeg_api to supported ROCPROFSYS_ROCM_DOMAINS list #240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

habajpai-amd
Copy link
Contributor

This change prevents runtime errors caused by the environment variable
ROCPROFSYS_ROCM_DOMAINS including 'rocjpeg_api' when the domain is not
listed in the allowed choices.

  • Explicitly adds 'rocjpeg_api' to valid domain choices
  • Enables rocjpeg callback tracing for ROCProfiler SDK >= 7.0
  • Fixes crash due to unsupported domain value during get_callback_domains()

Tested with local ROCJPEG samples and rocprof-sys-sample execution.

@habajpai-amd habajpai-amd requested review from jrmadsen and a team as code owners June 10, 2025 16:51
@dgaliffiAMD
Copy link
Collaborator

Thanks @habajpai-amd,
However, I do not think that this is a good idea.
If rocjpeg_api tracing is not supported, then we should not "fake it". This will lead to a more serious issue: the user will think that they have enabled rocjpeg_api tracing, then look at their trace file to only to find that the APIs are not there and then have to debug why they are not there (hopefully by using rocprof-sys-avail). However, if we "fail-fast", then there is immediate feedback to the user that rocjpeg_api is not supported with the paired version of rocprofiler-sdk.

@habajpai-amd
Copy link
Contributor Author

Thanks @habajpai-amd, However, I do not think that this is a good idea. If rocjpeg_api tracing is not supported, then we should not "fake it". This will lead to a more serious issue: the user will think that they have enabled rocjpeg_api tracing, then look at their trace file to only to find that the APIs are not there and then have to debug why they are not there (hopefully by using rocprof-sys-avail). However, if we "fail-fast", then there is immediate feedback to the user that rocjpeg_api is not supported with the paired version of rocprofiler-sdk.

Thank you for the valid feedback David -- I agree that we should avoid allowing unsupported domains in order to maintain fail-fast behavior.
As per your suggestion, instead of always adding rocjpeg_api to the supported domain list, I propose we add it conditionally based on the ROCProfiler SDK version.

The change would look like this inside config_settings():

_add_domain("hip_api");
_add_domain("hsa_api");
_add_domain("marker_api");

// Conditionally add rocjpeg_api only if SDK version >= 7.0
if (get_version().formatted >= 700)
_add_domain("rocjpeg_api");

This way:
Users running with SDK < 7.0 will still fail-fast if they attempt to enable rocjpeg_api.
Users running with SDK >= 7.0 will have this domain available for tracing.

@dgaliffiAMD
Copy link
Collaborator

Thanks @habajpai-amd, However, I do not think that this is a good idea. If rocjpeg_api tracing is not supported, then we should not "fake it". This will lead to a more serious issue: the user will think that they have enabled rocjpeg_api tracing, then look at their trace file to only to find that the APIs are not there and then have to debug why they are not there (hopefully by using rocprof-sys-avail). However, if we "fail-fast", then there is immediate feedback to the user that rocjpeg_api is not supported with the paired version of rocprofiler-sdk.

Thank you for the valid feedback, David -- I agree that we should avoid allowing unsupported domains in order to maintain fail-fast behavior. As per your suggestion, instead of always adding rocjpeg_api to the supported domain list, I propose we add it conditionally based on the ROCProfiler SDK version.

The change would look like this inside config_settings():

_add_domain("hip_api"); _add_domain("hsa_api"); _add_domain("marker_api");

// Conditionally add rocjpeg_api only if SDK version >= 7.0 if (get_version().formatted >= 700) _add_domain("rocjpeg_api");

This way: Users running with SDK < 7.0 will still fail-fast if they attempt to enable rocjpeg_api. Users running with SDK >= 7.0 will have this domain available for tracing.

In that case, we don't need to do anything special. In the lines just below, we'll add the domains reported by the SDK. buffered_tracing_info and callback_tracing_info are the result of SDK queries. So, for rocprofiler-sdk versions >= 0.7.0, rocjpeg_api will be included in this list already.

    for(const auto& itr : buffered_tracing_info)
        _add_domain(itr.name);
    for(const auto& itr : callback_tracing_info)
        _add_domain(itr.name);

@habajpai-amd
Copy link
Contributor Author

Thanks @habajpai-amd, However, I do not think that this is a good idea. If rocjpeg_api tracing is not supported, then we should not "fake it". This will lead to a more serious issue: the user will think that they have enabled rocjpeg_api tracing, then look at their trace file to only to find that the APIs are not there and then have to debug why they are not there (hopefully by using rocprof-sys-avail). However, if we "fail-fast", then there is immediate feedback to the user that rocjpeg_api is not supported with the paired version of rocprofiler-sdk.

Thank you for the valid feedback, David -- I agree that we should avoid allowing unsupported domains in order to maintain fail-fast behavior. As per your suggestion, instead of always adding rocjpeg_api to the supported domain list, I propose we add it conditionally based on the ROCProfiler SDK version.
The change would look like this inside config_settings():
_add_domain("hip_api"); _add_domain("hsa_api"); _add_domain("marker_api");
// Conditionally add rocjpeg_api only if SDK version >= 7.0 if (get_version().formatted >= 700) _add_domain("rocjpeg_api");
This way: Users running with SDK < 7.0 will still fail-fast if they attempt to enable rocjpeg_api. Users running with SDK >= 7.0 will have this domain available for tracing.

In that case, we don't need to do anything special. In the lines just below, we'll add the domains reported by the SDK. buffered_tracing_info and callback_tracing_info are the result of SDK queries. So, for rocprofiler-sdk versions >= 0.7.0, rocjpeg_api will be included in this list already.

    for(const auto& itr : buffered_tracing_info)
        _add_domain(itr.name);
    for(const auto& itr : callback_tracing_info)
        _add_domain(itr.name);

Thanks for clarifying David -- makes sense now. Since the SDK version detection is already implicit via callback_tracing_info, and the domains are auto-populated from the SDK response, we don’t need to explicitly add rocjpeg_api.

The fail-fast behavior remains intact, and the domain registration remains version-aware.

I'll drop this change accordingly.

@habajpai-amd
Copy link
Contributor Author

Closing this PR as the change is no longer required.

Domain registration for rocjpeg_api is handled automatically based on the SDK version through callback_tracing_info. For SDK >= 0.7.0, rocjpeg_api is already included via the SDK queries. Fail-fast behavior remains intact for unsupported domains.

Thanks for the review and helpful clarification!

@habajpai-amd habajpai-amd deleted the fix/rocjpeg-api-domain-unsupported branch June 11, 2025 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants