Skip to content

Incorrect Kernel name generation for __parallel_reduce_then_scan_reduce_submitter and for __parallel_reduce_then_scan_scan_submitter #2140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Mar 24, 2025

In this PR we fixing the issue #2139

Previously we had next Kernel names generation for these structures:

    using _CustomName = oneapi::dpl::__internal::__policy_kernel_name<_ExecutionPolicy>;
    using _ReduceKernel = oneapi::dpl::__par_backend_hetero::__internal::__kernel_name_provider<
        __reduce_then_scan_reduce_kernel<_CustomName>>;
    using _ScanKernel = oneapi::dpl::__par_backend_hetero::__internal::__kernel_name_provider<
        __reduce_then_scan_scan_kernel<_CustomName>>;

It's not enough to describe all possible instantiations of struct __parallel_reduce_then_scan_reduce_submitter, struct __parallel_reduce_then_scan_scan_submitter and their operator() instantiations.

…n_scan.h - remove internal _ReduceKernel type
…n_scan.h - fix kernel names for __parallel_reduce_then_scan_reduce_submitter
…n_scan.h - fix kernel names for __parallel_reduce_then_scan_scan_submitter
…n_scan.h - define kernel name through __kernel_name_generator in __parallel_reduce_then_scan_reduce_submitter::operator()
…n_scan.h - define kernel name through __kernel_name_generator in __parallel_reduce_then_scan_scan_submitter::operator()
@SergeyKopienko
Copy link
Contributor Author

@mmichel11, should we also include this fix into already released version of oneDPL ?

@SergeyKopienko SergeyKopienko marked this pull request as ready for review March 24, 2025 10:22
@SergeyKopienko SergeyKopienko requested a review from rarutyun March 24, 2025 10:30
@SergeyKopienko
Copy link
Contributor Author

Probably this issue has been introduced in the PR #2046

…n_scan.h - remove the definition of _CustomName in the __parallel_transform_reduce_then_scan as not required anymore
@danhoeflinger
Copy link
Contributor

danhoeflinger commented Mar 24, 2025

Why do we need kernel name generator for this case (rather than kernel name provider)?

If someone passes in a policy with the same external kernel name in to two different oneDPL API invocations then its their error not ours, right?

@SergeyKopienko
Copy link
Contributor Author

@danhoeflinger Please just find in the code how _ReduceKernel and _ScanKerne later used. They passed into parallel_for as Kernel names without any changes.

@danhoeflinger
Copy link
Contributor

@danhoeflinger Please just find in the code how _ReduceKernel and _ScanKerne later used. They passed into parallel_for as Kernel names without any changes.

OK, so why is that a problem?
We separate the reduce and scan kernels using different wrappers on the user provided custom name.

Lets go through the options:

  1. A user provides a unique kernel name, and unnamed lambdas are permitted: Works
  2. A user provides a unique kernel name, and unnamed lambdas are NOT permitted: Works
  3. A user uses default policy name, and unnamed lambdas are permitted: Works, because __kernel_name_provider uses __optional_kernel_name<>, and then the submitters partially specialize to get _KernelName... pack, and provide that to the kernel, where it is an empty pack, so it uses an unnamed lambda
  4. A user uses the default policy name and unnamed lambda are NOT permitted: Fails, (correctly) because it is the user's responsibility to provide a unique kernel name to the policy for each call when using no-unamed-lambdas.
  5. A user uses the same custom named policy for two oneDPL API calls: Fails, (correctly) because if custom names are provided they must be unique for each call.

Are there any other options, or any options which you disagree with the expected outcome?

@danhoeflinger
Copy link
Contributor

To clarify an often missed but important thing:
__kernel_name_generator is specifically for kernels for which we need to use a kernel bundle to first compile the kernel, then query something about the result of the kernel compilation, like to get the exact workgroup size or something to size buffers, etc.

I think its used in the old scan, radix sort, reduce and by-segment algs.

@SergeyKopienko
Copy link
Contributor Author

It's not a bug: it's by design.

@SergeyKopienko SergeyKopienko deleted the dev/skopienko/fix_kernel_names_in_reduce_then_scan branch March 24, 2025 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect Kernel name generation for __parallel_reduce_then_scan_reduce_submitter and for __parallel_reduce_then_scan_scan_submitter
2 participants