Skip to content

[oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL #2037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 4, 2025

Conversation

MikeDvorskiy
Copy link
Contributor

[oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL

akukanov
akukanov previously approved these changes Apr 8, 2025
Copy link
Contributor

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good enough to me.

@akukanov akukanov added this to the 2022.9.0 milestone Apr 8, 2025
Comment on lines +20 to +21
- Several algorithms described in P3179 have slightly different semantics. To implement these, some existing algorithm patterns
might require modifications or new versions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to go into the details of each, I think we should at least list the APIs with semantic differences which may need modifications

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, it is reasonable..

danhoeflinger
danhoeflinger previously approved these changes Apr 17, 2025
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After minor change, generally looks fine to me.

- In case of a `device_policy` the projections pointer-to-member and pointer-to-function are not supported, for SYCL backend at least.

### Test coverage
- The algorithms should be called with both small and large data sizes and with all the policies mentioned above.
Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you be more specific what "small" and "large" means here? If I had not reviewed the first part, I would not have a chance to understand it.

I vaguely remember that the motivation was to cover more code specializations: "small" data sizes would cover serial/group specializations and "large" - parallel execution or using the whole device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"small" an "big" sizes should enable differentiated implementation with level of parallelism perspective.

"small" means that such data sizes are processed by differentiated implementation for "small" data sizes. Usually it bounded by a work group size (for example, < 256). Other words. A small size sequence is processed by a kind of "one group algorithm in case of a device policy. And a small size sequence is processed by a kind SIMD algorithm in kind of seq/unseq policy.

"big" sizes means that such data sizes are processed by differentiated implementation for such sizes where parallelism is, or where parallelism is maximum. Tens of millions for device policy and millions for the host policies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The algorithms should be called with both small and large data sizes and with all the policies mentioned above.
- The algorithms should be called with both small and large sizes.
Usually, the algorithms have a general implementation, and a specialized implementation for small sizes.
The definition of small and large depends on a certain algorithm and a policy it is used with.
Small can be 256 elements or less.
The magnitude of large can be 10^3 elements for `seq` and `unseq`, 10^6 for `par` and `par_unseq`, and 10^7 for `device_policy` to maximize parallelism.

Copy link
Contributor

@akukanov akukanov Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead say that testing should be done for a variety of the range sizes from several hundreds to several millions - but without extra details which Dmitry suggests.

For me, the main motivation of this is to make sure that the test is really executed in parallel. A typical test that operates with a dozen of elements is often sufficient to check the semantics and even the corner cases, but it does not create enough parallel slack and essentially will execute in a single thread. Ensuring that we tested the parallel execution is specifically important for us as the authors of P3179 to claim that there is reasonable implementation experience.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. See #2037 (comment).

Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ready to approve once the suggestion below has been applied.

Copy link
Contributor

@dmitriy-sobolev dmitriy-sobolev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

https://github.com/uxlfoundation/oneDPL/pull/2037/files#r2035529923 suggestion is optional in my opinion.

Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well

@MikeDvorskiy MikeDvorskiy merged commit a5f94ba into main May 4, 2025
2 checks passed
@MikeDvorskiy MikeDvorskiy deleted the dev/mdvorski/rfc_range_algo_j1363 branch May 4, 2025 18:49
MikeDvorskiy added a commit that referenced this pull request May 5, 2025
…ion (#2100)

* Based on RFC #2037
* [oneDPL][ranges][M] + fix: __unseq_backend::__simd_walk_1 -> __unseq_backend::__simd_walk_n
* [oneDPL][hetero] + changes walkX_vectors_or_scalars class: + 'mutable' is to relax the requirements for a user functor/lambda type operator() may be non-const
* [oneDPL][ranges][M] + ranges::generate removed; (there is still discussion in C++ standard committee)
timmiesmith pushed a commit that referenced this pull request Jun 9, 2025
…of range based API for oneDPL (#2037)

* [oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL

Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Dan Hoeflinger <[email protected]>
Co-authored-by: Dmitriy Sobolev <[email protected]>
timmiesmith pushed a commit that referenced this pull request Jun 9, 2025
…ion (#2100)

* Based on RFC #2037
* [oneDPL][ranges][M] + fix: __unseq_backend::__simd_walk_1 -> __unseq_backend::__simd_walk_n
* [oneDPL][hetero] + changes walkX_vectors_or_scalars class: + 'mutable' is to relax the requirements for a user functor/lambda type operator() may be non-const
* [oneDPL][ranges][M] + ranges::generate removed; (there is still discussion in C++ standard committee)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants