-
Notifications
You must be signed in to change notification settings - Fork 116
[oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL #2037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…of range based API for oneDPL
Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Alexey Kukanov <[email protected]>
Co-authored-by: Alexey Kukanov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good enough to me.
- Several algorithms described in P3179 have slightly different semantics. To implement these, some existing algorithm patterns | ||
might require modifications or new versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to go into the details of each, I think we should at least list the APIs with semantic differences which may need modifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, it is reasonable..
Co-authored-by: Dan Hoeflinger <[email protected]>
Co-authored-by: Dan Hoeflinger <[email protected]>
Co-authored-by: Dan Hoeflinger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After minor change, generally looks fine to me.
- In case of a `device_policy` the projections pointer-to-member and pointer-to-function are not supported, for SYCL backend at least. | ||
|
||
### Test coverage | ||
- The algorithms should be called with both small and large data sizes and with all the policies mentioned above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you be more specific what "small" and "large" means here? If I had not reviewed the first part, I would not have a chance to understand it.
I vaguely remember that the motivation was to cover more code specializations: "small" data sizes would cover serial/group specializations and "large" - parallel execution or using the whole device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"small" an "big" sizes should enable differentiated implementation with level of parallelism perspective.
"small" means that such data sizes are processed by differentiated implementation for "small" data sizes. Usually it bounded by a work group size (for example, < 256). Other words. A small size sequence is processed by a kind of "one group algorithm in case of a device policy. And a small size sequence is processed by a kind SIMD algorithm in kind of seq/unseq policy.
"big" sizes means that such data sizes are processed by differentiated implementation for such sizes where parallelism is, or where parallelism is maximum. Tens of millions for device policy and millions for the host policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The algorithms should be called with both small and large data sizes and with all the policies mentioned above. | |
- The algorithms should be called with both small and large sizes. | |
Usually, the algorithms have a general implementation, and a specialized implementation for small sizes. | |
The definition of small and large depends on a certain algorithm and a policy it is used with. | |
Small can be 256 elements or less. | |
The magnitude of large can be 10^3 elements for `seq` and `unseq`, 10^6 for `par` and `par_unseq`, and 10^7 for `device_policy` to maximize parallelism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead say that testing should be done for a variety of the range sizes from several hundreds to several millions - but without extra details which Dmitry suggests.
For me, the main motivation of this is to make sure that the test is really executed in parallel. A typical test that operates with a dozen of elements is often sufficient to check the semantics and even the corner cases, but it does not create enough parallel slack and essentially will execute in a single thread. Ensuring that we tested the parallel execution is specifically important for us as the authors of P3179 to claim that there is reasonable implementation experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. See #2037 (comment).
Co-authored-by: Dan Hoeflinger <[email protected]>
Co-authored-by: Dmitriy Sobolev <[email protected]>
Co-authored-by: Dmitriy Sobolev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ready to approve once the suggestion below has been applied.
Co-authored-by: Dmitriy Sobolev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
https://github.com/uxlfoundation/oneDPL/pull/2037/files#r2035529923 suggestion is optional in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well
…ion (#2100) * Based on RFC #2037 * [oneDPL][ranges][M] + fix: __unseq_backend::__simd_walk_1 -> __unseq_backend::__simd_walk_n * [oneDPL][hetero] + changes walkX_vectors_or_scalars class: + 'mutable' is to relax the requirements for a user functor/lambda type operator() may be non-const * [oneDPL][ranges][M] + ranges::generate removed; (there is still discussion in C++ standard committee)
…of range based API for oneDPL (#2037) * [oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL Co-authored-by: Alexey Kukanov <[email protected]> Co-authored-by: Dan Hoeflinger <[email protected]> Co-authored-by: Dmitriy Sobolev <[email protected]>
…ion (#2100) * Based on RFC #2037 * [oneDPL][ranges][M] + fix: __unseq_backend::__simd_walk_1 -> __unseq_backend::__simd_walk_n * [oneDPL][hetero] + changes walkX_vectors_or_scalars class: + 'mutable' is to relax the requirements for a user functor/lambda type operator() may be non-const * [oneDPL][ranges][M] + ranges::generate removed; (there is still discussion in C++ standard committee)
[oneDPL][rfc][ranges] proposal for implementation of the second part of range based API for oneDPL