
feature: rng primitive refactoring #2968

Open
wants to merge 22 commits into base: main

Conversation

Alexandr-Solovev
Contributor

@Alexandr-Solovev Alexandr-Solovev commented Nov 5, 2024

Description

Feature: RNG primitive refactoring

Summary:

This PR updates the oneDAL RNG primitive. It includes various fixes and modifications to the RNG primitive.

Key Changes:

  1. New generators have been added:

    • mrg32k3a and philox engines have been added to DAAL/oneDAL.
  2. Host and DPC engines have been refactored and added:

    • RNG can now be used on all supported devices.

Related sklearnex PR: uxlfoundation/scikit-learn-intelex#2228

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added the respective label(s) to the PR if I have permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev Alexandr-Solovev added the dpc++ Issue/PR related to DPC++ functionality label Nov 19, 2024
Review threads (resolved):
  • cpp/daal/include/algorithms/engines/mrg32k3a/mrg32k3a.h
  • cpp/daal/src/algorithms/engines/mrg32k3a/mrg32k3a_impl.i
  • cpp/oneapi/dal/backend/primitives/rng/engine_gpu.hpp
  • cpp/oneapi/dal/backend/primitives/rng/rng.hpp
  • cpp/oneapi/dal/backend/primitives/rng/rng_types.hpp
  • docs/source/daal/algorithms/engines/mrg32k3a.rst
@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev
Contributor Author

sklearnex reference PR: uxlfoundation/scikit-learn-intelex#2228

@Alexandr-Solovev Alexandr-Solovev marked this pull request as ready for review December 18, 2024 10:05
@david-cortes-intel
Contributor

A couple of questions from my side:

  • Why would the default be left as MT2203 instead of something with better properties? Why not philox for example? It appears to be substantially faster.
    • All of the generators introduced here appear to produce 32-bit numbers, but some of the usages are for e.g. generating 64-bit integers, which would require two draws for one output. Wouldn't it be better to make the default dependent on the templated parameters that an algorithm will use? Might be rather complicated though so not sure how feasible it is.
  • Is the idea to simply include all generators from MKL, or to provide reasonably good choices for oneDAL? We already know for example that MT-family generators have issues when used like oneDAL does. How about removing those?
    • Do we know if these generators have state quality issues when initializing them from a single seed instead of a seed sequence? For example, the current default generator passed from sklearnex (I think MT19937 but not sure) had the issue (pointed out by @icfaust) of producing very biased first samples when initialized like that with a low-number seed.
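The 32-bit-output point above can be illustrated with a small sketch (hypothetical helper names on top of a generic 32-bit source, not oneMKL's API): composing one 64-bit integer costs two 32-bit draws, which is why the default might plausibly depend on the templated output type.

```python
import random

def draw32(rng):
    # Stand-in for one 32-bit word from a generator such as philox or mrg32k3a.
    return rng.getrandbits(32)

def draw64(rng):
    # A 64-bit output requires two 32-bit draws: a high word and a low word.
    hi = draw32(rng)
    lo = draw32(rng)
    return (hi << 32) | lo

rng = random.Random(777)
value = draw64(rng)
assert 0 <= value < 2 ** 64
```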

@Alexandr-Solovev
Contributor Author

@david-cortes-intel

Why would the default be left as MT2203 instead of something with better properties? Why not philox for example?

For the RF algorithm, the feature to choose engine_method will be added in a follow-up PR (likely #3029). I kept MT2203 to preserve result compatibility with previous releases. I'm not sure that's the best strategy in terms of performance, but I expect potential accuracy/MSE with philox might be worse. Moreover, it could make the review process more complex.
For other algorithms the default engine method can be replaced with the new one, but I didn't want to do it in this PR, due to potential metrics degradations. It can easily be done in the next PRs, imho.

I probably didn't get the idea of

All of the generators introduced here appear to produce 32-bit numbers, but some of the usages are for e.g. generating 64-bit integers, which would require two draws for one output. Wouldn't it be better to make the default dependent on the templated parameters that an algorithm will use? Might be rather complicated though so not sure how feasible it is.

Do you mean creating a dispatcher before, for example, uniform, or inside uniform, which will choose the best potential engine method?

@Alexandr-Solovev
Contributor Author

@david-cortes-intel

Is the idea to simply include all generators from MKL, or to provide reasonably good choices for oneDAL?

No, we discussed it in advance with the oneMKL team, and they suggested the engines that are best in terms of performance and generator period for GPU (our initial goal was to improve RF performance on GPU). The full list of engines is pretty big: https://oneapi-spec.uxlfoundation.org/specifications/oneapi/v1.3-rev-1/elements/onemkl/source/domains/rng/host_api/engines-basic-random-number-generators so the goal wasn't just to add all engine methods from oneMKL.

We already know for example that MT-family generators have issues when used like oneDAL does. How about removing those?

I am not against removing them, but I'm not sure we can easily change the default behavior. Based on the experiments, the new engines (mrg32k3a, philox) plus mcg59 are significantly better. It makes sense to discuss, but probably just changing the default engine could be a good solution.

Do we know if these generators have state quality issues when initializing them from a single seed instead of a seed sequence? For example, the current default generator passed from sklearnex (I think MT19937 but not sure) had the issue (pointed out by @icfaust) of producing very biased first samples when initialized like that with a low-number seed.

I guess it depends on the implementation on the oneMKL side. I'm not sure, but as far as I know, by default oneMKL uses N per-thread sub-engines in mt2203 and mt19937; maybe that's the reason for such behavior. As far as I know, mrg32k3a, philox, and mcg59 are implemented without sub-engines inside.

@david-cortes-intel
Contributor

I kept MT2203 to preserve result compatibility with previous releases.

Note that NumPy itself, which is used by scikit-learn, does not have such compatibility guarantees when using its random.Generator module:
https://numpy.org/doc/stable/reference/random/compatibility.html

Thus, I don't think it should be a big deal to make breaking changes in produced random numbers in sklearnex.

I expect potential accuracy/MSE with philox might be worse

I'd expect it to actually be the opposite, since (a) we are seeding MT with a single integer instead of a sequence, which leaves it with issues for the first draws; and (b) philox appears to do better on statistical tests according to the paper that introduced it. See, for example, that it passed BigCrush:
https://www.thesalmons.org/john/random123/papers/random123sc11.pdf
whereas MT fails some of it: https://arxiv.org/pdf/1910.06437
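One common mitigation for low-entropy single-integer seeds, sketched here with a SplitMix64-style mixer (illustrative only; this is not how oneMKL seeds its engines), is to expand the seed through an avalanche function before filling the state, so adjacent low seeds like 1 and 2 still produce well-scattered state words:

```python
MASK64 = (1 << 64) - 1

def splitmix64_stream(seed, n):
    """Expand one integer seed into n well-mixed 64-bit state words
    (standard SplitMix64 constants and shifts)."""
    x = seed & MASK64
    out = []
    for _ in range(n):
        x = (x + 0x9E3779B97F4A7C15) & MASK64
        z = x
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        out.append(z ^ (z >> 31))
    return out

# Even adjacent low seeds yield very different state words.
a = splitmix64_stream(1, 4)
b = splitmix64_stream(2, 4)
assert a != b
```

NumPy's SeedSequence plays a similar role when initializing its bit generators from a single integer.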

Do you mean creating a dispatcher before, for example, uniform, or inside uniform, which will choose the best potential engine method?

On a deeper look, it seems all generators from MKL are 32-bit only, so please ignore the earlier comment.

I guess it depends on the implementation on the oneMKL side. I'm not sure, but as far as I know, by default oneMKL uses N per-thread sub-engines in mt2203 and mt19937; maybe that's the reason for such behavior. As far as I know, mrg32k3a, philox, and mcg59 are implemented without sub-engines inside.

Philox is a counter-based generator, so parallelizing it and jumping states should be pretty straightforward, without needing to keep sub-engines.
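The counter-based property can be sketched with a toy generator (a simple 64-bit mixer, not real Philox rounds): the entire state is just (key, counter), so skipping ahead or splitting streams across workers is plain integer arithmetic on the counter, with no sub-engines needed.

```python
MASK64 = (1 << 64) - 1

def toy_counter_rng(key, counter):
    """Toy counter-based generator: output = mix(key, counter).

    Not real Philox; it only illustrates that the whole state is (key, counter).
    """
    x = (counter * 0xD2B74407B1CE6E93 + key) & MASK64
    x ^= x >> 33
    x = (x * 0xFF51AFD7ED558CCD) & MASK64
    x ^= x >> 33
    return x

def sequence(key, start, n):
    # Jumping to position `start` is free: just set the counter.
    return [toy_counter_rng(key, start + i) for i in range(n)]

# Two "parallel workers" covering disjoint counter ranges reproduce
# exactly the single-stream sequence, with no shared mutable state.
whole = sequence(42, 0, 8)
assert sequence(42, 0, 4) + sequence(42, 4, 4) == whole
```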

@david-cortes-intel
Contributor

I am not against this removing, but not sure that we can easily change the default behavior.

Let's leave the change in defaults for a different PR then.

@david-cortes-intel
Contributor

@Alexandr-Solovev Are the issues from the CI meant to be solved with the PR from the sklearnex side?

In file included from /home/vsts/work/1/s/build/daal4py_cpp.cpp:2:
/home/vsts/work/1/s/build/daal4py_cpp.h: In instantiation of ‘engines_mrg32k3a_manager<fptype, method>::engines_mrg32k3a_manager() [with fptype = double; daal::algorithms::engines::mrg32k3a::Method method = daal::algorithms::engines::mrg32k3a::defaultDense]’:
/home/vsts/work/1/s/build/daal4py_cpp.cpp:423:94:   required from here
/home/vsts/work/1/s/build/daal4py_cpp.h:2358:22: error: ‘daal::algorithms::engines::mrg32k3a::interface1::Batch<algorithmFPType, method>::Batch(size_t) [with algorithmFPType = double; daal::algorithms::engines::mrg32k3a::Method method = daal::algorithms::engines::mrg32k3a::defaultDense; size_t = long unsigned int]’ is protected within this context
 2358 |         _algob.reset(new algob_type());
      |                      ^~~~~~~~~~~~~~~~
In file included from /home/vsts/work/1/daal/latest/include/daal.h:307,
                 from /home/vsts/work/1/s/src/daal4py.h:24,
                 from /home/vsts/work/1/s/src/daal4py_dist.h:23,
                 from /home/vsts/work/1/s/build/daal4py_cpp.h:3,
                 from /home/vsts/work/1/s/build/daal4py_cpp.cpp:2:
/home/vsts/work/1/daal/latest/include/algorithms/engines/mrg32k3a/mrg32k3a.h:152:5: note: declared protected here
  152 |     Batch(size_t seed = 777) { initialize(); }
      |     ^~~~~

@Alexandr-Solovev
Contributor Author

@david-cortes-intel Thanks for the comments!

I'd expect it to actually be the opposite, since (a) we are seeding MT with a single integer instead of a sequence, which leaves it with issues for the first draws; and (b) philox appears to do better on statistical tests according to the paper that introduced it. See, for example, that it passed BigCrush:
https://www.thesalmons.org/john/random123/papers/random123sc11.pdf
whereas MT fails some of it: https://arxiv.org/pdf/1910.06437

I would be glad to change it, but based on the testing I temporarily disabled one test, so it needs investigation; overall, though, let's do it.

Let's leave the change in defaults for a different PR then.

For sure!

Are the issues from the CI meant to be solved with the PR from the sklearnex side?

Yes, it should be fixed by uxlfoundation/scikit-learn-intelex#2228. I am waiting for the combined CI results.

@Alexandr-Solovev
Contributor Author

/intelci: run

@david-cortes-intel
Contributor

david-cortes-intel commented Dec 20, 2024

Looks like something went wrong in the examples:

 cpp/oneapi/dal/backend/primitives/optimizers/test/newton_cg_dpc.cpp:146:27: error: expected ';' after expression
   146 |         primitives::engine eng(4014 + n_);
 cpp/oneapi/dal/backend/primitives/rng/rng.hpp:108:6: note: candidate function template not viable: requires at least 6 arguments, but 5 were provided
   108 | void uniform(sycl::queue& queue,
       |      ^       ~~~~~~~~~~~~~~~~~~~
   109 |              Size count,
       |              ~~~~~~~~~~~
   110 |              Type* dst,
       |              ~~~~~~~~~~
   111 |              dpc_engine<EngineType>& engine_,
       |              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   112 |              Type a,
       |              ~~~~~~~
   113 |              Type b,
       |              ~~~~~~~
   114 |              const event_vector& deps = {});

@Alexandr-Solovev
Contributor Author

Looks like something went wrong in the examples: (compiler error quoted above)

Thanks for highlighting it, it will be fixed soon.

@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev
Contributor Author

/intelci: run

@Alexandr-Solovev
Contributor Author

/intelci: run

Labels
dpc++ Issue/PR related to DPC++ functionality
3 participants