Vectorize optimized_portable_ops versions of portable ops? #9241
Sketch of more detailed plan:
This is trickier than I remembered. We outline our loads in elementwise_util.h in the name of build time and code size; getting to the point where vectorization would be the next item on the list would take some time. Accordingly, I am going to skip elementwise_util.h for now and move on to unary_ufunc; I will come back to elementwise afterwards.
This isn't right. The unary_ufunc_* ops in pattern.h seem to be somewhat redundant with elementwise_util.h.
Plan above updated to reflect resolution of my confusion about whether to start with elementwise_util or unary_ufunc_*. (In short, unary_ufunc_* will call through to elementwise_util.)
Do you have an example of this, for, say, the add refactor that you described in the summary of the RFC?
Do you have an example of this?
Can we not have an implementation of Vectorized that is just scalar? This is already the case in PyTorch core and ET, no?
I don't particularly want to sign up to ensure that we generate code for that that is just as good as writing scalar code directly. |
I'm running into trouble with my intended implementation here. Apparently, SFINAE can't be used together with generic lambdas to detect whether they will actually compile when passed an argument of a particular type. See #9432; I expect to resolve this tomorrow after sleeping on the problem.
Updated #9432 with documentation about the SFINAE + generic lambda issue. I suspect I only ran into this because of the way I sequenced my stack; if I move the following steps in my plan before vectorizing elementwise_util, I expect better results.
OK, I get that, but I don't get whether there will be other functions that will call into this lambda using
In particular, it's important to note that if you wanted at::vec::Vectorized to work nicely in this mode, you would presumably have a default Vectorized::size() of 1 so that
Yeah, that's a bit unfortunate. OK, so separately: can it still be called a portable fallback, though? I didn't look through the header to ensure it doesn't make platform-specific assumptions, but I would have guessed it could be considered portable, in that it can compile for any platform that has a C/C++ compiler with C++17 or later support available.
Just curious: did you consider copying a portable op into an optimized op and SIMD-ifying it there? The distinction is that instead of vector-with-scalar-fallback in the portable_op.cpp file, it would live in optimized.yml with a portable.yml fallback during selective build.
That would be worse because we would then have copy/pasted code.
🚀 The feature, motivation and pitch
Similarly to #8932, we should be able to conditionally compile portable ops to do some vectorization. I imagine that this would look like either passing a second lambda to our util functions, or perhaps passing template lambdas that we could then use for both some scalar T and also Vectorized<T>. The second option would require us to get an std-workalike interface to Vectorized operations so that things like exp would work seamlessly, which probably would have a similar solution to pytorch/pytorch#144495.
RFC
As a concrete example, op_add currently calls a util workhorse function with a lambda:
We could imagine instead making the call look like this, with a template lambda, so that we could seamlessly use the lambda with Vectorized:
A second, harder example is op_exp: I think ideally we would find a solution to the above-mentioned PyTorch issue and then write it using a template lambda that could be instantiated with either a scalar or Vectorized, as outlined above.
cc @larryliu0820 @manuelcandales