Small array multiplication is slow #201
Comments
I think there are two differences with Eigen:
It could be interesting to make a 2D dot function that works faster on small matrices and matrix-vector combinations. We could even use Eigen as a "backend" for that :)
This is an interesting idea. First, two thoughts:
In case both options are a no go, yes I would love to see
As I mentioned before, the dot product is more general than Eigen's matrix / matrix-vector multiplication as it performs broadcasting. Without broadcasting checks you'd probably already be 2x faster for small matrices. You could try using
MKL won't help much, I am afraid :)
Calling into BLAS functions has some intrinsic overhead, since even a plain function call (as opposed to inlined code) has some cost attached to it.
Using the xtensor class does not help either. As you say, the call into BLAS and the broadcast checks are probably dominating the time with small matrices. Do you have some example code showing how I can use xsimd to multiply a small matrix with a vector, please?
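To make the overhead argument concrete, here is a minimal sketch in plain C++ of a small matrix-vector product written as a hand-rolled loop, with no call into BLAS and no broadcasting checks. It is not xtensor, xsimd or Eigen code: `small_matvec` is a hypothetical helper, and the row-major `std::vector<double>` layout is an assumption made for illustration. Whether the inner loop gets vectorized depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of xtensor, xsimd or Eigen):
// computes y = A * x for a dense row-major matrix A of shape rows x cols.
// The shapes are checked once up front; there is no broadcasting logic
// and no call into an external BLAS routine.
inline std::vector<double> small_matvec(const std::vector<double>& a,
                                        std::size_t rows,
                                        std::size_t cols,
                                        const std::vector<double>& x)
{
    assert(a.size() == rows * cols);
    assert(x.size() == cols);

    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i)
    {
        // Pointer to the start of row i in the flat row-major buffer.
        const double* row = a.data() + i * cols;
        double acc = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
        {
            acc += row[j] * x[j];
        }
        y[i] = acc;
    }
    return y;
}
```

For a 2x3 matrix stored as `{1, 2, 3, 4, 5, 6}` and `x = {1, 0, -1}`, calling `small_matvec(a, 2, 3, x)` yields a vector of two elements. The output allocation is still there, so for very small sizes a caller-provided output buffer would probably be the next thing to try.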
Actually the problem shows up in other kinds of operations as well. Even simple array ops such as element-wise array sums are very slow in xtensor-based code compared to Eigen when it comes to small arrays. Can we please look at which bounds / broadcast checks are the culprit? Benchmark results:
comparing
I'll update benchmark results using
you can shave off some more ns by doing There might be some similar trick for eigen.
Benchmark results:
Benchmark code:
Any thoughts on why small arrays incur so much overhead in xtensor? What should I do if most of my arrays are small (but not fixed size)?
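One way to explore the question is to compare a general expression against a fast path that checks the shapes once and then runs a flat loop. The sketch below is plain C++ and makes no claim about how xtensor or Eigen work internally: `add_same_shape` and `ns_per_call` are hypothetical helpers, and 1-D `std::vector<double>` arrays are assumed for simplicity.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical fast path: if both inputs have the same size, write a + b
// into out and return true. Otherwise return false so the caller can fall
// back to a general (broadcasting) implementation.
inline bool add_same_shape(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::vector<double>& out)
{
    if (a.size() != b.size())
    {
        return false;
    }
    out.resize(a.size());  // no reallocation if out already has capacity
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        out[i] = a[i] + b[i];
    }
    return true;
}

// Hypothetical timing helper: average wall-clock nanoseconds per call of f.
// An optimizing compiler may remove work whose result is unused, so the
// callable should write to something observable.
template <class F>
double ns_per_call(F&& f, std::size_t iterations)
{
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
    {
        f();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Timing `ns_per_call([&] { add_same_shape(a, b, out); }, 1000000)` for arrays of a few elements gives a rough baseline to compare against the library expression on the same data. For numbers worth reporting, a dedicated benchmarking framework is probably a better choice than this helper.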