Unfortunately, neither `u128` nor `swap_bytes` is supported directly by WebAssembly, so both implementations of `folded_multiply` are very slow.
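For context, here is a minimal sketch of the two scalar approaches this refers to. It assumes one variant folds a full 128-bit product and the other avoids `u128` by combining byte-swapped 64-bit products. The function names `folded_multiply_u128` and `folded_multiply_swap` are made up for illustration, and the exact code in the crate may differ.

```rust
/// Hypothetical name: multiply as u128, then XOR the low and high halves.
/// On targets without a native 64x64 -> 128 multiply this has to be lowered
/// to several narrower multiplications.
pub const fn folded_multiply_u128(s: u64, by: u64) -> u64 {
    let result = (s as u128).wrapping_mul(by as u128);
    (result as u64) ^ ((result >> 64) as u64)
}

/// Hypothetical name: avoid u128 by mixing two wrapping 64-bit products
/// of byte-swapped operands. This is a different function from the one
/// above; it only aims at similar mixing, not at the same output.
pub const fn folded_multiply_swap(s: u64, by: u64) -> u64 {
    let b1 = s.wrapping_mul(by.swap_bytes());
    let b2 = s.swap_bytes().wrapping_mul(!by);
    b1 ^ b2.swap_bytes()
}
```

Each call to `swap_bytes` in the second variant has to be emulated with shifts and masks when the target has no byte-swap instruction, which is where the cost comes from.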
I think an algorithm that takes both `u64` values, packs them into a `v128` vector, and then does some swizzling, vector multiplications, and so on would probably be much faster. Here is a Godbolt link with a rough sketch:
https://rust.godbolt.org/z/jGGhYjGs8
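To make the arithmetic concrete, the following sketch decomposes the 64x64 multiplication into four 32x32 -> 64 partial products, which is the kind of work a vector version would need to do in its lanes. This is not the code behind the Godbolt link; it is a portable scalar illustration, and the names `fold_wide_product` and `fold_limb_products` are hypothetical. How the partial products map onto `v128` lanes is left open here.

```rust
/// Reference: fold the 128-bit product by XORing its two 64-bit halves.
fn fold_wide_product(s: u64, by: u64) -> u64 {
    let r = (s as u128).wrapping_mul(by as u128);
    (r as u64) ^ ((r >> 64) as u64)
}

/// Same result using only 64-bit arithmetic on 32-bit limbs.
fn fold_limb_products(s: u64, by: u64) -> u64 {
    const MASK: u64 = 0xffff_ffff;
    let (s_lo, s_hi) = (s & MASK, s >> 32);
    let (b_lo, b_hi) = (by & MASK, by >> 32);

    // Four 32x32 -> 64 partial products; none of them can overflow u64.
    let ll = s_lo * b_lo;
    let lh = s_lo * b_hi;
    let hl = s_hi * b_lo;
    let hh = s_hi * b_hi;

    // Middle column of the schoolbook multiplication, including the carry
    // out of the lowest column. The sum is below 2^34, so it fits easily.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);

    // Low and high 64-bit halves of the full 128-bit product.
    let low = (mid << 32) | (ll & MASK);
    let high = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);

    low ^ high
}
```

Comparing a candidate against the `u128` reference on edge cases such as `0`, `1`, and `u64::MAX` checks equivalence only. It says nothing about hash quality if the vector version computes a different function.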
I don't know enough about how to verify the quality, so I decided not to open a PR directly and to discuss the feasibility first.