use a dynamic bytes swizzle for swizzle_to_front #331
Merged
Kerollmops merged 2 commits into RoaringBitmap:main on Jul 16, 2025
Member
Hey @Dr-Emann 👋 Thank you for your work. Very good explanation of the problem, as always. I am wondering whether the unsafe code you are showing might actually be the way to go. If the code is well documented, we should go this way even if it's unsafe. What do you think? 🤔
Member
Author
Sure. I also added a version that uses runtime detection when the std feature is enabled, so even if the user doesn't target a CPU with SSSE3 directly, they still won't have to go through the non-SIMD version unless they run on an ancient processor.
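For readers following along, here is a minimal sketch of what runtime detection along these lines could look like. It is not the code from this PR: the names `shuffle16`, `shuffle16_ssse3` and `shuffle16_portable` are hypothetical, it works on plain 16-byte arrays rather than the crate's SIMD types, and it assumes std is available (which is what makes `is_x86_feature_detected!` usable).

```rust
/// Portable fallback with the same semantics as the SSSE3 `pshufb` instruction:
/// an index with its high bit set yields 0, otherwise the low 4 bits select a byte.
pub fn shuffle16_portable(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = if i & 0x80 != 0 { 0 } else { bytes[(i & 0x0F) as usize] };
    }
    out
}

/// SSSE3 version. Only sound to call when the CPU supports SSSE3.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn shuffle16_ssse3(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_shuffle_epi8, _mm_storeu_si128};
    let mut out = [0u8; 16];
    // SAFETY: the unaligned load/store intrinsics each touch exactly 16 bytes,
    // which is the size of the arrays involved.
    unsafe {
        let b = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        let i = _mm_loadu_si128(idx.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_shuffle_epi8(b, i));
    }
    out
}

/// Picks the SSSE3 path at runtime when the CPU has it, else the portable one.
pub fn shuffle16(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was just confirmed on the running CPU.
            return unsafe { shuffle16_ssse3(bytes, idx) };
        }
    }
    shuffle16_portable(bytes, idx)
}
```

The detection macro caches its result, so the per-call cost is a cheap check; whether that check is acceptable in a hot loop is a separate question from what this sketch shows.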
Kerollmops (Member) approved these changes on Jul 16, 2025
Amazing work! Thank you for this 👍
Fixes #207
As the comments say, this unfortunately leads to pretty suboptimal code unless the std library is compiled with the required target feature, and on the default x86_64 targets (except for macOS) it is not.
See godbolt showing assembly for: default x86_64 (bad assembly), aarch64 (good assembly), and x86_64 macOS (good assembly, which should also be the assembly for -Zbuild-std).
We could use a little unsafe to allow optimal code generation with just a new enough target in the caller, without requiring recompiling std.
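The snippet originally attached to this description did not survive on this page. As a rough illustration only, and not the author's code, the idea could be sketched like this: call the SSSE3 intrinsic directly when the caller's own compilation target enables it, and fall back to scalar code otherwise. The names `shuffle_bytes` and `shuffle_bytes_scalar` are hypothetical, and the sketch uses plain 16-byte arrays rather than the crate's SIMD types.

```rust
/// Scalar reference with `pshufb` semantics: an index with its high bit set
/// yields 0, otherwise the low 4 bits of the index select the source byte.
pub fn shuffle_bytes_scalar(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idx.iter()) {
        *o = if i & 0x80 != 0 { 0 } else { bytes[(i & 0x0F) as usize] };
    }
    out
}

/// Compiled only when the caller's target statically enables SSSE3
/// (for example via `-C target-cpu=native` or `-C target-feature=+ssse3`).
#[cfg(all(target_arch = "x86_64", target_feature = "ssse3"))]
pub fn shuffle_bytes(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    use core::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_shuffle_epi8, _mm_storeu_si128};
    let mut out = [0u8; 16];
    // SAFETY: SSSE3 is guaranteed by the `cfg` above, and the unaligned
    // load/store intrinsics each touch exactly 16 bytes of a 16-byte array.
    unsafe {
        let b = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        let i = _mm_loadu_si128(idx.as_ptr() as *const __m128i);
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_shuffle_epi8(b, i));
    }
    out
}

/// Used on every other target, including default x86_64 builds without SSSE3.
#[cfg(not(all(target_arch = "x86_64", target_feature = "ssse3")))]
pub fn shuffle_bytes(bytes: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    shuffle_bytes_scalar(bytes, idx)
}
```

Because the intrinsic lives in the caller's crate, the compiler can emit the single shuffle instruction whenever the crate itself is built with SSSE3 enabled, regardless of how std was compiled.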