-
Notifications
You must be signed in to change notification settings - Fork 89
Initial support for Big Endian #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Hi, @SammyVimes! Thank you for the PR! Big Endian support is a great thing to have, but we need to make a few changes to proceed with the PR.
|
Hi, @ashvardanian! Sure, will change the PR accordingly. I really don’t know how I managed to miss the main-dev branch 😅 |
Hi @SammyVimes! Any chance you've made any progress on the new version? I'm installing Docker QEMU images now, to test on 32-bit and big-endian architectures to generalize StringZilla further. Can continue your efforts, if you have anything you can push 🤗 |
@ashvardanian sorry, I was completely swamped by work. If big endian support is still required, I will happily pick up where I left off |
@@ -35,6 +35,8 @@ | |||
#define NULL ((void *)0) | |||
#endif | |||
|
|||
#define IS_LITTLE_ENDIAN (*(uint16_t *)"\0\xff" > 0x100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endianness detection macro uses unaligned pointer access which is undefined behavior in C and can cause crashes on architectures that don't support unaligned memory access. String literals may be placed in read-only memory which could cause additional issues. A safer approach using unions or byte-by-byte comparison should be used.
React with 👍 to tell me that this comment was useful, or 👎 if not (and I'll stop posting more comments like this in the future)
sz_string_start_t const end = haystack + haystack_length; | ||
sz_string_start_t text = end - 1; | ||
sz_string_start_t text = end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In sz_rfind_1char_swar, initializing 'text' with 'end' and then immediately using it in a decrement operation (--text) while checking alignment can lead to undefined behavior if 'end' points to the start of the buffer, causing potential buffer underflow. The alignment check should be done after ensuring text > haystack.
React with 👍 to tell me that this comment was useful, or 👎 if not (and I'll stop posting more comments like this in the future)
😱 Found 2 issues. Time to roll up your sleeves! 😱 |
Hi! Noticed that there is no support for Big Endian, so decided to start working on it.
Currently I did:
(also fixed a little bug in the little endian version of
sz_rfind_1char_swar
).I also took a liberty of adding googletest dependency (of course I can remove it, I just prefer using it)
and supporting linux version of qsort_r in the test.cpp.
Although I am using macOS, I was able to test on a big endian machine using QEMU and docker. So I think the next thing to do for this pull request will be adding a workflow with the similar setup (or maybe GitHub has BE machines, I don't know yet).
With all those
if (IS_LITTLE_ENDIAN)
code looks funky, I know. I will try to do something with it