Initial support for Big Endian #75

SammyVimes · 2024-01-29T22:10:36Z

Hi! I noticed that there is no support for big-endian platforms, so I decided to start working on it.
So far I have done the following:

(I also fixed a small bug in the little-endian version of sz_rfind_1char_swar.)

I also took the liberty of adding a googletest dependency (of course I can remove it, I just prefer using it)
and of supporting the Linux version of qsort_r in test.cpp.

Although I am using macOS, I was able to test on a big-endian machine using QEMU and Docker. So I think the next step for this pull request is adding a workflow with a similar setup (or maybe GitHub has BE machines, I don't know yet).

With all those if (IS_LITTLE_ENDIAN) checks the code looks funky, I know. I will try to do something about it.

ashvardanian · 2024-01-29T22:50:51Z

Hi, @SammyVimes! Thank you for the PR! Big Endian support is a great thing to have, but we need to make a few changes to proceed with the PR.

The main-dev branch is miles ahead, and is being prepared for a major release. Please use it as a reference branch.
It is probably wiser to use macros for big/little endian checks.
Let's avoid Google Test and other third-party utilities. They are very useful in the general case, but a careful use of assert-s makes more sense here, allowing us to conditionally log more info about the scope, hence simplifying debugging.
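
For illustration, a macro-based check could look roughly like the sketch below. It assumes a GCC- or Clang-compatible compiler that predefines `__BYTE_ORDER__` and provides `__builtin_bswap64`; the names `EXAMPLE_IS_BIG_ENDIAN` and `example_load_u64_le` are hypothetical and are not taken from StringZilla.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro: resolved at compile time, so no runtime branch is needed.
 * Compilers that do not predefine __BYTE_ORDER__ fall back to little-endian here. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define EXAMPLE_IS_BIG_ENDIAN 1
#else
#define EXAMPLE_IS_BIG_ENDIAN 0
#endif

/* Hypothetical helper: loads 8 bytes so that the byte at the lowest address
 * always lands in the least significant bits, regardless of host byte order. */
static uint64_t example_load_u64_le(void const *ptr) {
    uint64_t word;
    memcpy(&word, ptr, sizeof(word)); /* memcpy avoids unaligned and aliasing issues */
#if EXAMPLE_IS_BIG_ENDIAN
    word = __builtin_bswap64(word); /* GCC/Clang builtin, assumed available */
#endif
    return word;
}
```

With a helper like this, the SWAR routines can share one code path instead of branching on `if (IS_LITTLE_ENDIAN)` at every call site.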

SammyVimes · 2024-01-29T23:05:10Z

Hi, @ashvardanian! Sure, will change the PR accordingly. I really don’t know how I managed to miss the main-dev branch 😅

ashvardanian · 2024-02-11T23:11:36Z

Hi @SammyVimes! Any chance you've made any progress on the new version? I'm installing Docker QEMU images now, to test on 32-bit and big-endian architectures to generalize StringZilla further. Can continue your efforts, if you have anything you can push 🤗

SammyVimes · 2024-07-04T08:50:23Z

@ashvardanian sorry, I was completely swamped by work. If big endian support is still required, I will happily pick up where I left off

recurseml · 2025-05-02T15:34:49Z

stringzilla/stringzilla.h

@@ -35,6 +35,8 @@
 #define NULL ((void *)0)
 #endif

+#define IS_LITTLE_ENDIAN (*(uint16_t *)"\0\xff" > 0x100)


The endianness detection macro uses unaligned pointer access which is undefined behavior in C and can cause crashes on architectures that don't support unaligned memory access. String literals may be placed in read-only memory which could cause additional issues. A safer approach using unions or byte-by-byte comparison should be used.
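
A byte-by-byte variant of the check might be sketched as follows. This is only one possible shape of the "safer approach" mentioned above, and `example_is_little_endian` is a hypothetical name, not a function from the PR.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: inspects the first byte of a known 16-bit value.
 * Copying through memcpy avoids casting a char pointer to uint16_t *,
 * so there is no alignment or strict-aliasing concern. */
static int example_is_little_endian(void) {
    uint16_t const probe = 0x0102;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    /* On a little-endian host the low byte (0x02) is stored first. */
    return first_byte == 0x02;
}
```

Compilers typically fold such a function to a constant when optimizing, so the runtime cost is usually negligible.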


recurseml · 2025-05-02T15:34:51Z

stringzilla/stringzilla.h

    sz_string_start_t const end = haystack + haystack_length;
-    sz_string_start_t text = end - 1;
+    sz_string_start_t text = end;


In sz_rfind_1char_swar, initializing 'text' with 'end' and then immediately using it in a decrement operation (--text) while checking alignment can lead to undefined behavior if 'end' points to the start of the buffer, causing potential buffer underflow. The alignment check should be done after ensuring text > haystack.
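
The ordering suggested above (bounds check first, alignment check second) can be sketched like this. It is a simplified stand-in, not the actual `sz_rfind_1char_swar`: the name `example_rfind_byte` is hypothetical, and the word-sized step only detects whether a match exists, leaving the exact position to a bytewise scan so that the code does not depend on byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reverse single-byte search with guarded decrements. */
static char const *example_rfind_byte(char const *haystack, size_t length, char needle) {
    char const *text = haystack + length; /* one past the last byte */

    /* Tail: check `text != haystack` BEFORE decrementing, then test alignment. */
    while (text != haystack && ((uintptr_t)text & 7u) != 0) {
        --text;
        if (*text == needle) return text;
    }

    /* Body: skip 8-byte words that contain no matching byte. */
    uint64_t const pattern = 0x0101010101010101ull * (unsigned char)needle;
    while ((size_t)(text - haystack) >= 8) {
        uint64_t word;
        memcpy(&word, text - 8, 8);
        word ^= pattern; /* matching bytes become zero */
        /* Classic "has zero byte" test; nonzero if any byte of `word` is zero. */
        if ((word - 0x0101010101010101ull) & ~word & 0x8080808080808080ull) break;
        text -= 8;
    }

    /* Head, or the word that contains a match: scan bytewise. */
    while (text != haystack) {
        --text;
        if (*text == needle) return text;
    }
    return NULL;
}
```

For an empty haystack every loop is skipped and the function returns `NULL` without ever moving `text` below `haystack`.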


recurseml · 2025-05-02T15:34:53Z

😱 Found 2 issues. Time to roll up your sleeves! 😱

Initial support for Big Endian

a3a1a73
