[fix] Filesystem path: hash function not robust enough #129


Merged: 3 commits into main, Jun 13, 2025

Conversation

blefaudeux (Contributor)

Lucky catch: a unit test failed on me and I had to investigate. The hashing function was not good enough, and sequential filenames could all end up in the same bucket (for 10 consecutive files, something like 10% of the time).
Fixed by using a better hashing function.
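For context, a minimal sketch of the bucketing pattern at play (the `bucket_for_path` helper, the bucket count, and the use of the standard library's SipHash-backed `DefaultHasher` are illustrative, not the actual datago code). With a weak hash, near-identical keys such as sequential filenames can collapse into the same bucket; a hash with good avalanche behaviour spreads them out:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: map a file path to one of `num_buckets` shards.
fn bucket_for_path(path: &str, num_buckets: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    hasher.finish() % num_buckets
}

fn main() {
    // Sequential filenames like these are exactly the pathological input:
    // a weak hash can send all ten to the same bucket.
    for i in 0..10 {
        let path = format!("dataset/file_{:03}.jpg", i);
        println!("{} -> bucket {}", path, bucket_for_path(&path, 4));
    }
}
```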

photoroman (Contributor) left a comment

LGTM! Remove debug logs?

blefaudeux (Contributor, Author)

Jeez, the CI agents probably have no HW random source, so ahash falls back to another path which is actually less random, and the test now fails (: https://docs.rs/ahash/latest/ahash/ Having a look.
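For what it's worth, one way to make hashing independent of the host's randomness source is to pin ahash's seeds explicitly via `RandomState::with_seeds` (a sketch, not necessarily the fix taken here; the seed values are arbitrary placeholders):

```rust
use ahash::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hash a path with explicitly seeded ahash, so the result does not
// depend on whether the machine exposes a hardware randomness source.
fn hash_path(path: &str) -> u64 {
    // Arbitrary fixed seeds; any four u64 values work.
    let state = RandomState::with_seeds(0x243f_6a88, 0x85a3_08d3, 0x1319_8a2e, 0x0370_7344);
    let mut hasher = state.build_hasher();
    path.hash(&mut hasher);
    hasher.finish()
}
```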

blefaudeux (Contributor, Author)

Whoops, looks like fasthash is causing trouble in the CI. Having a look later, a bit short on time at the moment.

blefaudeux marked this pull request as ready for review, June 13, 2025 12:14
blefaudeux merged commit 5988ce3 into main, Jun 13, 2025
8 checks passed
blefaudeux (Contributor, Author)

Still not good enough, cc @photoroman: https://github.com/Photoroom/datago/actions/runs/15634295340/job/44045572960
I'll get back to this; I need a good number of tests before being confident it's reliable in the end.
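A distribution check along these lines might help build that confidence (illustrative only, reusing the hypothetical `bucket_for_path` from the sketch above; the bucket count, file count, and tolerance are made up, not taken from the actual test suite):

```rust
#[test]
fn sequential_paths_spread_across_buckets() {
    const NUM_BUCKETS: u64 = 4;
    const NUM_FILES: usize = 10_000;

    // Count how many sequential filenames land in each bucket.
    let mut counts = vec![0usize; NUM_BUCKETS as usize];
    for i in 0..NUM_FILES {
        let path = format!("dataset/file_{:06}.jpg", i);
        counts[bucket_for_path(&path, NUM_BUCKETS) as usize] += 1;
    }

    // With a well-behaved hash, each bucket should hold roughly
    // NUM_FILES / NUM_BUCKETS entries; allow a generous tolerance so
    // the test is not flaky.
    let expected = NUM_FILES / NUM_BUCKETS as usize;
    for (bucket, &count) in counts.iter().enumerate() {
        assert!(
            count > expected / 2 && count < expected * 2,
            "bucket {bucket} holds {count} of {NUM_FILES} files"
        );
    }
}
```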
