use xxhash instead of jenkins one-at-a-time hash, if available #2365

irfanHaslanded · 2025-03-04T14:31:08Z

In several benchmarks, xxhash scores far better than the jenkins one-at-a-time hash function.

See xxhash.com for a comparison.

irfanHaslanded · 2025-03-04T14:58:08Z

Results from perf report while exporting some data.
With these changes:

  Children      Self  Command     Shared Object         Symbol
+    1.03%     0.06%  sysrepocfg  libyang.so.3.8.0  [.] lyht_hash
     0.50%     0.20%  sysrepocfg  libyang.so.3.8.0  [.] lyht_hash_multi

Without these changes:

  Children      Self  Command     Shared Object         Symbol
+   13.09%    13.09%  sysrepocfg  libyang.so.3.8.0  [.] lyht_hash_multi
+   11.39%     0.05%  sysrepocfg  libyang.so.3.8.0  [.] lyht_hash

michalvasko · 2025-03-05T09:12:50Z

Yes, for context creation I also measured about a 12% improvement, which is nice. However, I also measured the other use-case, parsing data, where the improvement was 1%. With these numbers I am not sure what to do; we are working on a major context handling optimization, so the hashing will be even less relevant. Why did you start looking into this even, do you have a specific use-case where the hashing was taking too long?

irfanHaslanded · 2025-03-05T10:25:20Z

> Why did you start looking into this even, do you have a specific use-case where the hashing was taking too long?

Yes, the perf report data I gave above is from a real device use-case, where hashing was taking 12% of the time. I think this depends on the type of data being sampled. If the data values are long strings, the current hashing may become a significant time consumer.

michalvasko · 2025-03-05T11:09:57Z

What do you think about making the dependency optional? There are minimal code changes so it should not cause any mess and would avoid a new required dependency.

irfanHaslanded · 2025-03-05T12:28:24Z

> What do you think about making the dependency optional? There are minimal code changes so it should not cause any mess and would avoid a new required dependency.

Sounds good.
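
One way to keep the dependency optional is a compile-time switch. The sketch below is only an illustration: the build macro HAVE_XXHASH and the helper key_hash() are hypothetical names that do not come from this PR. XXH32() is the one-shot 32-bit function declared in xxhash.h; the fallback is the textbook Jenkins one-at-a-time loop, which may differ in detail from the code in libyang.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef HAVE_XXHASH /* hypothetical macro, set by the build system when xxhash is found */
# include <xxhash.h>
#endif

/* Hypothetical helper: hash a whole key in one call. */
static uint32_t
key_hash(const char *key, size_t len)
{
#ifdef HAVE_XXHASH
    /* xxhash is available: one-shot 32-bit hash with seed 0 */
    return (uint32_t)XXH32(key, len, 0);
#else
    /* fallback: textbook Jenkins one-at-a-time */
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        hash += p[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    /* final avalanche */
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
#endif
}
```

With this layout the callers never see which implementation was compiled in, so a build without xxhash keeps working unchanged. Note that the two paths produce different hash values for the same key, which matters only if hashes are ever stored or exchanged between builds.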

michalvasko

Seems fine. I will do some minor refactoring myself, which will be much faster. Thanks.

bradh352 · 2025-03-07T10:56:07Z

Just a comment on this: xxhash is usually better for long inputs, as in checksumming, and not necessarily for hash tables, where it may be slower because of its higher initial overhead. I'd think most hash table keys are going to be shorter than 32 characters, so a simpler hash usually wins.

I always gravitate towards FNV1a because it's super fast, small, and has good distribution. I use it in c-ares here:
https://github.com/c-ares/c-ares/blob/main/src/lib/dsa/ares_htable.c#L418-L431

(note the seed to that implementation can be 0 if you're not worried about hash collision attacks)
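
For reference, a minimal sketch of 32-bit FNV-1a with an optional seed. It is written from the published algorithm, not copied from c-ares; the name fnv1a_32() and the way the seed is mixed in (XORed into the offset basis) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV32_OFFSET_BASIS 2166136261U /* 0x811c9dc5 */
#define FNV32_PRIME        16777619U   /* 0x01000193 */

/* Hypothetical helper: 32-bit FNV-1a over len bytes; seed 0 gives plain FNV-1a. */
static uint32_t
fnv1a_32(const void *data, size_t len, uint32_t seed)
{
    const unsigned char *p = data;
    uint32_t hv = FNV32_OFFSET_BASIS ^ seed;
    size_t i;

    for (i = 0; i < len; i++) {
        hv ^= p[i];        /* XOR the byte in first ... */
        hv *= FNV32_PRIME; /* ... then multiply (the "1a" order) */
    }
    return hv;
}
```

Reading the input through unsigned char keeps bytes above 0x7f from being sign-extended on platforms where plain char is signed, so the result matches the published test vectors for any input.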

I'm going to guess that the speed difference is likely due to distribution/collisions between the algorithms and not the actual hash performance itself.

I hate to see a dependency pulled in for something so minor. A good overview with visual representation and speed of some hashes is here:
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

smhasher also has some numbers
https://github.com/rurban/smhasher

Whatever tests that you ran, any chance you could re-test with FNV1a as an alternative?

michalvasko · 2025-03-10T08:29:08Z

> I hate to see a dependency pulled in for something so minor.

That is why it is optional.

But it would be great if @irfanHaslanded could try the FNV1a hash and see if it fixes the performance problems. If so, we can just use that.

irfanHaslanded · 2025-03-10T09:25:23Z

Sure, I will try it out as well.

irfanHaslanded · 2025-03-10T09:38:26Z

Seems quite slow, the same as one-at-a-time hash for data

+    9.50%     9.50%  sysrepocfg  libyang.so.3.8.2  [.] lyht_hash_multi
+    8.44%     0.00%  sysrepocfg  libyang.so.3.8.2  [.] lyht_hash

Compiled same as before with -O2 gcc flags

LIBYANG_API_DEF uint32_t
lyht_hash_multi(uint32_t hash, const char *key_part, size_t len)
{
    unsigned int hv = hash ^ 2166136261U;
    size_t i;

    if (key_part) {
        for (i = 0; i < len; i++) {
            hv ^= (unsigned int)key_part[i];
            /* hv *= 16777619 (0x01000193) */
            hv += (hv << 1) + (hv << 4) + (hv << 7) + (hv << 8) + (hv << 24);
        }
    }

    return hv;
}

LIBYANG_API_DEF uint32_t
lyht_hash(const char *key, size_t len)
{
    uint32_t hash;

    hash = lyht_hash_multi(0, key, len);
    return lyht_hash_multi(hash, NULL, len);
}

bradh352 · 2025-03-10T10:05:24Z

lyht_hash() should just return lyht_hash_multi(0, key, len); don't call lyht_hash_multi() twice. It likely messes up the distribution.

Also, what is the context of the benchmark you're running? Meaning what else is being called? Is it saying the hash itself is taking up that much time from the entire test? How did the total execution time compare as well? Depending on the hash the overall run time may be better if there are fewer collisions, and so on...
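
One way to separate the cost of the hash itself from the quality of its distribution is to count bucket collisions directly for a representative key set. The following is a minimal sketch under stated assumptions: a power-of-two table indexed by the low bits of the hash, and hypothetical names (hash_fn, count_collisions(), and the two toy hashes) that are not libyang API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for the hash under test. */
typedef uint32_t (*hash_fn)(const char *key, size_t len);

/*
 * Count how many keys land in a bucket that is already occupied.
 * nbuckets must be a power of two.
 * Returns (size_t)-1 if the bookkeeping array cannot be allocated.
 */
static size_t
count_collisions(hash_fn fn, const char *const *keys, size_t nkeys, size_t nbuckets)
{
    unsigned char *used = calloc(nbuckets, 1);
    size_t i, collisions = 0;

    if (!used) {
        return (size_t)-1;
    }
    for (i = 0; i < nkeys; i++) {
        size_t idx = fn(keys[i], strlen(keys[i])) & (nbuckets - 1);

        if (used[idx]) {
            collisions++;
        } else {
            used[idx] = 1;
        }
    }
    free(used);
    return collisions;
}

/* Toy hash, worst case: every key maps to bucket 0. */
static uint32_t
hash_constant(const char *key, size_t len)
{
    (void)key;
    (void)len;
    return 0;
}

/* Toy hash: uses only the first byte of the key. */
static uint32_t
hash_first_byte(const char *key, size_t len)
{
    return len ? (unsigned char)key[0] : 0;
}
```

Running the candidate hashes through count_collisions() with the same keys and table size gives a number that can be compared independently of the perf percentages; the toy hashes are only there to show the two extremes.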

irfanHaslanded · 2025-03-10T11:27:48Z

> lyht_hash() should just be return lyht_hash_multi(0, key, len), don't call lyht_hash_multi() twice. It likely messes up distribution.

The same test with the updated lyht_hash() did not make much of a difference either way.

LIBYANG_API_DEF uint32_t
lyht_hash(const char *key, size_t len)
{
    return lyht_hash_multi(0, key, len);
}
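
A possible explanation for why dropping the second call changed so little: in the FNV variant posted above, a call with key_part == NULL only XORs the running value with the offset basis, so the two-call lyht_hash() differs from the one-call version by a fixed constant. XOR with a constant is a one-to-one mapping, so it cannot create or remove collisions between full 32-bit hashes. If bucket indices are taken by masking the low bits (an assumption about the table, not something stated in this thread), it only permutes the buckets. The sketch below checks that identity; fnv_multi(), hash_two_calls() and hash_one_call() are hypothetical stand-ins for the posted code, with the shift-and-add replaced by the equivalent multiplication by 16777619.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV32_OFFSET_BASIS 2166136261U

/* Stand-in for the FNV variant of lyht_hash_multi() posted above. */
static uint32_t
fnv_multi(uint32_t hash, const char *key_part, size_t len)
{
    uint32_t hv = hash ^ FNV32_OFFSET_BASIS;
    size_t i;

    if (key_part) {
        for (i = 0; i < len; i++) {
            hv ^= (uint32_t)key_part[i];
            hv *= 16777619U; /* same as the shift-and-add form */
        }
    }
    return hv;
}

/* First version of lyht_hash(): a second call with a NULL key part. */
static uint32_t
hash_two_calls(const char *key, size_t len)
{
    uint32_t hash = fnv_multi(0, key, len);

    return fnv_multi(hash, NULL, len);
}

/* Updated version: a single call. */
static uint32_t
hash_one_call(const char *key, size_t len)
{
    return fnv_multi(0, key, len);
}
```

For any key, hash_two_calls() equals hash_one_call() XORed with 2166136261, which is consistent with the measurement that the change made little difference.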

> Is it saying the hash itself is taking up that much time from the entire test?

Yes.

> How did the total execution time compare as well?

The overall time taken by fnv is not much different from that of the one-at-a-time hash.

> Depending on the hash the overall run time may be better if there are fewer collisions, and so on...

Is there any reason for the quality of xxhash to be inferior to fnv? If not, I wouldn't think that collisions would be significantly worse.

xxh

$ cat xxh.perf
Samples: 10K of event 'cycles', Event count (approx.): 7920116200
  Children      Self  Command     Shared Object     Symbol
+   12.66%    10.90%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find_rec
+    6.47%     0.23%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find_with_val_cb
+    5.74%     0.55%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find
+    3.20%     0.11%  sysrepocfg  libyang.so.3.8.2  [.] lyht_resize
+    2.97%     1.09%  sysrepocfg  libyang.so.3.8.2  [.] _lyht_insert_with_resize_cb
+    2.86%     0.18%  sysrepocfg  libyang.so.3.8.2  [.] lyht_init_hlists_and_records
+    2.43%     0.12%  sysrepocfg  libyang.so.3.8.2  [.] lyht_remove_with_resize_cb
+    2.13%     0.56%  sysrepocfg  libyang.so.3.8.2  [.] lyht_hash_multi
+    2.02%     0.00%  sysrepocfg  libyang.so.3.8.2  [.] lyht_get_rec (inlined)
+    1.63%     0.17%  sysrepocfg  libyang.so.3.8.2  [.] lyht_free
+    0.64%     0.05%  sysrepocfg  libyang.so.3.8.2  [.] lyht_hash
+    0.54%     0.02%  sysrepocfg  libyang.so.3.8.2  [.] lyht_new

fnv hash

$ cat fnv.perf
Samples: 10K of event 'cycles', Event count (approx.): 8401485919
  Children      Self  Command     Shared Object     Symbol
+   11.02%     9.55%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find_rec
+    9.07%     9.06%  sysrepocfg  libyang.so.3.8.2  [.] lyht_hash_multi
+    5.52%     0.24%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find_with_val_cb
+    4.91%     0.45%  sysrepocfg  libyang.so.3.8.2  [.] lyht_find
+    2.99%     0.22%  sysrepocfg  libyang.so.3.8.2  [.] lyht_resize
+    2.87%     0.80%  sysrepocfg  libyang.so.3.8.2  [.] _lyht_insert_with_resize_cb
+    2.55%     0.13%  sysrepocfg  libyang.so.3.8.2  [.] lyht_init_hlists_and_records
+    2.21%     0.09%  sysrepocfg  libyang.so.3.8.2  [.] lyht_remove_with_resize_cb
+    1.54%     0.00%  sysrepocfg  libyang.so.3.8.2  [.] lyht_get_rec (inlined)
+    1.36%     0.19%  sysrepocfg  libyang.so.3.8.2  [.] lyht_free
     0.45%     0.05%  sysrepocfg  libyang.so.3.8.2  [.] lyht_new

use xxhash instead of jenkins hash

ca88692

In several benchmarks, xxhash scores far better than the jenkins one at a time hash function. See https://xxhash.com/ for a comparison.

irfanHaslanded force-pushed the irfan/xxhash branch from 22dd6f0 to ca88692 Compare March 4, 2025 14:33

irfanHaslanded mentioned this pull request Mar 4, 2025

enhancement - use a faster hash function for lyht_hash_multi #2364

Open

irfanHaslanded changed the title ~~use xxhash instead of jenkins hash~~ use xxhash instead of jenkins one-at-a-time hash Mar 4, 2025

irfanHaslanded changed the title ~~use xxhash instead of jenkins one-at-a-time hash~~ use xxhash instead of jenkins one-at-a-time hash, if available Mar 5, 2025

irfanHaslanded force-pushed the irfan/xxhash branch from 2829b4a to bf3d65e Compare March 5, 2025 12:22

make xxhash dependency optional

2a76347

irfanHaslanded force-pushed the irfan/xxhash branch from bf3d65e to 2a76347 Compare March 5, 2025 12:26

michalvasko reviewed Mar 5, 2025

michalvasko merged commit da81ee5 into CESNET:devel Mar 5, 2025
11 checks passed
