Improve hash function for SockAddr #25

aszlig · 2020-07-20T01:24:05Z

Right now the hash function uses strings to determine hash value. However, this is quite expensive and we can also do better in terms of avoiding collisions.

While the port and address family are the most important parts where we want to avoid collisions, the IP addresses also need to be hashed properly so that ideally we won't get any collision at all.

So here's how IP addresses are assigned starting from the most common cause to the least common:

~~Remote peer determined via SO_PEERCRED:~~
- ~~IPv4: Equal to PID (uint32_t)~~
- ~~IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)~~
~~Implicit binding via connect or sendto/sendmsg:~~
- ~~IPv4: Equal to PID (uint32_t)~~
- ~~IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)~~
If address is already a loopback address passed to eg. bind, keep it as is.
If it's an implicit binding done via recvfrom or recvmsg, we get local bindings with the following IP addresses:
- IPv4: 0x00XXXXXX (X -> Random value)
- IPv6: 0xfe8000000000000000000000XXXXXXXX (X -> Random value)

Since size_t is guaranteed to have at least 16 bits, it's probably safe to fall back to only encode the port if size_t is 16 bit, since we usually have one low port and several ephemeral ports. Also, we're only interested in Unix domain sockets, so distinguishing between IPv4 and IPv6 could even be neglected.

If size_t is 32 bits or more, it's certainly a good idea to encode the IP address as well. Since the PID is always the last part of the address for the two most common cases, it's probably safe to only encode the PID. Similarly even if we already got a loopback address or a random IP address, the last 32 bits are the interesting ones.

However, we still need the port and address family, so we certainly want to trim the IP address portion down to 16 bit if we have exactly 32 bits. Of course, on 64 bit systems, we can even use 48 bits, so collisions are even more unlikely to occur. What I'm not sure about here is which value to use for upper 16 bits of those 48 bits. 0xfe80? Or just a bit flag?

Update: I've removed the first two cases above, because they're not actually used in an unordered_map.

The text was updated successfully, but these errors were encountered:

The LGTM service constantly annoys me with this alert, which in fact isn't actually a thing that's broken, because the function in question is just slower than it could be. This is going to be addressed at some point[1] but it's not something critical and since "FIXME" is the wrong label anyway, let's get rid of that annoying alert. [1]: #25 Signed-off-by: aszlig <[email protected]>

ibc mentioned this issue Jun 20, 2022

Working on new TransportTuple.hash implementation versatica/mediasoup#839

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve hash function for SockAddr #25

Improve hash function for SockAddr #25

aszlig commented Jul 20, 2020 •

edited

Loading

Improve hash function for SockAddr #25

Improve hash function for SockAddr #25

Comments

aszlig commented Jul 20, 2020 • edited Loading

aszlig commented Jul 20, 2020 •

edited

Loading