You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Right now the hash function uses strings to determine hash value. However, this is quite expensive and we can also do better in terms of avoiding collisions.
While the port and address family are the most important parts where we want to avoid collisions, the IP addresses also need to be hashed properly so that ideally we won't get any collision at all.
So here's how IP addresses are assigned starting from the most common cause to the least common:
Remote peer determined via SO_PEERCRED:
IPv4: Equal to PID (uint32_t)
IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)
Implicit binding via connect or sendto/sendmsg:
IPv4: Equal to PID (uint32_t)
IPv6: 0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP (U -> UID; G -> GID; P -> PID)
If address is already a loopback address passed to eg. bind, keep it as is.
If it's an implicit binding done via recvfrom or recvmsg, we get local bindings with the following IP addresses:
IPv4: 0x00XXXXXX (X -> Random value)
IPv6: 0xfe8000000000000000000000XXXXXXXX (X -> Random value)
Since size_t is guaranteed to have at least 16 bits, it's probably safe to fall back to only encode the port if size_t is 16 bit, since we usually have one low port and several ephemeral ports. Also, we're only interested in Unix domain sockets, so distinguishing between IPv4 and IPv6 could even be neglected.
If size_t is 32 bits or more, it's certainly a good idea to encode the IP address as well. Since the PID is always the last part of the address for the two most common cases, it's probably safe to only encode the PID. Similarly even if we already got a loopback address or a random IP address, the last 32 bits are the interesting ones.
However, we still need the port and address family, so we certainly want to trim the IP address portion down to 16 bit if we have exactly 32 bits. Of course, on 64 bit systems, we can even use 48 bits, so collisions are even more unlikely to occur. What I'm not sure about here is which value to use for upper 16 bits of those 48 bits. 0xfe80? Or just a bit flag?
Update: I've removed the first two cases above, because they're not actually used in an unordered_map.
The text was updated successfully, but these errors were encountered:
The LGTM service constantly annoys me with this alert, which in fact
isn't actually a thing that's broken, because the function in question
is just slower than it could be.
This is going to be addressed at some point[1] but it's not something
critical and since "FIXME" is the wrong label anyway, let's get rid of
that annoying alert.
[1]: #25
Signed-off-by: aszlig <[email protected]>
Right now the hash function uses strings to determine hash value. However, this is quite expensive and we can also do better in terms of avoiding collisions.
While the port and address family are the most important parts where we want to avoid collisions, the IP addresses also need to be hashed properly so that ideally we won't get any collision at all.
So here's how IP addresses are assigned starting from the most common cause to the least common:
Remote peer determined viaSO_PEERCRED
:IPv4: Equal to PID (uint32_t
)IPv6:0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP
(U -> UID; G -> GID; P -> PID)Implicit binding viaconnect
orsendto
/sendmsg
:IPv4: Equal to PID (uint32_t
)IPv6:0xfe800000UUUUUUUUGGGGGGGGPPPPPPPP
(U -> UID; G -> GID; P -> PID)If address is already a loopback address passed to eg.
bind
, keep it as is.If it's an implicit binding done via
recvfrom
orrecvmsg
, we get local bindings with the following IP addresses:0x00XXXXXX
(X -> Random value)0xfe8000000000000000000000XXXXXXXX
(X -> Random value)Since
size_t
is guaranteed to have at least 16 bits, it's probably safe to fall back to only encode the port ifsize_t
is 16 bit, since we usually have one low port and several ephemeral ports. Also, we're only interested in Unix domain sockets, so distinguishing between IPv4 and IPv6 could even be neglected.If
size_t
is 32 bits or more, it's certainly a good idea to encode the IP address as well. Since the PID is always the last part of the address for the two most common cases, it's probably safe to only encode the PID. Similarly even if we already got a loopback address or a random IP address, the last 32 bits are the interesting ones.However, we still need the port and address family, so we certainly want to trim the IP address portion down to 16 bit if we have exactly 32 bits. Of course, on 64 bit systems, we can even use 48 bits, so collisions are even more unlikely to occur. What I'm not sure about here is which value to use for upper 16 bits of those 48 bits.
0xfe80
? Or just a bit flag?Update: I've removed the first two cases above, because they're not actually used in an
unordered_map
.The text was updated successfully, but these errors were encountered: