-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
device: add BenchmarkAllowedIPsInsertRemove #36
base: master
Are you sure you want to change the base?
device: add BenchmarkAllowedIPsInsertRemove #36
Conversation
To show that RemoveByPeer is slow. Currently: (pprof) top Showing nodes accounting for 2.99s, 96.14% of 3.11s total Dropped 35 nodes (cum <= 0.02s) Showing top 10 nodes out of 36 flat flat% sum% cum cum% 2.72s 87.46% 87.46% 2.72s 87.46% golang.zx2c4.com/wireguard/device.(*trieEntry).removeByPeer 0.10s 3.22% 90.68% 0.10s 3.22% runtime.memclrNoHeapPointers 0.05s 1.61% 92.28% 0.06s 1.93% runtime.scanobject 0.03s 0.96% 93.25% 0.05s 1.61% runtime.casgstatus 0.02s 0.64% 93.89% 0.02s 0.64% runtime.(*gcBitsArena).tryAlloc (inline) 0.02s 0.64% 94.53% 0.02s 0.64% runtime.heapBitsSetType 0.02s 0.64% 95.18% 0.04s 1.29% runtime.sweepone 0.01s 0.32% 95.50% 0.02s 0.64% golang.zx2c4.com/wireguard/device.commonBits 0.01s 0.32% 95.82% 0.03s 0.96% runtime.(*mheap).allocSpan 0.01s 0.32% 96.14% 0.24s 7.72% runtime.mallocgc Signed-off-by: Brad Fitzpatrick <[email protected]>
Same issue in the kernel code. That's a hard traversal to speed up without increasing the size of each node beyond a cacheline and therefore making lookups slow. Any suggestions? |
At least in our case (and perhaps with others?), the overwhelming majority of routes are complete IPv4 or IPv6 addresses (cidr /32 or /128). I was planning on adding a Go map alongside the trie and using both: map for complete addresses and trie for prefixes. That does mean some lookups (for non-complete addresses) need to consult both. I'm fine with that if it means reducing the removeByPeer cost, which is eating 40% of our CPU on our big shared test node accessible to all users. |
Instead of trying to add special cases -- whose complexity I wouldn't be so happy about having here -- what about implementing better/faster algorithms for the general case? Specifically, check out https://github.com/openbsd/src/blob/master/sys/net/art.c https://github.com/openbsd/src/blob/master/sys/net/art.h I would very very gladly take an implementation of this directly into wireguard-go (and would prefer it there instead of in a separate repo). |
Oh, nice, I hadn't seen that. PDF from the comments there: http://www.hariguchi.org/art/art.pdf |
Right. Basically it sounds like what happened somebody submitted a paper for a new routing table data structure. Knuth reviewed it, and during the review thought of something better. And that's ART. LC-Tries are also pretty fast, but not very fun to implement, and ART may well outperform it. Weidong Wu has a great book called "Packet Forwarding Technologies" that compares a lot of these different structures, but the latest addition I've found is 2007, which doesn't cover ART unfortunately. However, the combination of versatility, code compactness, and simplicity makes me prefer ART over other ones I've implemented in toys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark LGTM
(The ART data structure is nice.)
a.RemoveByPeer(peers[(i+num/2)%num]) | ||
} | ||
|
||
// Finally, some stats & validity checks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work at the end is getting added to your total benchmark time and making your number fuzzier. Does calling b.StopTimer()
just before this work?
rand.Seed(1) | ||
rand.Shuffle(num, func(i, j int) { ips[i], ips[j] = ips[j], ips[i] }) | ||
|
||
// Then repeatedly add one and remove one that was insert 32k inserts back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/insert /inserted /
e467e07
to
2a607d1
Compare
c597c63
to
3b3de75
Compare
8ae4473
to
70b7b71
Compare
6b5293b
to
6005c57
Compare
58beb0f
to
54dbe24
Compare
1ae3898
to
4e9e5da
Compare
eba36c5
to
ffb742d
Compare
89a9432
to
b9669b7
Compare
To show that RemoveByPeer is slow. Currently:
Signed-off-by: Brad Fitzpatrick [email protected]
/cc @zx2c4 @crawshaw @danderson