device: add BenchmarkAllowedIPsInsertRemove #36

Open: wants to merge 1 commit into master
Conversation

bradfitz (Contributor)

To show that RemoveByPeer is slow. Currently:

(pprof) top
Showing nodes accounting for 2.99s, 96.14% of 3.11s total
Dropped 35 nodes (cum <= 0.02s)
Showing top 10 nodes out of 36
      flat  flat%   sum%        cum   cum%
     2.72s 87.46% 87.46%      2.72s 87.46%  golang.zx2c4.com/wireguard/device.(*trieEntry).removeByPeer
     0.10s  3.22% 90.68%      0.10s  3.22%  runtime.memclrNoHeapPointers
     0.05s  1.61% 92.28%      0.06s  1.93%  runtime.scanobject
     0.03s  0.96% 93.25%      0.05s  1.61%  runtime.casgstatus
     0.02s  0.64% 93.89%      0.02s  0.64%  runtime.(*gcBitsArena).tryAlloc (inline)
     0.02s  0.64% 94.53%      0.02s  0.64%  runtime.heapBitsSetType
     0.02s  0.64% 95.18%      0.04s  1.29%  runtime.sweepone
     0.01s  0.32% 95.50%      0.02s  0.64%  golang.zx2c4.com/wireguard/device.commonBits
     0.01s  0.32% 95.82%      0.03s  0.96%  runtime.(*mheap).allocSpan
     0.01s  0.32% 96.14%      0.24s  7.72%  runtime.mallocgc

Signed-off-by: Brad Fitzpatrick [email protected]

/cc @zx2c4 @crawshaw @danderson

zx2c4 (Member) commented Jul 14, 2020

Same issue in the kernel code. That's a hard traversal to speed up without increasing the size of each node beyond a cacheline and therefore making lookups slow. Any suggestions?

bradfitz (Contributor, Author)

At least in our case (and perhaps with others?), the overwhelming majority of routes are complete IPv4 or IPv6 addresses (cidr /32 or /128). I was planning on adding a Go map alongside the trie and using both: map for complete addresses and trie for prefixes. That does mean some lookups (for non-complete addresses) need to consult both. I'm fine with that if it means reducing the removeByPeer cost, which is eating 40% of our CPU on our big shared test node accessible to all users.

zx2c4 (Member) commented Jul 14, 2020

Instead of trying to add special cases -- whose complexity I wouldn't be so happy about having here -- what about implementing better/faster algorithms for the general case? Specifically, check out https://github.com/openbsd/src/blob/master/sys/net/art.c https://github.com/openbsd/src/blob/master/sys/net/art.h I would very very gladly take an implementation of this directly into wireguard-go (and would prefer it there instead of in a separate repo).
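For a flavor of the scheme in the linked art.c, here is a minimal single-level sketch (4-bit stride, invented `route` type): each prefix gets a "base index" in a binary-heap-shaped array, `allot()` pushes a route down onto fringe entries that still carry the value being overwritten, and a lookup is a single array read. Real ART tables chain multiple such levels; this toy is only meant to show the core indexing trick.

```go
package main

import "fmt"

// Single-level ART sketch. Stride width w = 4 bits, so the heap-shaped
// array has 2^(w+1) entries; entry 1 is the /0 slot and entries
// 2^w..2^(w+1)-1 are the fringe (one per address).
const w = 4

type route struct{ name string }

var x [1 << (w + 1)]*route

// baseIndex maps (addr, prefixLen) to its slot in the heap array.
func baseIndex(addr, plen uint) uint {
	return (addr >> (w - plen)) | (1 << plen)
}

// allot overwrites old with new in the subtree rooted at b, stopping
// wherever a more specific route has already claimed an entry.
func allot(b uint, old, new *route) {
	if x[b] != old {
		return
	}
	x[b] = new
	if b < 1<<w { // internal node: recurse into both children
		allot(b<<1, old, new)
		allot(b<<1|1, old, new)
	}
}

func insert(addr, plen uint, r *route) {
	b := baseIndex(addr, plen)
	allot(b, x[b], r)
}

// lookup is one array read at the fringe entry for addr.
func lookup(addr uint) *route { return x[(1<<w)|addr] }

func main() {
	insert(0b0000, 0, &route{"default"})
	insert(0b1000, 1, &route{"8/1"}) // covers the upper half
	fmt.Println(lookup(0b1100).name) // 8/1
	fmt.Println(lookup(0b0100).name) // default
}
```

The appeal for RemoveByPeer-style operations is that replacing a route is the same `allot` walk as inserting one, bounded by the subtree of its base index rather than the whole structure.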

bradfitz (Contributor, Author)

Oh, nice, I hadn't seen that. PDF from the comments there: http://www.hariguchi.org/art/art.pdf

zx2c4 (Member) commented Jul 14, 2020

Right. Basically it sounds like what happened is that somebody submitted a paper for a new routing table data structure, Knuth reviewed it, and during the review he thought of something better. That's ART.

LC-tries are also pretty fast, but not very fun to implement, and ART may well outperform them.

Weidong Wu has a great book called "Packet Forwarding Technologies" that compares a lot of these different structures, but the latest edition I've found is from 2007, which unfortunately doesn't cover ART. However, the combination of versatility, code compactness, and simplicity makes me prefer ART over the other ones I've implemented in toys.

crawshaw (Collaborator) left a comment

Benchmark LGTM

(The ART data structure is nice.)

a.RemoveByPeer(peers[(i+num/2)%num])
}

// Finally, some stats & validity checks.
Collaborator review comment:

This work at the end is getting added to your total benchmark time and making your numbers fuzzier. Does calling b.StopTimer() just before this step work?
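The reviewer's suggestion looks roughly like this. The benchmark body below is a stand-in, not the PR's actual code; `testing.Benchmark` is only used here so the sketch runs standalone.

```go
package main

import (
	"fmt"
	"testing"
)

// Sketch: stop the benchmark timer before trailing stats/validity
// checks so they aren't counted in ns/op. The summation is a stand-in
// for the real measured work.
func main() {
	r := testing.Benchmark(func(b *testing.B) {
		total := 0
		for i := 0; i < b.N; i++ {
			total += 1 // the work being measured
		}
		b.StopTimer() // everything after this is excluded from timing
		// Finally, some stats & validity checks (not timed).
		if total != b.N {
			b.Fatal("unexpected total")
		}
	})
	fmt.Println(r.N > 0)
}
```

Prints `true`: the benchmark ran at least one iteration, and the post-loop validation contributed nothing to the reported time.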

rand.Seed(1)
rand.Shuffle(num, func(i, j int) { ips[i], ips[j] = ips[j], ips[i] })

// Then repeatedly add one and remove one that was insert 32k inserts back.
Collaborator review comment:

s/insert /inserted /
