Liqo Network bottleneck #3079

@huntkalio

I have two Kubernetes clusters (cluster A and cluster B). Both use flannel as the CNI (kernel: 6.1.0-32-amd64). I used Liqo to peer the two clusters, and pods in cluster A can now connect to pods in cluster B.

I then started an nginx pod in cluster B and ran wrk from cluster A to test the network between the two clusters.

./wrk -t 2 -c 100 -d 60 http://172.22.2.77:80
Running 1m test @ http://172.22.2.77:80
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.44ms    1.05ms  18.55ms   77.95%
    Req/Sec    14.58k   709.75    16.58k    70.08%
  1740537 requests in 1.00m, 1.38GB read
Requests/sec:  29003.92
Transfer/sec:     23.59MB

I found that the gateway client pod in cluster A and the gateway server pod in cluster B are a network bottleneck because of softirq load.
The machine hosting the gateway pod has 4 CPU cores, but the softirq load is uneven: in the output below, ksoftirqd/1 uses far more CPU than the ksoftirqd threads on the other cores.

%Cpu0  :  1.8 us, 10.4 sy,  0.0 ni, 45.4 id,  0.0 wa,  0.0 hi, 42.5 si,  0.0 st 
%Cpu1  :  1.3 us,  6.4 sy,  0.0 ni, 30.9 id,  0.0 wa,  0.0 hi, 61.4 si,  0.0 st 
%Cpu2  :  1.0 us, 11.8 sy,  0.0 ni, 27.7 id,  0.3 wa,  0.0 hi, 59.2 si,  0.0 st 
%Cpu3  :  1.8 us,  8.6 sy,  0.0 ni, 46.8 id,  0.0 wa,  0.0 hi, 42.9 si,  0.0 st 
MiB Mem :   7479.5 total,   4282.7 free,   1114.1 used,   2319.8 buff/cache     
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   6365.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                                              
     21 root      20   0       0      0      0 S  33.9   0.0   3:15.90 ksoftirqd/1                                                                                                                                                                                          
 221930 root      20   0       0      0      0 I  30.2   0.0   0:20.16 kworker/2:1-wg-crypt-liqo-tunnel                                                                                                                                                                     
 271980 root      20   0       0      0      0 I  19.6   0.0   0:16.11 kworker/2:3-wg-crypt-liqo-tunnel                                                                                                                                                                     
 185337 root      20   0       0      0      0 I  17.6   0.0   0:21.64 kworker/3:0-wg-crypt-liqo-tunnel                                                                                                                                                                     
 271993 root      20   0       0      0      0 I  15.3   0.0   0:08.66 kworker/0:3-wg-crypt-liqo-tunnel                                                                                                                                                                     
 271992 root      20   0       0      0      0 I  13.6   0.0   0:09.55 kworker/3:3-wg-crypt-liqo-tunnel                                                                                                                                                                     
 271877 root      20   0       0      0      0 I  11.3   0.0   0:16.66 kworker/2:0-wg-crypt-liqo-tunnel                                                                                                                                                                     
 271876 root      20   0       0      0      0 I   9.6   0.0   0:11.01 kworker/0:0-wg-crypt-liqo-tunnel                                                                                                                                                                     
     14 root      20   0       0      0      0 S   8.0   0.0   1:30.11 ksoftirqd/0                                                                                                                                                                                          
     31 root      20   0       0      0      0 S   7.6   0.0   1:57.10 ksoftirqd/3                                                                                                                                                                                          
 263141 root      20   0       0      0      0 R   7.0   0.0   0:06.62 kworker/1:2-wg-crypt-liqo-tunnel                                                                                                                                                                     
 214737 root      20   0       0      0      0 I   6.3   0.0   0:15.86 kworker/0:1-wg-crypt-liqo-tunnel                                                                                                                                                                     
 272576 root      20   0       0      0      0 R   6.3   0.0   0:02.64 kworker/1:1-wg-crypt-liqo-tunnel                                                                                                                                                                     
     26 root      20   0       0      0      0 S   4.7   0.0   1:33.75 ksoftirqd/2                                                                                                                                                                                          
   1388 root      20   0 1998456 141988  61512 S   3.3   1.9  13:40.57 kubelet                                                                                                                                                                                              
 201094 root      20   0       0      0      0 I   2.0   0.0   0:15.72 kworker/3:2-wg-crypt-liqo-tunnel  
     21 root      20   0       0      0      0 S  33.9   0.0   3:15.90 ksoftirqd/1   
     14 root      20   0       0      0      0 S   8.0   0.0   1:30.11 ksoftirqd/0                                                                                                                                                                                          
     31 root      20   0       0      0      0 S   7.6   0.0   1:57.10 ksoftirqd/3 
     26 root      20   0       0      0      0 S   4.7   0.0   1:33.75 ksoftirqd/2 
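
To check whether receive softirqs really pile up on one core, the per-CPU counters in /proc/softirqs can be compared directly. The following is a minimal sketch, not part of Liqo; the helper names `parse_softirqs` and `shares` are made up for illustration. The counters are cumulative since boot, so sampling twice during a wrk run and comparing the difference gives a more accurate picture.

```python
from pathlib import Path


def parse_softirqs(text: str, kind: str = "NET_RX") -> dict[str, int]:
    """Return {cpu_name: count} for one softirq type from /proc/softirqs text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        if name.strip() == kind:
            counts = [int(value) for value in rest.split()]
            return dict(zip(cpus, counts))
    raise KeyError(kind)


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each CPU's fraction of the total count."""
    total = sum(counts.values()) or 1  # avoid dividing by zero on an idle box
    return {cpu: n / total for cpu, n in counts.items()}


if __name__ == "__main__":
    counts = parse_softirqs(Path("/proc/softirqs").read_text())
    for cpu, share in shares(counts).items():
        print(f"{cpu}: {counts[cpu]} NET_RX softirqs ({share:.1%})")
```

If one CPU holds most of the NET_RX share on the gateway node while the tunnel is under load, that is consistent with all tunnel packets arriving on a single NIC queue.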

This may be because the WireGuard client (IP, port) and the WireGuard server (IP, port) never change, so the WireGuard UDP tunnel traffic has a fixed five-tuple (same source/destination IP and port), and the RSS Toeplitz hash maps all tunnel traffic to the same NIC queue. Because each queue's interrupts are assigned to a specific CPU for processing, even if I upgrade the machine to 8 cores, the network improvement will not be very noticeable.
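
To illustrate the suspicion above, here is a rough sketch of a Toeplitz hash over an IPv4 address/port tuple. Several things are assumptions: the key is the sample key published in Microsoft's RSS documentation (my NIC may use a different one), `hash % num_queues` stands in for the NIC's real indirection table, the addresses and ports are placeholders, and whether a NIC includes UDP ports in the hash at all depends on its configuration. It only shows that a fixed tuple always lands on one queue, while varying the source port spreads flows across queues.

```python
import ipaddress
import struct

# Sample 40-byte RSS key from Microsoft's RSS documentation (assumption:
# the NIC on the gateway node may be programmed with a different key).
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(data: bytes, key: bytes = RSS_KEY) -> int:
    """XOR together the 32-bit key window at every set bit of the input."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # Window starts at bit position (i * 8 + b) of the key.
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
              num_queues: int = 4) -> int:
    """Map a flow to a queue index; the modulo is a simplification."""
    data = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return toeplitz_hash(data) % num_queues


if __name__ == "__main__":
    # Placeholder tunnel endpoints: every packet carries the same tuple.
    fixed = {rss_queue("10.0.0.1", 51840, "10.0.0.2", 51840) for _ in range(1000)}
    print("queues used by the fixed tuple:", sorted(fixed))

    # Same endpoints, but with varying source ports (many distinct flows).
    varied = {rss_queue("10.0.0.1", port, "10.0.0.2", 51840)
              for port in range(40000, 40064)}
    print("queues used with varying source ports:", sorted(varied))
```

The hash is a pure function of the tuple, so the single tunnel flow can never use more than one queue no matter how many cores the machine has.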

How can I solve this problem and scale the gateway's network performance vertically (between the two clusters)? I want the WireGuard gateway to handle hundreds of thousands of requests per second. Alternatively, does WireGuard support something like UDP connection pooling or multiple endpoints?
