[BUG] Extremely slow "add policy route" #4822

Closed
@zsxsoft

Kube-OVN Version

v1.12.28

Kubernetes Version

v1.31.2

Operation-system/Kernel Version

TencentOS Server 4.2
6.6.47-12.tl4.x86_64

Description

This issue contains 2 problems.

I have a cluster with 10 nodes, 260 subnets in 1 VPC, and ~5k ports. Today I discovered that ovs-ovn on some nodes had been OOM-killed, so I increased the memory limit and restarted the kube-ovn-controller.

Then I found that Work Queue Latency has remained at a very high level (>10 min).
(screenshot: Work Queue Latency graph)

I noticed in the logs that the controller was continuously performing "add policy route" operations at a VERY SLOW pace (approximately 1-3 seconds per entry). This is the first problem.

(screenshot: controller log with "add policy route" entries)
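
As a rough way to quantify that pace, the gap between consecutive "add policy route" lines can be averaged from the log timestamps. This is only a sketch: `avg_policy_route_gap` is a hypothetical helper name, and it assumes klog-style lines whose second field is `HH:MM:SS.micros` and a log that does not cross midnight.

```shell
# Hypothetical helper: average gap in seconds between consecutive
# "add policy route" lines read from stdin.
# Assumes the second whitespace-separated field is HH:MM:SS.micros
# and that the log does not cross midnight.
avg_policy_route_gap() {
  grep 'add policy route' \
    | awk '{
        split($2, t, ":")                        # t[1]=HH, t[2]=MM, t[3]=SS.micros
        now = t[1] * 3600 + t[2] * 60 + t[3]     # seconds since midnight
        if (NR > 1) { sum += now - prev; n++ }   # accumulate gap to previous line
        prev = now
      }
      END { if (n > 0) printf "%.3f\n", sum / n }'
}
```

It could be run as `avg_policy_route_gap < 2.log`, where `2.log` is the controller log file used in the commands in this issue.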

I understand that after a restart the kube-ovn-controller needs to traverse all 10 nodes and 260 subnets, so I expected about 2600 (10 × 260) "add policy route" operations.

[root@vm-master-1 a]# cat 2.log | grep 'add policy route' | wc -l
3558

However, after waiting for a long time, I found that the count had far exceeded that, and there appeared to be a large number of duplicate operations (same node, same subnet, but executed twice). This is the second problem.

[root@vm-master-1 a]# cat 2.log | grep 'add policy route' | grep 'net.a'
I1212 17:05:13.323328       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.1_ip4, action reroute, extrenalID map[node:node-1 subnet:net-a vendor:kube-ovn]
I1212 17:11:55.754750       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.2_ip4, action reroute, extrenalID map[node:node-2 subnet:net-a vendor:kube-ovn]
I1212 17:14:03.134696       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.4_ip4, action reroute, extrenalID map[node:node-4 subnet:net-a vendor:kube-ovn]
I1212 17:19:21.932002       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.3_ip4, action reroute, extrenalID map[node:node-3 subnet:net-a vendor:kube-ovn]
I1212 17:21:50.341122       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.vm.master.1_ip4, action reroute, extrenalID map[node:vm-master-1 subnet:net-a vendor:kube-ovn]
I1212 17:23:18.262599       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.9_ip4, action reroute, extrenalID map[node:node-9 subnet:net-a vendor:kube-ovn]
I1212 17:31:00.166875       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.5_ip4, action reroute, extrenalID map[node:node-5 subnet:net-a vendor:kube-ovn]
I1212 17:33:28.833554       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.8_ip4, action reroute, extrenalID map[node:node-8 subnet:net-a vendor:kube-ovn]
I1212 17:34:44.164926       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.1_ip4, action reroute, extrenalID map[node:node-1 subnet:net-a vendor:kube-ovn]
I1212 17:34:46.367902       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.2_ip4, action reroute, extrenalID map[node:node-2 subnet:net-a vendor:kube-ovn]
I1212 17:34:49.141808       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.3_ip4, action reroute, extrenalID map[node:node-3 subnet:net-a vendor:kube-ovn]
I1212 17:34:51.828600       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.vm.master.1_ip4, action reroute, extrenalID map[node:vm-master-1 subnet:net-a vendor:kube-ovn]
I1212 17:34:54.537169       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.4_ip4, action reroute, extrenalID map[node:node-4 subnet:net-a vendor:kube-ovn]
I1212 17:34:57.445022       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.8_ip4, action reroute, extrenalID map[node:node-8 subnet:net-a vendor:kube-ovn]
I1212 17:35:00.533108       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.5_ip4, action reroute, extrenalID map[node:node-5 subnet:net-a vendor:kube-ovn]
I1212 17:35:03.406251       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.6_ip4, action reroute, extrenalID map[node:node-6 subnet:net-a vendor:kube-ovn]
I1212 17:35:06.634137       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.7_ip4, action reroute, extrenalID map[node:node-7 subnet:net-a vendor:kube-ovn]
I1212 17:35:09.738121       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.9_ip4, action reroute, extrenalID map[node:node-9 subnet:net-a vendor:kube-ovn]
I1212 17:36:34.646594       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.6_ip4, action reroute, extrenalID map[node:node-6 subnet:net-a vendor:kube-ovn]
I1212 17:44:43.345692       7 subnet.go:2524] add policy route for router: ovn-cluster, match ip4.src == $net.a.node.7_ip4, action reroute, extrenalID map[node:node-7 subnet:net-a vendor:kube-ovn]
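
To list the duplicates across the whole log rather than for one subnet, the (node, subnet) pairs can be extracted from the `map[...]` part of each line and counted. This is a minimal sketch under the assumption that every line has the same `map[node:... subnet:... vendor:...]` layout as the lines above; `count_duplicate_policy_routes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads log text on stdin and prints
# "count node subnet" for every (node, subnet) pair that appears
# in more than one "add policy route" line.
count_duplicate_policy_routes() {
  grep 'add policy route' \
    | sed -n 's/.*map\[node:\([^ ]*\) subnet:\([^ ]*\) .*/\1 \2/p' \
    | sort \
    | uniq -c \
    | awk '$1 > 1 { print $1, $2, $3 }'   # keep only pairs seen more than once
}
```

It could be run as `count_duplicate_policy_routes < 2.log`; appending `| wc -l` would give the number of pairs that were processed more than once.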

Now I'm unable to create new subnets, so I plan to wait overnight and check again tomorrow to see whether the operations have completed. If more information is needed, please let me know.

Steps To Reproduce

/

Current Behavior

/

Expected Behavior

/

Labels: bug, performance, subnet, vpc
