
Liqo network blocked when liqo-controller-manager pod restarts and worker node reboots #3082

@huntkalio

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

1.0.1

What happened?

I have two Kubernetes clusters (cluster A and cluster B). Both use flannel as CNI (kernel: 6.1.0-32-amd64). I use Liqo to peer the two clusters, and cluster A pods can reach cluster B pods. Cluster A acts as the consumer cluster and cluster B as the remote provider cluster.
Then I rebooted some nodes in cluster A and waited for all pods to restart. After that, pods on the rebooted nodes can no longer reach cluster B pods, even though all the checks below look healthy:

 liqoctl info peer 
┌─ Peer cluster info ──────────────────────────────────────────────────────────────┐
|  Cluster ID: old-v5                                                              |
|  Role:       Provider                                                            |
└──────────────────────────────────────────────────────────────────────────────────┘
┌─ Network ────────────────────────────────────────────────────────────────────────┐
|  Status: Healthy                                                                 |
|  CIDR                                                                            |
|      Remote                                                                      |
|          Pod CIDR:      172.22.0.0/16 → Remapped to 172.22.0.0/16                |
|          External CIDR: 10.70.0.0/16 → Remapped to 10.71.0.0/16                  |
|  Gateway                                                                         |
|      Role:    Server                                                             |
|      Address: 172.30.97.106                                                      |
|      Port:    32359                                                              |
└──────────────────────────────────────────────────────────────────────────────────┘
┌─ Authentication ─────────────────────────────────────────────────────────────────┐
|  Status:     Healthy                                                             |
|  API server: https://172.30.97.105:6443                                          |
|  Resource slices                                                                 |
|      old-v5                                                                      |
|          Action: Consuming                                                       |
|          Resource slice accepted                                                 |
|          Resources                                                               |
|              pods:              110                                              |
|              cpu:               4                                                |
|              ephemeral-storage: 20Gi                                             |
|              memory:            8Gi                                              |
└──────────────────────────────────────────────────────────────────────────────────┘
┌─ Offloading ─────────────────────────────────────────────────────────────────────┐
|  Status: Healthy                                                                 |
|  Virtual nodes                                                                   |
|      old-v5                                                                      |
|          Status:         Healthy                                                 |
|          Secret:         kubeconfig-resourceslice-old-v5                         |
|          Resource slice: old-v5                                                  |
|          Resources                                                               |
|              pods:              110                                              |
|              cpu:               4                                                |
|              ephemeral-storage: 20Gi                                             |
|              memory:            8Gi                                              |
└──────────────────────────────────────────────────────────────────────────────────┘


 kubectl get connection -A
NAMESPACE            NAME        TYPE     STATUS      AGE
liqo-tenant-old-v5   gw-old-v5   Client   Connected   2d

Then I followed https://github.com/liqotech/liqo/blob/master/docs/faq/faq.md to sniff the traffic inside the gateway pods, and found that:
Cluster A (consumer cluster): the gateway pod does not receive the ping (when a cluster A pod pings a cluster B pod).
Cluster B (provider cluster): the gateway pod does receive the ping (when a cluster B pod pings a cluster A pod).
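
For reference, a minimal sketch of how the capture was done on cluster A. The tenant namespace comes from the outputs above; the gateway label selector, the tunnel interface name liqo-tunnel, and the availability of tcpdump in the gateway container are assumptions and may differ depending on the Liqo version:

# pick the gateway pod in the tenant namespace (label selector is an assumption)
GW_POD=$(kubectl get pods -n liqo-tenant-old-v5 \
  -l networking.liqo.io/component=gateway \
  -o jsonpath='{.items[0].metadata.name}')
# capture ICMP on the tunnel interface (interface name is an assumption;
# tcpdump may need to be installed or run from a debug container)
kubectl exec -n liqo-tenant-old-v5 -it "$GW_POD" -- tcpdump -tnn -i liqo-tunnel icmp

On cluster A this capture stays silent while a local pod pings a cluster B pod; the equivalent capture on cluster B's gateway does show the pings going the other way.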

If I kill the cluster A gateway pod and wait for a new gateway pod to start, the network recovers: pods on the rebooted cluster A nodes can reach cluster B pods again.
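
A rough sketch of this workaround (the tenant namespace comes from the outputs above; the gateway label selector is an assumption):

# delete the current gateway pod and let its owner recreate it
kubectl delete pod -n liqo-tenant-old-v5 -l networking.liqo.io/component=gateway
# watch for the replacement pod, then re-test pod-to-pod connectivity
kubectl get pods -n liqo-tenant-old-v5 -w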

Relevant log output

How can we reproduce the issue?

1. Create cluster A and cluster B.
2. Use Liqo to peer the two clusters (see the sketch after this list).
3. Reboot a worker node in cluster A.
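
For step 2, the peering was presumably established with liqoctl from the consumer side; a minimal sketch, assuming out-of-band peering and hypothetical kubeconfig paths kubeconfig-a / kubeconfig-b:

# run against the consumer cluster (A), pointing at the provider cluster (B)
liqoctl peer --kubeconfig kubeconfig-a --remote-kubeconfig kubeconfig-b

The liqoctl info peer output above shows the gateway server exposed on a NodePort (32359) on the provider side, so the gateway service type was presumably configured accordingly during peering.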

Provider or distribution

Rancher

CNI version

flannel v1.4.1-rancher1

Kernel Version

6.1.0-32

Kubernetes Version

1.30

Code of Conduct

  • I agree to follow this project's Code of Conduct

    Labels

bug: Report a bug encountered while operating Liqo
