@zjswhhh (Contributor) commented Dec 7, 2025

What I did
Support route updates triggered by VNET tunnel route config changes when custom_bfd monitoring is used. In the DPU re-pairing scenario (forming HA sets), VNetOrch needs to support live updates of endpoints, monitoring endpoints, and primary and secondary groups.

sign-off: Jing Zhang [email protected]

Why I did it
This change is required for DPU re-pairing in HA.

How I did it
When a route config changes (e.g. the primary endpoint changes), pass a BFD state update with the Init state to trigger monitoring session creation and deletion, and the route update. Note that the monitoring orch passes Down and Up states to vnet orch.
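As a rough illustration of the session bookkeeping described above, the sketch below compares the old and new `endpoint_monitor` values and reports which monitoring sessions would need to be created or removed. This is a simplified Python model and not the vnetorch implementation; the helper name `diff_monitor_sessions` is hypothetical.

```python
def diff_monitor_sessions(old_monitors, new_monitors):
    """Return (to_create, to_remove) for a change of the comma-separated
    endpoint_monitor field. Hypothetical helper, for illustration only."""
    old = [m for m in old_monitors.split(",") if m]
    new = [m for m in new_monitors.split(",") if m]
    # Sessions for monitors that appear only in the new config are created.
    to_create = [m for m in new if m not in old]
    # Sessions for monitors that disappeared from the config are removed.
    to_remove = [m for m in old if m not in new]
    return to_create, to_remove


if __name__ == "__main__":
    # Values taken from the verification steps in this PR.
    created, removed = diff_monitor_sessions("9.1.0.1,9.1.0.2", "9.1.0.1,9.1.0.3")
    print("create:", created)
    print("remove:", removed)
```

With the values used in the verification steps, the session for 9.1.0.3 is the one to create and the session for 9.1.0.2 is the one to remove, while the session for 9.1.0.1 is left untouched.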

How I verified it

Step 1:
  • Original vnet route settings:
root@7e9990ddcd2f:/# redis-cli -n 0 hgetall "VNET_ROUTE_TUNNEL_TABLE:vnet33:100.100.1.1/32"
 1) "endpoint"
 2) "9.1.0.1,9.1.0.2"
 3) "endpoint_monitor"
 4) "9.1.0.1,9.1.0.2"
 5) "primary"
 6) "9.1.0.1"
 7) "monitoring"
 8) "custom_bfd"
 9) "adv_prefix"
10) "100.100.1.1/32"
11) "check_directly_connected"
12) "true"
13) "rx_monitor_timer"
14) "100"
15) "tx_monitor_timer"
16) "100"
  • The route points to the primary because both BFD sessions are up:
root@7e9990ddcd2f:/# redis-cli -n 6 hgetall   "VNET_ROUTE_TUNNEL_TABLE|vnet33|100.100.1.1/32"
1) "active_endpoints"
2) "9.1.0.1"
3) "state"
4) "active"

root@7e9990ddcd2f:/# show bfd sum 
/bin/sh: 1: sudo: not found
Total number of BFD sessions: 2
Peer Addr    Interface    Vrf      State    Type          Local Addr      TX Interval    RX Interval    Multiplier  Multihop      Local Discriminator
-----------  -----------  -------  -------  ------------  ------------  -------------  -------------  ------------  ----------  ---------------------
9.1.0.2      default      default  Up       async_active  9.9.9.9                 100            100            10  true                            2
9.1.0.1      default      default  Up       async_active  9.9.9.9                 100            100            10  true                            1
Step 2:
  • Update the remote endpoint and change the primary to the remote endpoint:
root@7e9990ddcd2f:/# redis-cli -n 0 hgetall "VNET_ROUTE_TUNNEL_TABLE:vnet33:100.100.1.1/32"
 1) "endpoint"
 2) "9.1.0.1,9.1.0.3"
 3) "endpoint_monitor"
 4) "9.1.0.1,9.1.0.3"
 5) "primary"
 6) "9.1.0.3"
 7) "monitoring"
 8) "custom_bfd"
... ...
  • Before the new primary's monitoring session is up, the route still points to the local endpoint:
root@7e9990ddcd2f:/# redis-cli -n 6 hgetall   "VNET_ROUTE_TUNNEL_TABLE|vnet33|100.100.1.1/32"
1) "active_endpoints"
2) "9.1.0.1"
3) "state"
4) "active"
root@7e9990ddcd2f:/# show bfd sum 
/bin/sh: 1: sudo: not found
Total number of BFD sessions: 2
Peer Addr    Interface    Vrf      State    Type          Local Addr      TX Interval    RX Interval    Multiplier  Multihop      Local Discriminator
-----------  -----------  -------  -------  ------------  ------------  -------------  -------------  ------------  ----------  ---------------------
9.1.0.1      default      default  Up       async_active  9.9.9.9                 100            100            10  true                            1
9.1.0.3      default      default  Down     async_active  9.9.9.9                 100            100            10  true                            3
  • After the new primary's monitoring session is brought up, the route switches to the new primary:
root@7e9990ddcd2f:/# redis-cli -n 6 hgetall   "VNET_ROUTE_TUNNEL_TABLE|vnet33|100.100.1.1/32"
1) "active_endpoints"
2) "9.1.0.3"
3) "state"
4) "active"
root@7e9990ddcd2f:/# show bfd sum 
/bin/sh: 1: sudo: not found
Total number of BFD sessions: 2
Peer Addr    Interface    Vrf      State    Type          Local Addr      TX Interval    RX Interval    Multiplier  Multihop      Local Discriminator
-----------  -----------  -------  -------  ------------  ------------  -------------  -------------  ------------  ----------  ---------------------
9.1.0.1      default      default  Up       async_active  9.9.9.9                 100            100            10  true                            1
9.1.0.3      default      default  Up       async_active  9.9.9.9                 100            100            10  true                            3
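
The behavior observed in the two steps above can be modeled with a short Python sketch: the route uses the primary endpoints whose BFD session is Up, and otherwise falls back to the remaining endpoints that are Up. This is an assumption-based model of the observed output, not the actual vnetorch selection code, and `select_active_endpoints` is a hypothetical name.

```python
def select_active_endpoints(endpoints, primary, bfd_state):
    """Pick active endpoints from primary/secondary groups.

    endpoints: list of endpoint IPs from the route config.
    primary:   list of IPs in the primary group; the rest are secondary.
    bfd_state: dict mapping endpoint IP to "Up" or "Down".
    Hypothetical helper that only models the behavior seen in this PR.
    """
    up_primary = [e for e in endpoints
                  if e in primary and bfd_state.get(e) == "Up"]
    if up_primary:
        return up_primary
    # No primary endpoint is Up: fall back to secondary endpoints that are Up.
    return [e for e in endpoints
            if e not in primary and bfd_state.get(e) == "Up"]


if __name__ == "__main__":
    eps = ["9.1.0.1", "9.1.0.3"]
    # New primary's session is still Down: traffic stays on the local endpoint.
    print(select_active_endpoints(eps, ["9.1.0.3"],
                                  {"9.1.0.1": "Up", "9.1.0.3": "Down"}))
    # New primary's session is Up: the route switches to it.
    print(select_active_endpoints(eps, ["9.1.0.3"],
                                  {"9.1.0.1": "Up", "9.1.0.3": "Up"}))
```

The two calls mirror the `active_endpoints` values shown in Step 2 before and after the 9.1.0.3 session comes up.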

Details if related

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld (Collaborator)

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@zjswhhh changed the title from "vnetorch supporting ha live repairing" to "[ssw][ha] vnetorch supporting DPU live repairing" on Dec 8, 2025
# Re-pairing and change primary to remote
create_vnet_routes(dvs, "100.100.1.1/32", vnet_name, '9.1.0.1,9.1.0.3', ep_monitor='9.1.0.1,9.1.0.3', primary='9.1.0.3', monitoring='custom_bfd', adv_prefix='100.100.1.1/32', check_directly_connected=True, rx_monitor_timer=100, tx_monitor_timer=100)

# BFD session should have been removed for the old remote endpoint

Check warning

Code scanning / CodeQL

Variable defined multiple times Warning test

This assignment to 'route1' is unnecessary as it is redefined before this value is used.
create_vnet_entry(dvs, vnet_name, tunnel_name, '10029', "", advertise_prefix=True, overlay_dmac="22:33:33:44:44:66")

vnet_obj.check_vnet_entry(dvs, vnet_name)
vnet_obj.check_vxlan_tunnel_entry(dvs, tunnel_name, vnet_name, '10029')

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable asic_db is not used.
# Remove tunnel route 1
delete_vnet_routes(dvs, "100.100.1.1/32", vnet_name)
time.sleep(2)
vnet_obj.check_del_vnet_routes(dvs, vnet_name, ["100.100.1.1/32"])

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable route1 is not used.
@zjswhhh changed the title from "[ssw][ha] vnetorch supporting DPU live repairing" to "[ssw][ha] vnetorch supporting DPU live re-pairing" on Dec 8, 2025