-
Notifications
You must be signed in to change notification settings - Fork 649
[dualtor] Remove checks for subnet IP from muxorch logic #4011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Today we allow neighbor IPs which are outside a mux cable's subnet to be learned on any mux port. However, internal logic prevents successful processing of these IPs in certain scenarios. This removes the check that compares the ip to the Vlan subnet so that it will be processed as a regular mux neighbor. Signed-off-by: Nikola Dancejic <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request aims to remove subnet IP checks from muxorch logic to allow neighbors outside a mux cable's subnet to be processed correctly, fixing MAC move issues between VLAN subnets. However, the PR contains several critical bugs that prevent it from functioning correctly.
Key intended changes:
- Remove
isIpInSubnet()method and related subnet validation logic from C++ muxorch - Add comprehensive test coverage for MAC moves between different VLAN subnets
- Update helper methods to support operations on multiple VLANs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| orchagent/muxorch.cpp | Removes subnet IP validation logic, but introduces critical bugs in findMuxCableInSubnet() and updateNeighbor() that break multi-cable scenarios |
| tests/test_mux.py | Adds new test for cross-VLAN MAC moves and updates helper methods, but contains structural issues with duplicate code, missing constants, and missing helper methods |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (auto it = mux_cable_tb_.begin(); it != mux_cable_tb_.end(); it++) | ||
| { | ||
| MuxCable* ptr = it->second.get(); | ||
| if (ptr->isIpInSubnet(ip)) | ||
| { | ||
| return ptr; | ||
| } | ||
| MuxCable* ptr = it->second.get(); | ||
| return ptr; | ||
| } |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The findMuxCableInSubnet function now unconditionally returns the first mux cable in the table, regardless of which IP is being queried. This will cause incorrect behavior when multiple mux cables exist. The function should either:
- Return nullptr if the subnet check logic is truly being removed, or
- Implement an alternative method to find the correct mux cable (e.g., by port lookup from FDB).
Currently, this will match any IP to the first available mux cable, which is incorrect logic.
| else: | ||
| self.check_tunnel_route_in_app_db(dvs, [prefix], expected=True) | ||
| for _ in range(10): | ||
| if not self.neigh_exists_in_asic(asicdb, ip_addr, vlan_id): |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method self.neigh_exists_in_asic() is called but is not defined in the TestMuxTunnelBase class. This method needs to be implemented to check if a neighbor exists in the ASIC database for the given VLAN.
| def test_mac_move_vlan( | ||
| self, dvs, setup, setup_vlan, setup_mux_cable, setup_peer_switch, | ||
| setup_tunnel, setup_mac_move_vlan_test, neighbor_cleanup, testlog | ||
| ): | ||
| """Validate neighbor move behavior between Vlan1000 and Vlan2000 mux ports""" | ||
|
|
||
| if expected: | ||
| appdb.wait_for_matching_keys(self.APP_TUNNEL_ROUTE_TABLE_NAME, destinations) | ||
| else: | ||
| appdb.wait_for_deleted_keys(self.APP_TUNNEL_ROUTE_TABLE_NAME, destinations) | ||
|
|
||
| def check_neigh_in_asic_db(self, asicdb, ip, expected=True): | ||
| rif_oid = self.get_vlan_rif_oid(asicdb) | ||
| switch_oid = self.get_switch_oid(asicdb) | ||
| neigh_key_map = { | ||
| "ip": ip, | ||
| "rif": rif_oid, | ||
| "switch_id": switch_oid | ||
| } | ||
| expected_key = json.dumps(neigh_key_map, sort_keys=True, separators=(',', ':')) | ||
| appdb_conn = swsscommon.DBConnector(swsscommon.APPL_DB, dvs.redis_sock, 0) | ||
| asicdb = dvs.get_asic_db() | ||
|
|
||
| if expected: | ||
| nbr_keys = asicdb.wait_for_matching_keys(self.ASIC_NEIGH_TABLE, [expected_key]) | ||
| mac_hyphen = "00-00-00-00-aa-aa" | ||
| mac_colon = mac_hyphen.replace('-', ':') | ||
| vlan1000_id = int(self.VLAN_1000_ID) | ||
| vlan2000_id = int(self.VLAN_2000_ID) | ||
|
|
||
| for key in nbr_keys: | ||
| if ip in key: | ||
| return key | ||
| ip_test_data = [ | ||
| ("192.168.0.250", self.IPV4_MASK), | ||
| ("fc02:1000::250", self.IPV6_MASK) | ||
| ] | ||
|
|
||
| else: | ||
| def verify_neighbor(ip_addr, mask, vlan_id, mux_state): | ||
| prefix = ip_addr + mask | ||
| if mux_state == ACTIVE: | ||
| self.check_neigh_in_asic_db(asicdb, ip_addr, vlan_id=vlan_id) | ||
| self.check_tunnel_route_in_app_db(dvs, [prefix], expected=False) | ||
| else: | ||
| self.check_tunnel_route_in_app_db(dvs, [prefix], expected=True) | ||
| for _ in range(10): | ||
| if not self.neigh_exists_in_asic(asicdb, ip_addr, vlan_id): | ||
| break | ||
| time.sleep(0.2) | ||
| else: | ||
| pytest.fail(f"Neighbor {ip_addr} unexpectedly present in ASIC for VLAN {vlan_id}") | ||
|
|
||
| def cleanup_entries(ip_addr, mask): | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_2000) | ||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_fdb(dvs, mac_hyphen, vlan=self.VLAN_2000) | ||
| self.check_tunnel_route_in_app_db(dvs, [ip_addr + mask], expected=False) | ||
|
|
||
| self.set_mux_state(appdb_conn, "Ethernet0", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet0", ACTIVE) | ||
| self.set_mux_state(appdb_conn, "Ethernet8", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet8", ACTIVE) | ||
|
|
||
| for ip_addr, mask in ip_test_data: | ||
| cleanup_entries(ip_addr, mask) | ||
|
|
||
| for state_eth4, state_eth12 in itertools.product([ACTIVE, STANDBY], repeat=2): | ||
| self.set_mux_state(appdb_conn, "Ethernet4", state_eth4) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", state_eth4) | ||
| self.set_mux_state(appdb_conn, "Ethernet12", state_eth12) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", state_eth12) | ||
|
|
||
| for ip_addr, mask in ip_test_data: | ||
| cleanup_entries(ip_addr, mask) | ||
|
|
||
| self.add_fdb(dvs, "Ethernet4", mac_hyphen) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, state_eth4) | ||
|
|
||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.add_fdb(dvs, "Ethernet12", mac_hyphen, vlan=self.VLAN_2000) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon, vlan=self.VLAN_2000) | ||
| verify_neighbor(ip_addr, mask, vlan2000_id, state_eth12) | ||
|
|
||
| self.del_fdb(dvs, mac_hyphen, vlan=self.VLAN_2000) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_2000) | ||
| self.add_fdb(dvs, "Ethernet4", mac_hyphen) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, state_eth4) | ||
|
|
||
| toggle_eth4 = [STANDBY, ACTIVE] if state_eth4 == ACTIVE else [ACTIVE, STANDBY] | ||
| for new_state in toggle_eth4: | ||
| self.set_mux_state(appdb_conn, "Ethernet4", new_state) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", new_state) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, new_state) | ||
|
|
||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.add_fdb(dvs, "Ethernet12", mac_hyphen, vlan=self.VLAN_2000) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon, vlan=self.VLAN_2000) | ||
|
|
||
| toggle_eth12 = [STANDBY, ACTIVE] if state_eth12 == ACTIVE else [ACTIVE, STANDBY] | ||
| for new_state in toggle_eth12: | ||
| self.set_mux_state(appdb_conn, "Ethernet12", new_state) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", new_state) | ||
| verify_neighbor(ip_addr, mask, vlan2000_id, new_state) | ||
|
|
||
| cleanup_entries(ip_addr, mask) | ||
|
|
||
| self.set_mux_state(appdb_conn, "Ethernet4", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", ACTIVE) | ||
| self.set_mux_state(appdb_conn, "Ethernet12", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", ACTIVE) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another duplicate definition of test_mac_move_vlan appears at lines 166-261. This is the third occurrence of this test method in the file (lines 59-156, 166-261, and 1813-1916). The test should only be defined once in the appropriate test class (TestMuxTunnel). These duplicate definitions at lines 166-261 should be removed.
| MuxCable* ptr = it->second.get(); | ||
| if (ptr->isIpInSubnet(update.entry.ip_address)) | ||
| { | ||
| ptr->updateNeighbor(update.entry, update.add); | ||
| return; | ||
| } | ||
| ptr->updateNeighbor(update.entry, update.add); | ||
| return; | ||
| } |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop through all mux cables now updates only the first mux cable and immediately returns, processing only one mux cable regardless of whether it's the correct one for the given neighbor. This breaks the neighbor update logic when multiple mux cables exist.
Either the loop should be removed entirely (if only one cable needs to be updated), or the correct mux cable should be identified before calling updateNeighbor.
| vlan1000_id = int(self.VLAN_1000_ID) | ||
| vlan2000_id = int(self.VLAN_2000_ID) | ||
|
|
||
| ip_test_data = [ | ||
| ("192.168.0.250", self.IPV4_MASK), | ||
| ("fc02:1000::250", self.IPV6_MASK) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constants VLAN_1000_ID, VLAN_2000_ID, IPV4_MASK, IPV6_MASK, VLAN_1000, and VLAN_2000 are referenced but not defined in the TestMuxTunnelBase class. These constants need to be added as class attributes for the test to function correctly.
| self.del_fdb(dvs, mac_hyphen, vlan=self.VLAN_2000) | ||
| self.check_tunnel_route_in_app_db(dvs, [ip_addr + mask], expected=False) | ||
|
|
||
| self.set_mux_state(appdb_conn, "Ethernet0", ACTIVE) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method self.wait_for_mux_state() is called but doesn't appear to be defined. This method needs to be implemented to wait for the mux state to reach the expected value before continuing with the test.
| def test_mac_move_vlan( | ||
| self, dvs, setup, setup_vlan, setup_mux_cable, setup_peer_switch, | ||
| setup_tunnel, setup_mac_move_vlan_test, neighbor_cleanup, testlog | ||
| ): | ||
| """Validate neighbor move behavior between Vlan1000 and Vlan2000 mux ports""" | ||
|
|
||
| def set_mux_state(self, appdb, ifname, state_change): | ||
| appdb_conn = swsscommon.DBConnector(swsscommon.APPL_DB, dvs.redis_sock, 0) | ||
| asicdb = dvs.get_asic_db() | ||
|
|
||
| mac_hyphen = "00-00-00-00-aa-aa" | ||
| mac_colon = mac_hyphen.replace('-', ':') | ||
| vlan1000_id = int(self.VLAN_1000_ID) | ||
| vlan2000_id = int(self.VLAN_2000_ID) | ||
|
|
||
| ip_test_data = [ | ||
| ("192.168.0.250", self.IPV4_MASK), | ||
| ("fc02:1000::250", self.IPV6_MASK) | ||
| ] | ||
|
|
||
| def verify_neighbor(ip_addr, mask, vlan_id, mux_state): | ||
| prefix = ip_addr + mask | ||
| if mux_state == ACTIVE: | ||
| self.check_neigh_in_asic_db(asicdb, ip_addr, vlan_id=vlan_id) | ||
| self.check_tunnel_route_in_app_db(dvs, [prefix], expected=False) | ||
| else: | ||
| self.check_tunnel_route_in_app_db(dvs, [prefix], expected=True) | ||
| for _ in range(10): | ||
| if not self.neigh_exists_in_asic(asicdb, ip_addr, vlan_id): | ||
| break | ||
| time.sleep(0.2) | ||
| else: | ||
| pytest.fail(f"Neighbor {ip_addr} unexpectedly present in ASIC for VLAN {vlan_id}") | ||
|
|
||
| def cleanup_entries(ip_addr, mask): | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_2000) | ||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_fdb(dvs, mac_hyphen, vlan=self.VLAN_2000) | ||
| self.check_tunnel_route_in_app_db(dvs, [ip_addr + mask], expected=False) | ||
|
|
||
| self.set_mux_state(appdb_conn, "Ethernet0", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet0", ACTIVE) | ||
| self.set_mux_state(appdb_conn, "Ethernet8", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet8", ACTIVE) | ||
|
|
||
| for ip_addr, mask in ip_test_data: | ||
| cleanup_entries(ip_addr, mask) | ||
|
|
||
| for state_eth4, state_eth12 in itertools.product([ACTIVE, STANDBY], repeat=2): | ||
| self.set_mux_state(appdb_conn, "Ethernet4", state_eth4) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", state_eth4) | ||
| self.set_mux_state(appdb_conn, "Ethernet12", state_eth12) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", state_eth12) | ||
|
|
||
| for ip_addr, mask in ip_test_data: | ||
| # Step 2: learn on Ethernet4 / Vlan1000 | ||
| self.add_fdb(dvs, "Ethernet4", mac_hyphen) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, state_eth4) | ||
|
|
||
| # Step 3: move to Ethernet12 / Vlan2000 | ||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.add_fdb(dvs, "Ethernet12", mac_hyphen, vlan=self.VLAN_2000) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon, vlan=self.VLAN_2000) | ||
| verify_neighbor(ip_addr, mask, vlan2000_id, state_eth12) | ||
|
|
||
| # Step 4: move back to Ethernet4 / Vlan1000 | ||
| self.del_fdb(dvs, mac_hyphen, vlan=self.VLAN_2000) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_2000) | ||
| self.add_fdb(dvs, "Ethernet4", mac_hyphen) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, state_eth4) | ||
|
|
||
| toggle_eth4 = [STANDBY, ACTIVE] if state_eth4 == ACTIVE else [ACTIVE, STANDBY] | ||
| for new_state in toggle_eth4: | ||
| self.set_mux_state(appdb_conn, "Ethernet4", new_state) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", new_state) | ||
| verify_neighbor(ip_addr, mask, vlan1000_id, new_state) | ||
|
|
||
| # Re-learn on Vlan2000 for Ethernet12 toggling | ||
| self.del_fdb(dvs, mac_hyphen) | ||
| self.del_neighbor(dvs, ip_addr, vlan=self.VLAN_1000) | ||
| self.add_fdb(dvs, "Ethernet12", mac_hyphen, vlan=self.VLAN_2000) | ||
| self.add_neighbor(dvs, ip_addr, mac_colon, vlan=self.VLAN_2000) | ||
|
|
||
| toggle_eth12 = [STANDBY, ACTIVE] if state_eth12 == ACTIVE else [ACTIVE, STANDBY] | ||
| for new_state in toggle_eth12: | ||
| self.set_mux_state(appdb_conn, "Ethernet12", new_state) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", new_state) | ||
| verify_neighbor(ip_addr, mask, vlan2000_id, new_state) | ||
|
|
||
| cleanup_entries(ip_addr, mask) | ||
|
|
||
| self.set_mux_state(appdb_conn, "Ethernet4", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet4", ACTIVE) | ||
| self.set_mux_state(appdb_conn, "Ethernet12", ACTIVE) | ||
| self.wait_for_mux_state(dvs, "Ethernet12", ACTIVE) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test method test_mac_move_vlan appears at lines 59-156 within the TestMuxTunnelBase class definition. This appears to be incorrectly placed, as looking at the diff structure, lines 59-156 seem to replace what should be class constant definitions. Additionally, the same test is properly defined later at lines 1813-1916 within the TestMuxTunnel class. This duplicate definition at lines 59-156 should be removed, and the proper class constants should be restored here instead.
| vlan_name = self.VLAN_2000 | ||
| port = self.VLAN_2000_MEMBER | ||
|
|
||
| try: | ||
| config_db.create_entry("VLAN", vlan_name, {"vlanid": self.VLAN_2000_ID}) | ||
| config_db.create_entry("VLAN_MEMBER", f"{vlan_name}|{port}", {"tagging_mode": "untagged"}) | ||
| config_db.create_entry("VLAN_INTERFACE", vlan_name, {"NULL": "NULL"}) | ||
| config_db.create_entry("VLAN_INTERFACE", f"{vlan_name}|{self.VLAN_2000_IPV4}", {"NULL": "NULL"}) | ||
| config_db.create_entry("VLAN_INTERFACE", f"{vlan_name}|{self.VLAN_2000_IPV6}", {"NULL": "NULL"}) |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fixture references self.VLAN_2000, self.VLAN_2000_MEMBER, self.VLAN_2000_ID, self.VLAN_2000_IPV4, and self.VLAN_2000_IPV6 but these constants are not defined in the TestMuxTunnelBase class. These need to be added as class attributes for the fixture to work correctly.
|
|
||
| def check_tunnel_route_in_app_db(self, dvs, destinations, expected=True): | ||
| appdb = dvs.get_app_db() | ||
| def test_mac_move_vlan( |
Copilot
AI
Nov 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable test_mac_move_vlan is not used.
Today we allow neighbor IPs which are outside a mux cable's subnet to be learned on any mux port. However, internal logic prevents successful processing of these IPs in certain scenarios. This removes the check that compares the ip to the Vlan subnet so that it will be processed as a regular mux neighbor.
What I did
Removed subnet checks from muxorch logic and created unit tests to validate the functionality when a neighbor is moving between 2 vlan subnets
Why I did it
mac move between vlan subnets causes dualtor_neighbor_check inconsistency
How I verified it
created a unit test to check functionality
Details if related
ADO: #35945298