Description
Problem
With optical transceivers from different vendors, on multiple devices, some interfaces do not link up in breakout mode.
Symptoms
The symptoms below use an 800GBASE-FR4 in 8x100G mode.
- Some interfaces stay link down (Ethernet75 and Ethernet78 in the output below).
Ethernet72 73 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
Ethernet73 74 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
Ethernet74 75 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
Ethernet75 76 100G 9100 N/A Ethernet10/1 routed down up N/A QSFP-DD QDD-800G-2FR4 off 4
Ethernet76 77 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
Ethernet77 78 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
Ethernet78 79 100G 9100 N/A Ethernet10/1 routed down up N/A QSFP-DD QDD-800G-2FR4 off 4
Ethernet79 80 100G 9100 N/A Ethernet10/1 routed up up N/A QSFP-DD QDD-800G-2FR4 off 5
- The datapath on the corresponding host lanes is deactivated.
- The xcvr is misprogrammed and the CMIS state is FAILED.
Ethernet72:
CMIS State (SW): FAILED
Data path state indicator on host lane 1: DataPathActivated
Data path state indicator on host lane 2: DataPathActivated
Data path state indicator on host lane 3: DataPathActivated
Data path state indicator on host lane 4: DataPathDeactivated -> Ethernet75 down
Data path state indicator on host lane 5: DataPathActivated
Data path state indicator on host lane 6: DataPathActivated
Data path state indicator on host lane 7: DataPathDeactivated -> Ethernet78 down
Data path state indicator on host lane 8: DataPathActivated
Tx output status on media lane 1: False
Tx output status on media lane 2: True
Tx output status on media lane 3: True
Tx output status on media lane 4: True
Tx output status on media lane 5: True
Tx output status on media lane 6: True
Tx output status on media lane 7: False
Tx output status on media lane 8: True
Root Cause
if hasattr(api, 'get_cmis_rev'):
    # Check datapath init pending on module that supports CMIS 5.x
    majorRev = int(api.get_cmis_rev().split('.')[0])
    if majorRev >= 5 and not self.check_datapath_init_pending(api, host_lanes_mask):
        self.log_notice("{}: datapath init not pending".format(lport))
        self.force_cmis_reinit(lport, retries + 1)
        continue
In theory this is a good check. The problem is that DPInitPendingLane(i) on page 11h is a transitional state, so software that polls for it can miss it. A miss becomes more likely as xcvr processing grows and as hwskus carry more interfaces, both of which are increasingly common.
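To illustrate why a polled transitional state can be missed, here is a minimal sketch. It is not xcvrd code: the function names and all timings are hypothetical, and the only assumption is that the bit is visible for a bounded window while software reads it at fixed intervals.

```python
def poll_schedule(first_poll, interval, count):
    """Evenly spaced read times in ms, e.g. one read per pass over all ports."""
    return [first_poll + i * interval for i in range(count)]


def poll_observes_pending(pending_start, pending_end, poll_times):
    """True if any read lands inside the half-open window [pending_start, pending_end)."""
    return any(pending_start <= t < pending_end for t in poll_times)


# Hypothetical numbers: the pending bit is visible for 50 ms starting at t = 100 ms.
window = (100, 150)

# Short loop: the port is revisited every 20 ms, so a read lands in the window.
fast = poll_schedule(first_poll=0, interval=20, count=20)
# Long loop: the port is revisited every 200 ms, so every read misses the window.
slow = poll_schedule(first_poll=0, interval=200, count=20)

print(poll_observes_pending(*window, fast))  # True
print(poll_observes_pending(*window, slow))  # False
```

The longer each pass over the ports takes, the wider the gap between reads of the same port, which is the effect described above for large hwskus.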
To show that this state is being missed, I added logging to every call in check_datapath_init_pending:
Ethernet78: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet73: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet76: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet79: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': True} -> Expected
Ethernet78: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet73: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet78: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet78: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet74: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': True, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
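In the readings marked "Expected", the bit for the port's own host lane is set. In the readings marked "Not expected", only another port's lane is pending, so SW has missed the transitional state for that port. Ethernet75 and Ethernet78 never get an "Expected" reading, which matches the two interfaces that stay down.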
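The classification of the readings above can be reproduced with a small sketch. This is an approximation of what a per-mask pending check does, not the actual body of check_datapath_init_pending; the helper names are hypothetical, and it assumes the 8x100G mapping where Ethernet72 uses host lane 1 through Ethernet79 on host lane 8.

```python
CMIS_MAX_HOST_LANES = 8  # assumption: 8 host lanes, as in the logs above


def all_masked_lanes_pending(dpinit_pending, host_lanes_mask):
    """True only if every lane selected by host_lanes_mask reports DPInitPending."""
    selected = [lane for lane in range(CMIS_MAX_HOST_LANES)
                if host_lanes_mask & (1 << lane)]
    if not selected:
        return False
    return all(dpinit_pending["DPInitPending{}".format(lane + 1)]
               for lane in selected)


def sample(pending_lane):
    """Build a reading like the logged ones, with a single 1-based lane pending."""
    return {"DPInitPending{}".format(i): i == pending_lane
            for i in range(1, CMIS_MAX_HOST_LANES + 1)}


# Ethernet76 uses host lane 5 (mask 0x10); the reading has lane 5 pending.
print(all_masked_lanes_pending(sample(5), 0x10))  # True  -> "Expected"
# Ethernet78 uses host lane 7 (mask 0x40); the same reading has only lane 5 pending.
print(all_masked_lanes_pending(sample(5), 0x40))  # False -> "Not expected"
```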
Proposed Solution
This check was added in (#293) and is needed for ZR. I will look into either checking that datapath init is complete, or only doing this check for ZR modules.
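As a rough sketch of the second option (restricting the check to ZR modules), the decision could be expressed as a pure function. Everything here is hypothetical: the function name, its parameters, and how the caller would determine that a module is ZR are assumptions for illustration, not an existing xcvrd interface or the final fix.

```python
def should_force_reinit(cmis_major_rev, is_zr_module, dp_init_pending):
    """Decide whether to force a CMIS re-init based on the pending check.

    cmis_major_rev  -- major CMIS revision reported by the module
    is_zr_module    -- hypothetical flag supplied by the caller
    dp_init_pending -- result of the DPInitPending check for this port's lanes
    """
    if cmis_major_rev < 5:
        # The pending check only applies to CMIS 5.x and later.
        return False
    if not is_zr_module:
        # Non-ZR modules skip the check, so a missed transitional
        # state can no longer trigger a re-init.
        return False
    return not dp_init_pending


# A CMIS 5.x FR4 module whose pending bit was missed is left alone.
print(should_force_reinit(5, is_zr_module=False, dp_init_pending=False))  # False
# A CMIS 5.x ZR module keeps the existing behavior.
print(should_force_reinit(5, is_zr_module=True, dp_init_pending=False))   # True
```

Keeping the decision in one function makes it easy to unit test both the ZR path and the breakout case without a transceiver attached.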