CMIS optical xcvrs (> 5.0) Breakout Issues #568

@bobbymcgonigle

Description

Problem

With optical transceivers from different vendors, on multiple devices, some interfaces do not link up in breakout mode.

Symptoms

The symptoms below use an 800GBASE-FR4 in 8x100G mode.

  1. Interfaces stay link down.
 Ethernet72                               73     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
 Ethernet73                               74     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
 Ethernet74                               75     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
 Ethernet75                               76     100G   9100    N/A  Ethernet10/1  routed    down       up       N/A  QSFP-DD      QDD-800G-2FR4         off        4
 Ethernet76                               77     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
 Ethernet77                               78     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
 Ethernet78                               79     100G   9100    N/A  Ethernet10/1  routed    down       up       N/A  QSFP-DD      QDD-800G-2FR4         off        4
 Ethernet79                               80     100G   9100    N/A  Ethernet10/1  routed      up       up       N/A  QSFP-DD      QDD-800G-2FR4         off        5
  2. The datapaths on the corresponding lanes are deactivated.
  3. The xcvr is misprogrammed and the CMIS state is FAILED.
Ethernet72:
        CMIS State (SW): FAILED
        Data path state indicator on host lane 1: DataPathActivated
        Data path state indicator on host lane 2: DataPathActivated
        Data path state indicator on host lane 3: DataPathActivated
        Data path state indicator on host lane 4: DataPathDeactivated -> Ethernet75 down
        Data path state indicator on host lane 5: DataPathActivated
        Data path state indicator on host lane 6: DataPathActivated
        Data path state indicator on host lane 7: DataPathDeactivated -> Ethernet78 down
        Data path state indicator on host lane 8: DataPathActivated
        Tx output status on media lane 1: False
        Tx output status on media lane 2: True
        Tx output status on media lane 3: True
        Tx output status on media lane 4: True
        Tx output status on media lane 5: True
        Tx output status on media lane 6: True
        Tx output status on media lane 7: False
        Tx output status on media lane 8: True

Root Cause

if hasattr(api, 'get_cmis_rev'):
    # Check datapath init pending on module that supports CMIS 5.x
    majorRev = int(api.get_cmis_rev().split('.')[0])
    if majorRev >= 5 and not self.check_datapath_init_pending(api, host_lanes_mask):
        self.log_notice("{}: datapath init not pending".format(lport))
        self.force_cmis_reinit(lport, retries + 1)
        continue

In theory this is a good check. The problem is that DPInitPendingLane(i) on page 11h is a transitional state, so software can miss it. Missing it becomes more likely as xcvr processing grows and as hwskus have a larger number of interfaces (both of which are increasingly true).
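To illustrate why a polled transitional bit can be missed, here is a minimal sketch. It is not taken from the daemon: the function name `pending_seen` and all timing values are hypothetical and chosen only to show the effect of a slower polling loop.

```python
def pending_seen(poll_times_ms, pending_start_ms, pending_len_ms):
    """Return True if any poll lands inside the window where the bit is set."""
    pending_end_ms = pending_start_ms + pending_len_ms
    return any(pending_start_ms <= t < pending_end_ms for t in poll_times_ms)


# Hypothetical numbers: the bit is set for 20 ms, starting at t = 5 ms.
PENDING_START_MS = 5
PENDING_LEN_MS = 20

fast_poller = range(0, 100, 10)  # lightly loaded system: one read every 10 ms
slow_poller = range(0, 200, 50)  # heavily loaded system: one read every 50 ms

seen_fast = pending_seen(fast_poller, PENDING_START_MS, PENDING_LEN_MS)
seen_slow = pending_seen(slow_poller, PENDING_START_MS, PENDING_LEN_MS)

print("fast poller saw pending:", seen_fast)
print("slow poller saw pending:", seen_slow)
```

With these assumed numbers the fast poller reads inside the window and the slow poller does not, which is the same outcome the check above treats as "datapath init not pending".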

To show that this state is being missed, I added logging to every call in check_datapath_init_pending:

Ethernet78: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet73: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet76: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': True, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet79: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': True} -> Expected
Ethernet78: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet73: {'DPInitPending1': False, 'DPInitPending2': True, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet78: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet72: {'DPInitPending1': True, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet78: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet75: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Not expected
Ethernet77: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': False, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': True, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected
Ethernet74: {'DPInitPending1': False, 'DPInitPending2': False, 'DPInitPending3': True, 'DPInitPending4': False, 'DPInitPending5': False, 'DPInitPending6': False, 'DPInitPending7': False, 'DPInitPending8': False} -> Expected

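Ethernet75 and Ethernet78 never observe DPInitPending set on their own lanes (4 and 7), and these are the two interfaces that stay down. We can see that we miss this transitional state in SW.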
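The "Expected" / "Not expected" labels in the log above can be reproduced with a small standalone check. This is a sketch, not the daemon's actual `check_datapath_init_pending`: the helper names are hypothetical, and the port-to-lane mapping (Ethernet72 on host lane 1 through Ethernet79 on host lane 8) is an assumption inferred from the annotations in the datapath output.

```python
def own_lane_pending(dpinit_pending, host_lanes_mask, max_lanes=8):
    """True only if DPInitPending<N> is set for every lane in host_lanes_mask."""
    for lane in range(max_lanes):
        if not (host_lanes_mask >> lane) & 1:
            continue  # lane does not belong to this subport
        if not dpinit_pending.get("DPInitPending{}".format(lane + 1), False):
            return False
    return True


def lane_mask(port, first_port=72):
    """Assumed 8x100G mapping: Ethernet72 -> lane 1, ..., Ethernet79 -> lane 8."""
    return 1 << (int(port[len("Ethernet"):]) - first_port)


def sample(set_lane):
    """Build a reading shaped like the log lines, with one bit set."""
    return {"DPInitPending{}".format(n): n == set_lane for n in range(1, 9)}


# Readings shaped like the first log line (lane 5 set) seen by two subports.
for port in ("Ethernet78", "Ethernet76"):
    ok = own_lane_pending(sample(5), lane_mask(port))
    print(port, "Expected" if ok else "Not expected")
```

Under that assumed mapping, Ethernet76 owns lane 5 and passes, while Ethernet78 owns lane 7 and fails, matching the labels in the log.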

Proposed Solution

This check was added in (#293) and is needed for ZR. I will look into checking that datapath init is complete, or only doing this check for ZR modules.
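As a rough sketch of the second option (restricting the check to ZR modules), the decision could be factored into a pure function. Everything here is hypothetical: `needs_reinit` is not an existing function, and `is_zr_module` is an assumed boolean that the caller would have to derive from the module type by some means not covered in this issue.

```python
def needs_reinit(cmis_rev, init_pending, is_zr_module):
    """Decide whether to force a CMIS reinit based on DPInitPending.

    cmis_rev:      revision string such as "5.2" (same shape as get_cmis_rev()).
    init_pending:  result of the DPInitPending check for this port's lanes.
    is_zr_module:  assumed flag; how it is derived is outside this sketch.
    """
    major_rev = int(cmis_rev.split('.')[0])
    if major_rev < 5:
        return False  # the check only applies to CMIS 5.x and later
    if not is_zr_module:
        return False  # skip the transitional-state check for non-ZR modules
    return not init_pending


# A non-ZR CMIS 5.x module that missed the pending bit is no longer reinitialized.
print(needs_reinit("5.2", init_pending=False, is_zr_module=False))
# A ZR module keeps the existing behavior.
print(needs_reinit("5.2", init_pending=False, is_zr_module=True))
```

Keeping the decision separate from the I2C reads makes it straightforward to unit test without a transceiver attached.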
