<li><a href="#configuration-not-allowed-when-port-is-member-of-portchannel">Configuration not allowed when port is member of PortChannel</a></li>
<li><a href="#external-peering-over-a-connection-originating-from-an-mclag-switch-can-fail"><abbr title="Definition of the &quot;external system&quot; to peer with (could be one or multiple devices such as edge/provider routers)">External</abbr> peering over a connection originating from an MCLAG switch can fail</a></li>
<li><a href="#mesh-limitations-on-th5-based-devices">Mesh limitations on TH5-based devices</a></li>
<li><a href="#breakout-and-cmis-transceiver-initialization-issues-on-ds5000">Breakout and CMIS transceiver initialization issues on DS5000</a></li>
</ul>
<h3 id="deleting-a-vpc-and-creating-a-new-one-right-away-can-cause-the-agent-to-fail">Deleting a <abbr title="Virtual Private Cloud, similar to the public cloud VPC it provides an isolated private network for the resources with support for multiple subnets each with user-provided VLANs and on-demand DHCP">VPC</abbr> and creating a new one right away can cause the agent to fail</h3>
<p>The issue is due to limitations in SONiC's gNMI interface. In this particular case,
<p>None. We recommend avoiding mesh topologies on TH5-based devices for the
time being, with the exception of 2-node topologies without a gateway, where
the above issues do not apply.</p>
<h3 id="breakout-and-cmis-transceiver-initialization-issues-on-ds5000">Breakout and CMIS transceiver initialization issues on DS5000</h3>
<p>On Celestica DS5000 devices, certain transceivers that use the Common Management Interface Specification (CMIS) fail to initialize properly under specific conditions.</p>
<p>CMIS is an open standard for managing high-speed pluggable transceivers. It gives the network operating system a uniform way to interact with and monitor them.</p>
<h4 id="diagnosing-the-issue_1">Diagnosing the issue</h4>
<p>If you break out a port (for example, changing from 1x800G to 2x400G or 8x100G) while no transceiver is present and then insert a transceiver, initialization may fail, and the transceiver may be missing or appear as failed in SONiC.</p>
<p>This occurs because SONiC did not always correctly reinitialize the hardware abstraction for the port after breakout and re-insertion in this scenario, which especially affects CMIS modules.</p>
<h4 id="resolution">Resolution</h4>
<ul>
<li>The Hedgehog Fabric agent now automatically patches <code>/usr/share/sonic/platform/pddf/pddf-device.json</code> as needed after NOS installation (the patch is indicated by <code>-hh1</code> in the description). No user action is required to apply this workaround.</li>
<li>A full switch reboot is still required after agent deployment for the patch to take effect.</li>
<li>The <code>REBOOTREQ</code> column for the agent object in <code>kubectl</code> or <code>k9s</code> indicates whether a reboot is needed.</li>
<li>If you encounter existing transceiver failures (such as after an upgrade), a full power cycle of the switch, sometimes referred to as a cold boot, may still be required in addition to the reboot.</li>
</ul>