Increase Orch CPU utilization timeout before link flap #16187
base: master
Conversation
This change was made because in a modular chassis with multi-ASIC LCs, the link flap test might run on the uplink LC followed by the downlink LC. In that scenario, the downlink LC will still be hot (above 10% orchagent CPU utilization) when we flap its interfaces; hence the increased timeout. We tested with a timeout of 500 and it failed, so we are increasing it to 600, which has been passing on our local T2 testbeds.
@arista-hpandya could you redefine the timeout in continuous link flap for T2?
Hi @wenyiz2021! Thanks for reviewing this. I have made the changes to increase the timeout only for T2 devices. Also, on a side note, happy new year!
small comment, otherwise lgtm
This change was made because in a modular chassis with multi-ASIC LCs, the link flap test might run on the uplink LC followed by the downlink LC. Since the uplink LC has many neighbors, the downlink LC's CPU is busy re-routing traffic across the different paths. In such a scenario, the downlink LC will still be hot (above 10% orchagent CPU utilization) when we flap its interfaces; hence the increased timeout.
We tested with a timeout of 500 and it failed, so we are increasing it to 600, which has been passing on our local T2 testbeds.
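For context, the check in question polls orchagent CPU utilization on the DUT until it falls below a threshold or the timeout expires. Below is a minimal sketch of that pattern; the helper and constant names are illustrative, not the exact code in the continuous link flap test (which in sonic-mgmt would parse `show processes cpu` output on the DUT).

```python
import time

ORCH_CPU_THRESHOLD = 10   # percent; the "hot" limit referenced above
ORCH_CPU_TIMEOUT = 600    # seconds; raised from 100 by this PR

def wait_for_orchagent_to_cool(get_cpu_util, timeout=ORCH_CPU_TIMEOUT, interval=2):
    """Poll orchagent CPU utilization until it drops below ORCH_CPU_THRESHOLD or the timeout expires.

    get_cpu_util: callable returning the current orchagent CPU utilization in percent.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_cpu_util() < ORCH_CPU_THRESHOLD:
            return True
        time.sleep(interval)
    return False
```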
Description of PR
Summary:
Fixes #16186
Type of change
Back port request
Approach
What is the motivation for this PR?
To ensure that the timeout for the orchagent CPU utilization check is large enough for the continuous link flap test to pass.
How did you do it?
Increased the timeout from 100 to 600 seconds, applied only to T2 topologies.
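Per the review comment above, the larger timeout applies only to T2. A rough sketch of what that conditional could look like, assuming the test has access to the standard tbinfo fixture (variable names are illustrative, not the actual code):

```python
# Illustrative only; actual names in the continuous link flap test may differ.
if tbinfo["topo"]["type"] == "t2":
    orch_cpu_timeout = 600   # modular chassis LCs need longer to cool down after re-routing
else:
    orch_cpu_timeout = 100   # original timeout kept for other topologies
```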
How did you verify/test it?
Ran the test on a T2 testbed with a timeout of 600 (passed) and 500 (failed).
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation