Skip to content

Conversation

@arista-hpandya
Copy link
Contributor

@arista-hpandya arista-hpandya commented Oct 15, 2025

What I did

Fixes sonic-net/sonic-buildimage#24268

Added a guard clause in PortsOrch::initializePriorityGroupsBulk to check if the port has any configured Priority Groups (port.m_priority_group_ids.size() == 0). If the size is zero, the function now skips the SAI calls to retrieve the Priority Group list.

Why I did it

This fixes a critical regression introduced by a previous PR discussed in detail here: sonic-net/sonic-buildimage#24268 that replaced the individual initializePriorityGroups call with a bulk operation.

The original function included a check to ensure SAI_PORT_ATTR_NUMBER_OF_INGRESS_PRIORITY_GROUPS was non-zero before attempting to retrieve the list. This check was missing in the new initializePriorityGroupsBulk method.

Impact of Regression:
When processing ports without Priority Groups (e.g., recirculation ports like Ethernet-Rec0), the missing check led to a failed SAI operation (SAI_STATUS_NOT_EXECUTED) during initialization, causing a std::runtime_error and an immediate OrchAgent crash, which subsequently brings down the entire SWSS and Syncd stack.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bingwang-ms
Copy link
Contributor

@arista-hpandya Can you fix the code coverage issue?

@rlhui rlhui requested a review from vmittal-msft October 22, 2025 18:03
@rlhui rlhui added the P0 label Oct 22, 2025
arlakshm
arlakshm previously approved these changes Oct 28, 2025
@arlakshm
Copy link
Contributor

@arista-hpandya, can you please add UT to handle this case

@abdosi
Copy link
Contributor

abdosi commented Oct 29, 2025

@arista-hpandya Please address UT

@mssonicbld
Copy link
Collaborator

/azp run

@arista-hpandya
Copy link
Contributor Author

Sure will add a UT soon

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya arista-hpandya force-pushed the fix-portsorch-init-crash branch from e85c87a to c7b3b59 Compare November 4, 2025 21:07
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya arista-hpandya force-pushed the fix-portsorch-init-crash branch from c7b3b59 to 921cafc Compare November 4, 2025 21:21
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya
Copy link
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vmittal-msft
Copy link
Contributor

@arista-hpandya please rebase

@arista-hpandya arista-hpandya force-pushed the fix-portsorch-init-crash branch from f0bd9d3 to d8843b7 Compare November 10, 2025 22:57
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@arista-hpandya arista-hpandya force-pushed the fix-portsorch-init-crash branch from d8843b7 to 55c73f4 Compare November 12, 2025 18:33
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@Pavan-Nokia
Copy link
Contributor

@arista-hpandya #3966 seems to be an enhancement to this PR that addresses additional use cases.
@abdosi @arlakshm @vmittal-msft
@stepanblyschak

@stepanblyschak
Copy link
Contributor

@arista-hpandya @Pavan-Nokia Thanks for raising this PR and letting me know.
Right, In addition to PGs, I think the same issue is with queue and sched. groups.

Also, please have a look at changes I did in my PR in the UT regarding PortHostIfCreateFailed test.
I found this negative test case makes rest of the test cases hang for long time and cause timeouts ( I see here builds are canceled probably because of that ).

@prsunny
Copy link
Collaborator

prsunny commented Nov 14, 2025

@arista-hpandya , can we close this in favor of #3966

@arista-hpandya
Copy link
Contributor Author

Thanks for the PR @stepanblyschak . Yes we can close this PR @prsunny

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Regression: Orchagent Crash on Port Initialization Due to Recirc Port Handling