Skip to content

Conversation

@jianyuewu
Copy link
Contributor

This commit fixes a 10-second startup delay during fast-reboot in dynamic buffer mode.

What I did
Add check for fast-reboot done flag m_bufferCompletelyInitialized in checkSharedBufferPoolSize() to avoid early calculation. This can save about 7s.
But still keep one calculation, because buffer pool need to be ready before buffer profile creation.
Also skip headroom validate in startup phase. This can save about 2s.

Why I did it
There is a 10-second startup delay during fast-reboot.
Because in fast-reboot, m_mmuSize is immediately available from STATE_DB (persisted from previous boot), causing checkSharedBufferPoolSize() to execute expensive Redis operations before buffer system init completes.

How I verified it
Test with fast reboot, and check timing duration between key logs:
Before fix:
2025 Jun 6 13:53:03.918106 sonic NOTICE swss#orchagent: :- main: Created underlay router interface ID 600000000000a
2025 Jun 6 13:53:14.411721 sonic NOTICE swss#orchagent: :- initBufferConstants: Got maximum memory size 20799, exposing to BUFFER_MAX_PARAM_TABLE|global
After fix:
2025 Sep 15 12:12:55.319951 r-bison-12 NOTICE swss#orchagent: :- main: Created underlay router interface ID 600000000000a
2025 Sep 15 12:12:55.617816 r-bison-12 NOTICE swss#orchagent: :- initBufferConstants: Got maximum memory size 20799, exposing to BUFFER_MAX_PARAM_TABLE|global

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lguohan lguohan requested a review from Copilot October 24, 2025 17:36
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the fast-reboot startup process by eliminating a 10-second delay in dynamic buffer mode. The optimization targets two expensive operations that previously executed prematurely during fast-reboot when MMU size was immediately available from STATE_DB.

Key Changes:

  • Modified checkSharedBufferPoolSize() to conditionally execute based on warm-start state and buffer initialization status, saving ~7 seconds
  • Added early return in isHeadroomResourceValid() to skip validation during warm-start initialization, saving ~2 seconds
  • Added comprehensive unit tests to verify the new conditional logic for both warm-start and non-warm-start scenarios

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
cfgmgr/buffermgrdyn.cpp Added conditional logic to delay expensive buffer pool calculations and headroom validation during warm-start until initialization completes
tests/mock_tests/buffermgrdyn_ut.cpp Added three new test cases covering execution logic for checkSharedBufferPoolSize and isHeadroomResourceValid in both warm-start and non-warm-start scenarios

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu jianyuewu force-pushed the fast_reboot_optimize branch from 8367597 to 9062dce Compare October 27, 2025 08:13
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@volodymyrsamotiy
Copy link
Collaborator

@kperumalbfn , @vaibhavhd , could you please help to review?

@jianyuewu jianyuewu force-pushed the fast_reboot_optimize branch from 9062dce to cf90feb Compare October 28, 2025 02:33
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu jianyuewu force-pushed the fast_reboot_optimize branch from 3652df8 to adcab64 Compare October 28, 2025 06:23
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

@kperumalbfn @prsunny, when you have time, could you help kindly review this PR? Thanks in advance.

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

Seems test result not related with current change:

        """
    
        def access_function():
            keys = self.get_keys(table_name)
            if wait_at_least_n_keys:
                return (len(keys) >= num_keys, keys)
            else:
                return (len(keys) == num_keys, keys)
    
        status, result = wait_for_result(
            access_function, self._disable_strict_polling(polling_config)
        )
    
        if not status:
            message = failure_message or (
                f"Unexpected number of keys: expected={num_keys}, received={len(result)} "
                f'({result}), table="{table_name}"'
            )
>           assert not polling_config.strict, message
E           AssertionError: Unexpected number of keys: expected=33, received=32 (('Ethernet0', 'Ethernet68', 'Ethernet52', 'Ethernet12', 'Ethernet40', 'Ethernet56', 'Ethernet80', 'PortInitDone', 'Ethernet20', 'Ethernet16', 'Ethernet4', 'Ethernet100', 'Ethernet120', 'Ethernet92', 'Ethernet44', 'Ethernet24', 'Ethernet88', 'Ethernet108', 'Ethernet48', 'Ethernet76', 'Ethernet104', 'Ethernet64', 'Ethernet84', 'Ethernet8', 'Ethernet28', 'Ethernet36', 'Ethernet96', 'Ethernet116', 'PortConfigDone', 'Ethernet32', 'Ethernet112', 'Ethernet60')), table="PORT_TABLE"

dvslib/dvs_database.py:424: AssertionError

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

This commit fixes a 10-second startup delay during fast-reboot in
dynamic buffer mode.

In fast-reboot, m_mmuSize is immediately available from STATE_DB
(persisted from previous boot), causing checkSharedBufferPoolSize()
to execute expensive Redis operations before buffer system
init completes.
Fix is add check for fast-reboot done flag m_bufferCompletelyInitialized
in checkSharedBufferPoolSize() to avoid early calculation. This can save
about 7s.
But still keep one calculation, because buffer pool need to be ready
before buffer profile creation.
Also skip headroom validate in startup phase. This can save about 2s.

Signed-off-by: Jianyue Wu <[email protected]>
Signed-off-by: Jianyue Wu <[email protected]>
Signed-off-by: Jianyue Wu <[email protected]>
@jianyuewu jianyuewu force-pushed the fast_reboot_optimize branch from cb84724 to 78b0f1c Compare November 13, 2025 05:08
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jianyuewu
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants