[buffermgrd] Optimize fast-reboot startup #3952
base: master
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Pull Request Overview
This PR optimizes the fast-reboot startup process by eliminating a 10-second delay in dynamic buffer mode. The optimization targets two expensive operations that previously executed prematurely during fast-reboot when MMU size was immediately available from STATE_DB.
Key Changes:
- Modified checkSharedBufferPoolSize() to execute conditionally based on warm-start state and buffer initialization status, saving ~7 seconds
- Added an early return in isHeadroomResourceValid() to skip validation during warm-start initialization, saving ~2 seconds
- Added unit tests verifying the new conditional logic for both warm-start and non-warm-start scenarios
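The conditional-execution change can be sketched as follows. This is a minimal, hypothetical model: only the method name `checkSharedBufferPoolSize()` and the flag `m_bufferCompletelyInitialized` come from the PR; the other members stand in for the real `BufferMgrDynamic` state, which operates on Redis tables and is considerably more involved.

```cpp
#include <cassert>

// Simplified model of the guarded recalculation (illustrative, not the
// real cfgmgr/buffermgrdyn.cpp code).
struct BufferMgrSketch
{
    bool m_bufferCompletelyInitialized = false; // set once fast-reboot init ends
    bool m_firstCalculationDone = false;        // hypothetical: pools created once
    int  m_recalculations = 0;                  // counts expensive recalculations

    void checkSharedBufferPoolSize()
    {
        // During fast-reboot init, skip repeated recalculation, but keep
        // the first one so buffer pools exist before buffer profiles are
        // created.
        if (!m_bufferCompletelyInitialized && m_firstCalculationDone)
            return;
        m_firstCalculationDone = true;
        ++m_recalculations; // stands in for the expensive Redis work
    }
};
```

Under this model, repeated triggers during startup collapse into a single calculation, and normal behavior resumes once initialization completes.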
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cfgmgr/buffermgrdyn.cpp | Added conditional logic to delay expensive buffer pool calculations and headroom validation during warm-start until initialization completes |
| tests/mock_tests/buffermgrdyn_ut.cpp | Added three new test cases covering execution logic for checkSharedBufferPoolSize and isHeadroomResourceValid in both warm-start and non-warm-start scenarios |
Force-pushed 8367597 to 9062dce
@kperumalbfn, @vaibhavhd, could you please help review?
Force-pushed 9062dce to cf90feb
Force-pushed 3652df8 to adcab64
@kperumalbfn @prsunny, when you have time, could you kindly help review this PR? Thanks in advance.
The test failure seems unrelated to this change:

```
    def access_function():
        keys = self.get_keys(table_name)
        if wait_at_least_n_keys:
            return (len(keys) >= num_keys, keys)
        else:
            return (len(keys) == num_keys, keys)

    status, result = wait_for_result(
        access_function, self._disable_strict_polling(polling_config)
    )
    if not status:
        message = failure_message or (
            f"Unexpected number of keys: expected={num_keys}, received={len(result)} "
            f'({result}), table="{table_name}"'
        )
>       assert not polling_config.strict, message
E       AssertionError: Unexpected number of keys: expected=33, received=32 (('Ethernet0', 'Ethernet68', 'Ethernet52', 'Ethernet12', 'Ethernet40', 'Ethernet56', 'Ethernet80', 'PortInitDone', 'Ethernet20', 'Ethernet16', 'Ethernet4', 'Ethernet100', 'Ethernet120', 'Ethernet92', 'Ethernet44', 'Ethernet24', 'Ethernet88', 'Ethernet108', 'Ethernet48', 'Ethernet76', 'Ethernet104', 'Ethernet64', 'Ethernet84', 'Ethernet8', 'Ethernet28', 'Ethernet36', 'Ethernet96', 'Ethernet116', 'PortConfigDone', 'Ethernet32', 'Ethernet112', 'Ethernet60')), table="PORT_TABLE"

dvslib/dvs_database.py:424: AssertionError
```
This commit fixes a 10-second startup delay during fast-reboot in dynamic buffer mode. In fast-reboot, m_mmuSize is immediately available from STATE_DB (persisted from the previous boot), causing checkSharedBufferPoolSize() to execute expensive Redis operations before buffer system initialization completes. The fix adds a check of the fast-reboot done flag m_bufferCompletelyInitialized in checkSharedBufferPoolSize() to avoid the early calculation, saving about 7 s. One calculation is still kept, because the buffer pool needs to be ready before buffer profiles are created. Headroom validation is also skipped during the startup phase, saving about 2 s. Signed-off-by: Jianyue Wu <[email protected]>
Force-pushed cb84724 to 78b0f1c
This commit fixes a 10-second startup delay during fast-reboot in dynamic buffer mode.
What I did
Add a check of the fast-reboot done flag m_bufferCompletelyInitialized in checkSharedBufferPoolSize() to avoid the early calculation. This saves about 7 s.
One calculation is still kept, because the buffer pool needs to be ready before buffer profiles are created.
Also skip headroom validation during the startup phase. This saves about 2 s.
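The headroom-validation skip can be sketched similarly. Again a hypothetical, simplified model: only the method name `isHeadroomResourceValid()` and the `m_bufferCompletelyInitialized` flag come from this PR; the real method takes per-port arguments and checks accumulated headroom against ASIC limits.

```cpp
#include <cassert>

// Simplified model of skipping headroom validation during startup
// (illustrative, not the real cfgmgr/buffermgrdyn.cpp code).
struct HeadroomCheckSketch
{
    bool m_bufferCompletelyInitialized = false; // set once startup finishes
    int  m_validationsRun = 0;                  // counts full validations

    bool isHeadroomResourceValid()
    {
        // During warm/fast-start replay, the headroom configuration
        // restored from the previous boot was already validated, so the
        // expensive check is skipped.
        if (!m_bufferCompletelyInitialized)
            return true;
        ++m_validationsRun; // stands in for the real accounting work
        return true;        // full validation would decide the result here
    }
};
```

The design trade-off is that configuration replayed from the previous boot is trusted during startup, while any configuration applied after initialization is validated as before.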
Why I did it
There is a 10-second startup delay during fast-reboot: m_mmuSize is immediately available from STATE_DB (persisted from the previous boot), causing checkSharedBufferPoolSize() to execute expensive Redis operations before buffer system initialization completes.
How I verified it
Tested with fast-reboot, checking the time elapsed between key log messages:
Before the fix (~10.5 s between the two log lines):
2025 Jun 6 13:53:03.918106 sonic NOTICE swss#orchagent: :- main: Created underlay router interface ID 600000000000a
2025 Jun 6 13:53:14.411721 sonic NOTICE swss#orchagent: :- initBufferConstants: Got maximum memory size 20799, exposing to BUFFER_MAX_PARAM_TABLE|global
After the fix (~0.3 s):
2025 Sep 15 12:12:55.319951 r-bison-12 NOTICE swss#orchagent: :- main: Created underlay router interface ID 600000000000a
2025 Sep 15 12:12:55.617816 r-bison-12 NOTICE swss#orchagent: :- initBufferConstants: Got maximum memory size 20799, exposing to BUFFER_MAX_PARAM_TABLE|global