-
Notifications
You must be signed in to change notification settings - Fork 195
Delay DOM polling until all ports are initialized #614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@moshemos @dgsudharsan for awareness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR delays DOM polling until all ports finish initialization, ensuring that the CMIS datapath state machine is complete before expensive polling begins.
- Introduces a new function wait_port_initialization that loops until all logical ports are either initialized or removed from consideration.
- Calls the new waiting function in the task_worker() method to delay DOM monitoring until all ports reach a terminal CMIS state.
sonic-xcvrd/xcvrd/dom/dom_mgr.py
Outdated
| dom_info_update_periodic_secs = self.DOM_INFO_UPDATE_PERIOD_SECS | ||
|
|
||
| # Wait for all PORTs to be initialized | ||
| self.wait_port_initialization(dom_info_update_periodic_secs) |
Copilot
AI
May 13, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider implementing a timeout mechanism in wait_port_initialization() to prevent the possibility of an infinite wait if a port never reaches a terminal state.
| self.wait_port_initialization(dom_info_update_periodic_secs) | |
| self.wait_port_initialization(dom_info_update_periodic_secs, timeout=300) # Timeout set to 5 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Can you please help in fixing the build failure?
sonic-xcvrd/xcvrd/dom/dom_mgr.py
Outdated
| continue | ||
|
|
||
| physical_port = physical_port_list[0] | ||
| if not xcvrd._wrapper_get_presence(physical_port): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Can we instead call xcvrd_utils.get_transceiver_presence()?
Also, can we check if self.task_stopping_event.is_set()?
Wondering if you want to use _validate_and_get_physical_port instead?
sonic-xcvrd/xcvrd/dom/dom_mgr.py
Outdated
|
|
||
| # Adding dom_info_update_periodic_secs to allow xcvrd to initialize ports | ||
| # before starting the periodic update | ||
| next_periodic_db_update_time = datetime.datetime.now() + datetime.timedelta(seconds=dom_info_update_periodic_secs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Wondering if we should set is_periodic_db_update_needed to True since all the ports would have been in CMIS terminal state by this point?
sonic-xcvrd/xcvrd/dom/dom_mgr.py
Outdated
| def wait_port_initialization(self, delay): | ||
| logical_port_set = set(self.port_mapping.logical_port_list) | ||
|
|
||
| while logical_port_set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Can we add an upper bound timeout here to ensure that we don't end up in infinite loop (ideally, this should not occur)?
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR delays DOM polling until all ports are initialized, aiming to reduce contention during port initialization.
- Introduces a waiting function to poll for port initialization using defined timeout and polling constants.
- Updates task_worker to log an error if ports remain uninitialized past the timeout, or to start DOM monitoring once all ports are ready.
sonic-xcvrd/xcvrd/dom/dom_mgr.py
Outdated
| if datetime.datetime.now() > dom_wait_time_end: | ||
| break | ||
|
|
||
| return logical_port_set |
Copilot
AI
May 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return statement is placed inside the while loop, causing an early exit after the first iteration. Consider moving the return statement outside the loop to allow full polling until either a timeout occurs or all ports are initialized.
| return logical_port_set | |
| return logical_port_set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Can you please address this?
mihirpat1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Can you please fix the build failure?
| self.log_notice("Stop event generated during DOM monitoring loop") | ||
| break | ||
|
|
||
| if not is_periodic_db_update_needed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Why is this removed?
|
|
||
| # Start loop to update dom info in DB periodically and handle port change events | ||
| while not self.task_stopping_event.is_set(): | ||
| # Check if periodic db update is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Why is this removed?
| else: | ||
| self.log_notice("All ports are in CMIS terminal state, start DOM monitoring") | ||
|
|
||
| # Start loop to update dom info in DB periodically and handle port change events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prgeor Please add is_periodic_db_update_needed = True to ensure that DOM data is updated periodically
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Prince George <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Description
Delay DOM polling until all ports are initialized
Motivation and Context
Platforms with large radix like 512 ports may take more time to initialize and complete the CMIS datapath state machine. Since DOM polling is quite expensive on IO bound operation, it can result in contention with CMIS manager task which is initializing the port
How Has This Been Tested?
Tested this on a platform with 512 100G ports with 800G DR8 optics and see a reduction of overall link up time by around 4mins.
Additional Information (Optional)