Set DOM polling interval based on the time at the start of the loop instead of the end #757
Conversation
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Force-pushed from b20422c to 1f21007.
…nstead of the end. Signed-off-by: aditya-nexthop <[email protected]>
Force-pushed from 1f21007 to c1e6d5f.
Pull request overview
This PR adjusts the xcvrd DOM monitoring loop scheduling so the next DOM polling window is calculated from the timestamp at the start of the loop iteration, instead of from the time after all ports have been processed. This aligns the effective polling period with DOM_INFO_UPDATE_PERIOD_SECS per the expectation in issue #756 and reduces idle gaps after a full polling pass.
Changes:
- Capture `now` at the beginning of each DOM monitoring loop iteration.
- Use that loop-start timestamp to determine whether an update is due and to compute `next_periodic_db_update_time`.
```python
now = datetime.datetime.now()
if next_periodic_db_update_time <= now:
```
The new scheduling logic depends on capturing now at the start of the loop and using it to compute next_periodic_db_update_time. There isn’t currently a unit test that asserts this behavior for a non-zero DOM_INFO_UPDATE_PERIOD_SECS (e.g., by mocking datetime.datetime.now() to simulate a long per-iteration processing time and verifying the next update time is based on the loop-start timestamp, not the loop-end timestamp). Adding such a test would help prevent regressions to the original timing bug.
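A test along these lines could pin the behavior down. The sketch below is standalone: the scheduling helper is a stand-in for the real xcvrd loop body (not the actual source), and `DOM_INFO_UPDATE_PERIOD_SECS` is a local constant rather than an import, so no mocking of `datetime.datetime.now()` is needed.

```python
import datetime

DOM_INFO_UPDATE_PERIOD_SECS = 60  # local stand-in for the xcvrd constant


def next_update_time(loop_start, period_secs=DOM_INFO_UPDATE_PERIOD_SECS):
    # The fix under test: anchor the next polling window to the loop-start
    # timestamp, not to whenever the last port finished updating.
    return loop_start + datetime.timedelta(seconds=period_secs)


def test_next_update_anchored_to_loop_start():
    loop_start = datetime.datetime(2024, 1, 1, 12, 0, 0)
    # Simulate a slow iteration: updating all ports takes 64 seconds.
    loop_end = loop_start + datetime.timedelta(seconds=64)
    nxt = next_update_time(loop_start)
    # The next update is due 60s after the loop STARTED...
    assert nxt == loop_start + datetime.timedelta(seconds=60)
    # ...so by the time the slow iteration ends, an update is already due,
    # rather than waiting another full period measured from loop_end.
    assert nxt <= loop_end


test_next_update_anchored_to_loop_start()
```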
mihirpat1 left a comment
@aditya-nexthop @prgeor This change looks correct for fixing the interval anchoring. However, a broader question: if each port takes ~1s to update DOM data and there are 64 ports, the loop takes ~64s per iteration. With DOM_INFO_UPDATE_PERIOD_SECS potentially shorter than that, are we okay with the loop running back-to-back with no sleep/yield? Could there be a concern about starving other tasks or excessive CPU usage in that scenario?
Hi @mihirpat1, as per #758 the wait is primarily due to the time spent waiting for select to time out. We don't need to merge this PR if we merge #758, as #758 includes these changes. I will close this PR once #758 merges.
@mihirpat1 in that case we may have to define DOM_INFO_UPDATE_PERIOD_SECS per platform.
@prgeor That's correct - defining it per platform can help in addressing this.
Description
During profiling we found a significant period of time when transceiver information was not being updated.
On investigation, we saw that after the last transceiver in the loop was polled, there were no updates for a further DOM_INFO_UPDATE_PERIOD_SECS.
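In pseudocode terms, the bug was a matter of which timestamp the next polling window is computed from. The sketch below is a simplified stand-in for the xcvrd DOM monitoring loop, not the actual source: `update_all_ports` is a hypothetical callback, and the constant is local.

```python
import datetime

DOM_INFO_UPDATE_PERIOD_SECS = 60  # stand-in for the real xcvrd constant


def dom_monitor_iteration(next_periodic_db_update_time, update_all_ports):
    # Capture 'now' once, at the START of the iteration.
    now = datetime.datetime.now()
    if next_periodic_db_update_time <= now:
        update_all_ports()  # may itself take many seconds on a full chassis
        # Anchor the next window to the loop-start timestamp, so the per-port
        # processing time no longer adds an idle gap on top of the period.
        next_periodic_db_update_time = now + datetime.timedelta(
            seconds=DOM_INFO_UPDATE_PERIOD_SECS)
    return next_periodic_db_update_time
```

Computing the deadline from a timestamp taken after `update_all_ports()` returns (the old behavior) would make the effective period "processing time + DOM_INFO_UPDATE_PERIOD_SECS" instead of the configured period.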
Motivation and Context
fixes #756
This improves DOM polling performance of transceivers in `xcvrd` and correctly ensures that DOM_INFO_UPDATE_PERIOD_SECS elapses between successive updates of a specific transceiver.

How Has This Been Tested?
We did CPU profiling without and with the change over a 10 min window just after `xcvrd` starts.

Before:


After:

The CPU is utilized more uniformly after the change, instead of staying idle even when more than DOM_INFO_UPDATE_PERIOD_SECS has already passed since a specific transceiver was last polled.
Additional Information (Optional)