Set DOM polling interval based on the time at the start of the loop instead of the end by aditya-nexthop · Pull Request #757 · sonic-net/sonic-platform-daemons

aditya-nexthop · 2026-02-20T02:02:21Z

Description

During profiling, we found a significant period during which transceiver information was not being updated.
On investigation, we saw that no updates occurred for DOM_INFO_UPDATE_PERIOD_SECS after the last transceiver in the loop was polled.

Motivation and Context

fixes #756
This improves the DOM polling performance of transceivers in xcvrd and ensures that DOM_INFO_UPDATE_PERIOD_SECS elapses between successive updates of a given transceiver.

How Has This Been Tested?

We profiled CPU usage with and without the change over a 10-minute window just after xcvrd starts.
Before: [CPU profile image]

After: [CPU profile image]

The CPU is more uniformly utilized after the change instead of staying idle when more than DOM_INFO_UPDATE_PERIOD_SECS has already passed since a specific transceiver was polled.

Additional Information (Optional)

mssonicbld · 2026-02-20T02:02:28Z

/azp run

azure-pipelines · 2026-02-20T02:02:37Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-02-20T02:03:37Z

/azp run

azure-pipelines · 2026-02-20T02:03:46Z

Azure Pipelines successfully started running 1 pipeline(s).

mssonicbld · 2026-02-20T02:04:37Z

/azp run

azure-pipelines · 2026-02-20T02:04:46Z

Azure Pipelines successfully started running 1 pipeline(s).

Copilot

Pull request overview

This PR adjusts the xcvrd DOM monitoring loop scheduling so the next DOM polling window is calculated from the timestamp at the start of the loop iteration, instead of from the time after all ports have been processed. This aligns the effective polling period with DOM_INFO_UPDATE_PERIOD_SECS per the expectation in issue #756 and reduces idle gaps after a full polling pass.

Changes:

Capture the current time (now) at the beginning of each DOM monitoring loop iteration.
Use that loop-start timestamp to determine whether an update is due and to compute next_periodic_db_update_time.
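
The effect of these two changes can be illustrated with a minimal sketch. It is not the xcvrd code: PERIOD stands in for DOM_INFO_UPDATE_PERIOD_SECS, and the 60-second period, the 40-second pass duration, and the helper name next_update_time are assumptions chosen for illustration.

```python
import datetime

# Assumed value; stands in for DOM_INFO_UPDATE_PERIOD_SECS.
PERIOD = datetime.timedelta(seconds=60)


def next_update_time(loop_start, loop_end, anchor_at_start):
    """Return when the next polling pass becomes due (hypothetical helper)."""
    anchor = loop_start if anchor_at_start else loop_end
    return anchor + PERIOD


# Assume one full pass over all ports takes 40 seconds.
loop_start = datetime.datetime(2026, 2, 20, 2, 0, 0)
loop_end = loop_start + datetime.timedelta(seconds=40)

# Anchored at the end of the pass (behaviour before this PR).
old = next_update_time(loop_start, loop_end, anchor_at_start=False)
# Anchored at the start of the pass (behaviour after this PR).
new = next_update_time(loop_start, loop_end, anchor_at_start=True)

print((old - loop_start).total_seconds())  # 100.0
print((new - loop_start).total_seconds())  # 60.0
```

Under these assumptions, end anchoring stretches the gap between two passes to the period plus the pass duration, while start anchoring keeps it at the period.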

Copilot · 2026-03-10T22:07:50Z

sonic-xcvrd/xcvrd/dom/dom_mgr.py

+            now = datetime.datetime.now()
+            if next_periodic_db_update_time <= now:


The new scheduling logic depends on capturing now at the start of the loop and using it to compute next_periodic_db_update_time. There isn’t currently a unit test that asserts this behavior for a non-zero DOM_INFO_UPDATE_PERIOD_SECS (e.g., by mocking datetime.datetime.now() to simulate a long per-iteration processing time and verifying the next update time is based on the loop-start timestamp, not the loop-end timestamp). Adding such a test would help prevent regressions to the original timing bug.

@aditya-nexthop do you want to add this test?
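
A test along the lines suggested above might look like the following sketch. It injects a fake clock instead of patching datetime.datetime.now(), and run_one_iteration is a hypothetical, simplified stand-in for the loop body in dom_mgr.py, not the actual xcvrd code. FakeClock, PERIOD, and the 64-second pass duration are likewise assumptions.

```python
import datetime

# Assumed non-zero period; stands in for DOM_INFO_UPDATE_PERIOD_SECS.
PERIOD = datetime.timedelta(seconds=60)


class FakeClock:
    """Controllable clock so the test does not depend on wall time."""

    def __init__(self, start):
        self.current = start

    def now(self):
        return self.current

    def advance(self, seconds):
        self.current += datetime.timedelta(seconds=seconds)


def run_one_iteration(clock, next_update_time, poll_ports):
    """Hypothetical, simplified loop body using loop-start anchoring."""
    now = clock.now()  # captured before any port is polled
    if next_update_time <= now:
        poll_ports()  # may take a long time
        next_update_time = now + PERIOD
    return next_update_time


def test_next_update_anchored_at_loop_start():
    start = datetime.datetime(2026, 1, 1, 0, 0, 0)
    clock = FakeClock(start)
    # Simulate a pass over all ports that takes 64 seconds.
    result = run_one_iteration(clock, start, lambda: clock.advance(64))
    # The next update is based on the loop-start timestamp ...
    assert result == start + PERIOD
    # ... and not on the timestamp at the end of the pass.
    assert result != clock.now() + PERIOD


test_next_update_anchored_at_loop_start()
```

A test against the real module would need to patch the clock where dom_mgr.py looks it up; how best to do that depends on how the module imports datetime.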

mihirpat1

@aditya-nexthop @prgeor This change looks correct for fixing the interval anchoring. However, a broader question: if each port takes ~1s to update DOM data and there are 64 ports, the loop takes ~64s per iteration. With DOM_INFO_UPDATE_PERIOD_SECS potentially shorter than that, are we okay with the loop running back-to-back with no sleep/yield? Could there be a concern about starving other tasks or excessive CPU usage in that scenario?
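
The arithmetic behind this question can be sketched roughly as follows. With the interval anchored at the start of the loop, the idle time per cycle is the period minus the duration of a full pass, floored at zero. The function name, the 60-second period, and the 32-port comparison are illustrative assumptions; the 64 ports at about 1 second each come from the comment above. The sketch ignores any other waits in the loop.

```python
def idle_seconds_per_cycle(period_secs, num_ports, secs_per_port):
    """Idle time left in one polling cycle with loop-start anchoring.

    Hypothetical helper: assumes a pass takes num_ports * secs_per_port
    and that nothing else delays the loop.
    """
    pass_duration = num_ports * secs_per_port
    return max(0.0, period_secs - pass_duration)


# 64 ports at ~1 s each exceed an assumed 60 s period: no idle time,
# so passes would run back-to-back.
print(idle_seconds_per_cycle(60, 64, 1.0))  # 0.0

# With 32 ports, 28 s of each cycle would remain idle.
print(idle_seconds_per_cycle(60, 32, 1.0))  # 28.0
```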

aditya-nexthop · 2026-03-11T17:03:28Z

Hi @mihirpat1, as per #758, the delay comes primarily from waiting for select to time out.

We don't need to merge this PR if we merge #758, since #758 includes these changes.
I thought it would be useful to review the effects of this change and of #758 independently to understand the impact of each.

I will close this PR once #758 merges.

prgeor · 2026-03-11T23:51:34Z

@mihirpat1 in that case, we may have to define DOM_INFO_UPDATE_PERIOD_SECS on a per-platform basis, since the default polling period of 60 seconds does NOT work universally?

mihirpat1 · 2026-03-12T06:05:31Z

@prgeor That's correct - defining it per platform can help in addressing this.

aditya-nexthop force-pushed the aditya-polling-interval-fix branch from b20422c to 1f21007 Compare February 20, 2026 02:03

Set DOM polling interval based on the time at the start of the loop i…

c1e6d5f

…nstead of the end. Signed-off-by: aditya-nexthop <[email protected]>

aditya-nexthop force-pushed the aditya-polling-interval-fix branch from 1f21007 to c1e6d5f Compare February 20, 2026 02:04

aditya-nexthop marked this pull request as ready for review February 20, 2026 02:30

aditya-nexthop mentioned this pull request Feb 24, 2026

Adjust select timeouts during port update handling to allow for faster transceiver DOM polling #758

Open

prgeor requested a review from Copilot March 10, 2026 22:05

Copilot started reviewing on behalf of prgeor March 10, 2026 22:05

Copilot AI reviewed Mar 10, 2026

prgeor requested a review from mihirpat1 March 10, 2026 22:29

prgeor approved these changes Mar 10, 2026

mihirpat1 reviewed Mar 10, 2026

5 participants