[orchagent] implement ring buffer feature with a flag #3242

Merged
merged 8 commits into from
Feb 13, 2025

Conversation

Contributor

@a114j0y a114j0y commented Jul 22, 2024

What I did

  • Add a ring thread to orchdaemon; it is started only when gRingMode is turned on.
  • Support the ring buffer feature. It is currently enabled only for the route table executor, which has a scaled use case.
  • Fix the covariant return type of swss::TableBase* Consumer::getConsumerTable() const override.
    • It should return swss::ConsumerTableBase *.
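
To illustrate the covariant return type point, here is a minimal self-contained sketch. The class names mirror the ones in the description, but the bodies (including the `popBatchSize` member) are hypothetical stand-ins, not the swss definitions. C++ allows an override to narrow its return type from a pointer to the base class to a pointer to a derived class, so code holding the derived executor needs no downcast:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the swss types named above; not the real classes.
struct TableBase {
    virtual ~TableBase() = default;
    virtual std::string kind() const { return "TableBase"; }
};

struct ConsumerTableBase : TableBase {
    std::string kind() const override { return "ConsumerTableBase"; }
    int popBatchSize = 128;  // assumed member that exists only on the derived type
};

struct ConsumerBase {
    virtual ~ConsumerBase() = default;
    virtual TableBase* getConsumerTable() const = 0;
};

struct Consumer : ConsumerBase {
    explicit Consumer(ConsumerTableBase* table) : table_(table) {}

    // Covariant override: the return type is narrowed from TableBase* to
    // ConsumerTableBase*, which is legal because the latter derives from the former.
    ConsumerTableBase* getConsumerTable() const override { return table_; }

private:
    ConsumerTableBase* table_;
};

// Exercises both call paths.
inline void covariantDemo() {
    ConsumerTableBase table;
    Consumer consumer(&table);

    // Through the derived type: derived-only members are reachable without a cast.
    assert(consumer.getConsumerTable()->popBatchSize == 128);

    // Through the base interface: callers still get a TableBase*, dispatched virtually.
    const ConsumerBase& base = consumer;
    assert(base.getConsumerTable()->kind() == "ConsumerTableBase");
}
```

Calling `covariantDemo()` runs both checks; the point is only that the narrowed return type compiles and behaves the same through either interface.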

Why I did it

  • Speed up task processing for APP_ROUTE_TABLE consumers.
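
As a rough sketch of the general idea (not the orchagent implementation), assume one producer, the main loop, and one consumer, the ring thread. The producer pushes callable tasks into a fixed-size ring and returns immediately, while a dedicated thread pops and runs them. `TaskRing`, `runThroughRing` and the capacity of 64 are hypothetical names and values chosen for illustration:

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical single-producer/single-consumer ring of tasks.
// One slot is always left unused so that head == tail means "empty".
template <std::size_t N>
class TaskRing {
public:
    using Task = std::function<void()>;

    // Producer side: returns false when the ring is full.
    bool push(Task task) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = std::move(task);
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool pop(Task& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = std::move(slots_[head]);
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<Task, N> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to pop
    std::atomic<std::size_t> tail_{0};  // next slot to fill
};

// Pushes `count` trivial tasks from the calling thread, runs them on a
// worker thread, and returns how many were executed.
inline int runThroughRing(int count) {
    TaskRing<64> ring;
    std::atomic<bool> done{false};
    int executed = 0;  // written only by the worker, read after join()

    std::thread worker([&] {
        TaskRing<64>::Task task;
        for (;;) {
            if (ring.pop(task)) {
                task();
            } else if (done.load(std::memory_order_acquire)) {
                // Producer has finished: drain anything pushed just before.
                if (!ring.pop(task)) break;
                task();
            } else {
                std::this_thread::yield();
            }
        }
    });

    for (int i = 0; i < count; ++i) {
        // Ring full: yield and retry until the worker frees a slot.
        while (!ring.push([&executed] { ++executed; })) {
            std::this_thread::yield();
        }
    }
    done.store(true, std::memory_order_release);
    worker.join();
    return executed;
}
```

In this sketch a full ring makes the producer yield and retry, which is only safe as long as the consumer keeps draining. A ring of size `N` holds at most `N - 1` tasks because of the reserved slot.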

How I verified it
Measured the performance with PerformanceTimer.

@a114j0y a114j0y requested a review from prsunny as a code owner July 22, 2024 21:52
@a114j0y a114j0y force-pushed the master branch 2 times, most recently from 7f23081 to 8062ead Compare July 25, 2024 21:34
@a114j0y a114j0y force-pushed the master branch 9 times, most recently from 3d3cde6 to ba7f1a0 Compare August 6, 2024 01:17
@siqbal1986
Contributor

Can you please add some swss tests for this functionality? No tests are mentioned in the PR. If you have created a separate PR, please link it here.

@siqbal1986 siqbal1986 self-requested a review August 23, 2024 18:23
Contributor

@siqbal1986 siqbal1986 left a comment

What is the need for the ring buffer? Can you please add some detail regarding its need?

@a114j0y a114j0y force-pushed the master branch 2 times, most recently from 80e7a90 to a6b3b0c Compare October 2, 2024 01:17
@a114j0y a114j0y force-pushed the master branch 6 times, most recently from 2ccb3dc to 12b402a Compare October 23, 2024 20:56
@a114j0y a114j0y requested a review from siqbal1986 October 23, 2024 21:00
@a114j0y
Contributor Author

a114j0y commented Oct 30, 2024

Rebased and resolved the conflicts. @siqbal1986, could you help re-approve it? : )

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@yuezhoujk

/azp run Azure.sonic-swss

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit 900f38c into sonic-net:master Feb 13, 2025
15 checks passed
liuh-80 pushed a commit to liuh-80/sonic-swss that referenced this pull request Mar 10, 2025
liuh-80 pushed a commit to liuh-80/sonic-swss that referenced this pull request Mar 17, 2025
@liuh-80
Contributor

liuh-80 commented Mar 17, 2025

@a114j0y, I have recently been testing route performance improvements with ring buffer/zmq/multiple-db, and the ring buffer sometimes causes orchagent to get stuck when there is a massive number of BGP routes. I checked this on a lab Mellanox 4600 device which has 12000+ BGP routes:

~$ sonic-db-cli ASIC_DB eval "return #redis.call('keys', 'ASIC_STATE:SAI_OBJECT_TYPE_ROUTE_ENTRY*')" 0
12951

~$ show ip bgp summary

IPv4 Unicast Summary:
BGP router identifier 10.1.0.32, local AS number 65100 vrf-id 0
BGP table version 22923
RIB entries 12851, using 1644928 bytes of memory
Peers 24, using 494208 KiB of memory
Peer groups 2, using 128 bytes of memory

Neighbhor  V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  NeighborName
10.0.0.1   4  65200  3201     5537     22923   0    0     00:00:34  6370          ARISTA01T2
10.0.0.5   4  65200  3201     5537     22923   0    0     00:00:35  6370          ARISTA03T2
10.0.0.9   4  65200  3201     5537     22923   0    0     00:00:34  6370          ARISTA05T2
10.0.0.13  4  65200  3201     5537     22923   0    0     00:00:34  6370          ARISTA07T2
,,,

Then I run "sudo config bgp shutdown all" or "sudo config bgp startup all" and check the syslog. Sometimes I get the following log:

2025 Mar 17 06:56:42.396204 DEVICE_NAME WARNING swss#orchagent: :- processAnyTask: ring is full...push again
2025 Mar 17 06:56:42.439894 DEVICE_NAME INFO swss#rsyslogd: imuxsock[pid: 48, name: /usr/bin/orchagent] from bjw2-can-4600c-3:orchagent: begin to drop messages due to rate-limiting
2025 Mar 17 06:57:42.043606 DEVICE_NAME WARNING swss#supervisor-proc-exit-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes).
2025 Mar 17 06:58:13.002017 DEVICE_NAME INFO swss#rsyslogd: imuxsock[pid: 48, name: /usr/bin/orchagent]: 23441221 messages lost due to rate-limiting (20000 allowed within 300 seconds)
2025 Mar 17 06:58:13.082530 DEVICE_NAME INFO swss#rsyslogd: imuxsock[pid: 48, name: /usr/bin/orchagent] from <DEVICE_NAME :orchagent>: begin to drop messages due to rate-limiting
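
For scale, the rsyslogd lines above allow a rough estimate of how fast orchagent was logging. This is a back-of-the-envelope sketch that assumes the lost-message count covers the window between the first "begin to drop messages" line and the "messages lost" line; the helper names are made up for illustration:

```cpp
#include <cassert>

// Seconds since midnight for an HH:MM:SS.ffffff timestamp.
constexpr double secondsOfDay(int h, int m, double s) {
    return h * 3600.0 + m * 60.0 + s;
}

// Average number of messages dropped per second over a window.
constexpr double droppedPerSecond(double lost, double startSec, double endSec) {
    return lost / (endSec - startSec);
}

// Values copied from the syslog excerpt above.
constexpr double kStart = secondsOfDay(6, 56, 42.439894);  // "begin to drop messages"
constexpr double kEnd   = secondsOfDay(6, 58, 13.002017);  // "23441221 messages lost"
constexpr double kRate  = droppedPerSecond(23441221.0, kStart, kEnd);

// Roughly 90.6 seconds and 23.4 million messages: a few hundred thousand per second.
static_assert(kRate > 2.0e5 && kRate < 3.0e5,
              "dropped-message rate is on the order of 10^5 per second");
```

Under that assumption the rate is a few hundred thousand messages per second, which would be consistent with a warning being logged on every iteration of a tight retry loop rather than with ordinary event logging.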

There seems to be a deadlock issue in the ring buffer code.
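
The "ring is full...push again" warning suggests that the producer retries when the ring has no free slot; whether that retry is the actual cause of the stall is not established here. Purely as a sketch, using a hypothetical `BoundedQueue` rather than the orchagent ring, a producer can wait for a free slot with a deadline instead of retrying indefinitely, so that a consumer which stops draining shows up as a timeout the caller can handle:

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded queue; illustrative only, not the orchagent ring.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Waits up to `timeout` for a free slot. Returns false on timeout so the
    // caller can log once or escalate instead of retrying forever.
    bool pushFor(T value, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool hasRoom = notFull_.wait_for(
            lock, timeout, [this] { return items_.size() < capacity_; });
        if (!hasRoom) {
            return false;
        }
        items_.push_back(std::move(value));
        return true;
    }

    // Non-blocking pop; wakes one waiting producer when a slot frees up.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::deque<T> items_;
};

// Fills a queue of capacity 2, shows the third push timing out, then shows
// that a pop makes room again.
inline void boundedPushDemo() {
    using std::chrono::milliseconds;
    BoundedQueue<int> queue(2);
    assert(queue.pushFor(1, milliseconds(10)));
    assert(queue.pushFor(2, milliseconds(10)));
    assert(!queue.pushFor(3, milliseconds(10)));  // full and nobody is draining

    int value = 0;
    assert(queue.tryPop(value) && value == 1);
    assert(queue.pushFor(3, milliseconds(10)));   // a slot is free again
}
```

The trade-off is a mutex on every push and pop, which costs more than a lock-free ring; the benefit is that a stalled consumer produces one bounded wait and a return value instead of an unbounded stream of retries and log lines.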

7 participants