
baby hero test scale #586

Draft · wants to merge 10 commits into base: main
Conversation

@mgheorghe mgheorghe commented Jun 4, 2024

Test SAI Baby Hero.
Baby Hero test scale is 1% of the Hero test scale.
This is achieved by using only one prefix per ACL instead of 100 prefixes per ACL.

Had to increase the BMv2 table sizes for the mapping and routing tables to 8M and 4M respectively, from the default of 1K.
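
The table-size bump can be sketched as a patch over the compiled BMv2 pipeline JSON, where each table carries a `max_size` field. The table-name substrings and sizes below are illustrative assumptions, not the actual DASH pipeline names:

```python
import json

# Illustrative new sizes; the name substrings are assumptions, not the
# actual DASH BMv2 table names.
NEW_SIZES = {
    "outbound_ca_to_pa": 8 * 1024 * 1024,  # mapping table -> 8M
    "outbound_routing": 4 * 1024 * 1024,   # routing table -> 4M
}

def bump_table_sizes(json_path, new_sizes):
    """Rewrite max_size for matching tables in a compiled BMv2 JSON."""
    with open(json_path) as f:
        pipeline = json.load(f)
    # BMv2 JSON keeps its match-action tables under pipelines[*].tables.
    for pipe in pipeline.get("pipelines", []):
        for table in pipe.get("tables", []):
            for pattern, size in new_sizes.items():
                if pattern in table.get("name", ""):
                    table["max_size"] = size
    with open(json_path, "w") as f:
        json.dump(pipeline, f, indent=2)
```

Patching the compiled JSON avoids recompiling the P4 program just to change capacity.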

scale:

  • SAI_OBJECT_TYPE_VIP_ENTRY = 1
  • SAI_OBJECT_TYPE_DIRECTION_LOOKUP_ENTRY = 32
  • SAI_OBJECT_TYPE_VNET = 32
  • SAI_OBJECT_TYPE_DASH_ACL_GROUP = 320
  • SAI_OBJECT_TYPE_ENI = 32
  • SAI_OBJECT_TYPE_ENI_ETHER_ADDRESS_MAP_ENTRY = 32
  • SAI_OBJECT_TYPE_OUTBOUND_CA_TO_PA_ENTRY = 80,000 + 320
  • SAI_OBJECT_TYPE_OUTBOUND_ROUTING_ENTRY = 80,000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET) + 80,000 (SAI_OUTBOUND_ROUTING_ENTRY_ACTION_ROUTE_VNET_DIRECT)
  • SAI_OBJECT_TYPE_PA_VALIDATION_ENTRY = 32

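The scale numbers above can be condensed and sanity-checked in a few lines (a plain-Python sketch; the short keys are illustrative shorthand, not SAI identifiers):

```python
# Baby Hero scale profile, condensed from the list above.
ENIS = 32
ACL_GROUPS_PER_ENI = 10  # 32 ENIs x 10 groups = 320 DASH ACL groups

SCALE = {
    "VIP_ENTRY": 1,
    "DIRECTION_LOOKUP_ENTRY": 32,
    "VNET": 32,
    "DASH_ACL_GROUP": ENIS * ACL_GROUPS_PER_ENI,
    "ENI": ENIS,
    "ENI_ETHER_ADDRESS_MAP_ENTRY": 32,
    "OUTBOUND_CA_TO_PA_ENTRY": 80_000 + 320,
    # ROUTE_VNET entries + ROUTE_VNET_DIRECT entries:
    "OUTBOUND_ROUTING_ENTRY": 80_000 + 80_000,
    "PA_VALIDATION_ENTRY": 32,
}
```
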
run time:

  • on my setup
    - configure BMv2: 15 min
    - run traffic: 3 min
    - clear config: 6 min
  • GitHub Actions runtime: 10 min for the whole test

TO DO:

* path fix

we need to go into the dash folder first

* baby hero

* Update test_sai_baby_hero_traffic.py

* Update test_sai_baby_hero.py

* dpugen update to 2.2.0

* removed dev test file

* revert file so container update does not trigger

* remove some logging
@chrispsommers

Is this supposed to run in CI? I see the runtime in CI is just a few minutes.
Is there any penalty to scaling up the BMv2 tables (ignoring a new test)?

@mgheorghe

Is this supposed to run in CI? I see the runtime in CI is just a few minutes. Is there any penalty to scaling up bmv2 tables (ignoring a new test)?

Here I see no test was run: "run sai-chalenger test" took 0 sec.

Right before submitting, in my branch https://github.com/mgheorghe/DASH/pull/6/checks it took 10 min for all the tests.


r12f commented Jun 5, 2024

Hi Mircea, these tests are huge; are they generated? I am very worried about how to maintain them when things change.


r12f commented Jun 5, 2024

Hi Mircea, I really appreciate the effort of adding these tests. However, I think the proper place for this test is actually the sonic-mgmt repo.

There are a few reasons:

  1. The CI should focus more on correctness, not speed or performance.
  2. It is hard to assert that a regression happened with a perf test in CI. These tests usually take a long time and are very sensitive to many factors, e.g., whether the build machine is busy doing something else. The build VMs that Azure DevOps or GitHub Actions use run on shared machines, so the data will not be stable.
  3. Moving forward, we are building DPU images and SmartSwitch virtual DUTs. These images and topologies will be used for running the sonic-mgmt tests. Those tests can be scheduled as nightly runs, and we have dedicated resources for running them. It will be a much better place than the CI build machine.
  4. Another advantage of adding this to the sonic-mgmt repo is that these tests will be naturally usable for real devices, not just BMv2 or any virtual/software implementation.

To summarize, I would recommend:

  1. Moving this test to the sonic-mgmt repo, where we already have a list of DASH tests: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash. It will help us keep engineering efficiency, produce data with better quality, naturally support any implementation including real hardware, and align with the direction we are going.
  2. If you don't mind the effort, I would even recommend adding the Hero tests there as well! We have discussed this internally too, and we all think it would be an awesome thing to have!

@r12f r12f self-requested a review June 5, 2024 14:44
@r12f r12f left a comment

Requesting changes. Please see my comment above.

@mgheorghe

  1. The CI should more focus on the correctness not speed or performance.
  2. It is hard to assert regression happened with perf test in CI.

The test is focused on scale, not performance; I have it set to 1 pps. The goal was to exercise 32 ENIs, each with 10 NSGs, each with 1000 ACLs. Scale should be just a function of the memory available.
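
The 1% relationship between the two profiles reduces to simple arithmetic (a sketch; the ENI/NSG/ACL counts and per-ACL prefix counts come from this PR's description):

```python
# Shared topology for both profiles: 32 ENIs x 10 NSGs x 1000 ACL rules.
ENIS, NSGS_PER_ENI, ACLS_PER_NSG = 32, 10, 1000
acl_rules = ENIS * NSGS_PER_ENI * ACLS_PER_NSG  # total ACL rules

hero_prefixes = acl_rules * 100  # Hero: 100 prefixes per ACL
baby_prefixes = acl_rules * 1    # Baby Hero: 1 prefix per ACL

# Baby Hero exercises the same tables at 1% of the Hero prefix count.
assert baby_prefixes == hero_prefixes // 100
```
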

Moving this test to sonic-mgmt repo, where we already have a list of DASH tests here: https://github.com/sonic-net/sonic-mgmt/tree/master/tests/dash

I'll contribute them there too, but the sonic-mgmt repo will get the DASH config version while BMv2 gets the SAI version of the config.

I see the image is also based on BMv2, so all the issues I encountered with BMv2 will be there too, plus some more from all the SONiC layers: https://github.com/sonic-net/SONiC/blob/master/images/dash/bmv2-virtual-sonic.svg

My conclusion is that I would like to add tests in both repos: I'll keep this one here and add a DASH one in sonic-mgmt (it will require more work, since I would like to make a proposal for an OTG virtual testbed for sonic-mgmt, maybe even adding TCP traffic).

@mgheorghe

hi Mircea, these tests are huge, so are they generated? I am very worried about how to maintain them when things are changed.

Generated, and that 100M config file was supposed to be generated on the fly, in memory, at runtime, but I ran into an issue with #584, so I had to dump the config here until that issue is resolved.
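
Generating on the fly would replace the dumped file with a lazy stream, along these lines (a generic Python sketch, not the actual dpugen API; the field names and address scheme are illustrative):

```python
def gen_ca_to_pa_entries(eni_count=32, entries_per_eni=2500):
    """Yield CA-to-PA records one at a time instead of materializing a file.

    32 ENIs x 2,500 entries = 80,000 records, produced lazily so the
    full config never has to sit in memory or on disk at once.
    """
    for eni in range(eni_count):
        for i in range(entries_per_eni):
            yield {
                "type": "SAI_OBJECT_TYPE_OUTBOUND_CA_TO_PA_ENTRY",
                "eni": eni,
                # Hypothetical CA address layout, one /32 per entry.
                "ca_ip": f"10.{eni}.{i // 256}.{i % 256}",
            }
```

A test driver could then consume the stream entry by entry (`for entry in gen_ca_to_pa_entries(): apply(entry)`) instead of loading a 100M JSON dump.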

@mgheorghe mgheorghe marked this pull request as draft June 12, 2024 16:14
@KrisNey-MSFT

An issue is open in SAI Challenger: the version of Debian is too old, the container fails to build, etc. We need a fix upstream.
