Enhancement Proposal - introduce comprehensive load testing framework for KGateway #11597
Conversation
…KGateway Signed-off-by: MayorFaj <[email protected]>
Pull Request Overview
A design proposal introducing a comprehensive load testing framework for KGateway, detailing core test scenarios, CI/CD integration, and test planning.
- Defines Attached Routes, Route Probe, and Route Change test scenarios based on gateway-api-bench patterns
- Describes integration into existing Ginkgo-based e2e tests and Makefile targets for local and CI usage
- Presents alternatives, open questions for thresholds, and technical implementation considerations
Comments suppressed due to low confidence (3)
design/11210.md:116
- [nitpick] Consider adding an explicit hyperlink to the gateway-api-bench repository (e.g., https://github.com/kubernetes-sigs/gateway-api-bench) so readers can quickly locate the benchmark source.
* **Integrated Benchmark Tests**: Run gateway-api-bench benchmarks as part of KGateway's test suite
design/11210.md:134
- [nitpick] It may help maintainers if the Go snippet includes the import path for `load_testing` (e.g., `import "test/kubernetes/e2e/features/load_testing"`) so they know where `NewTestingSuite` is defined.
func KubeGatewaySuiteRunner() e2e.SuiteRunner {
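For illustration, a minimal sketch of the completed snippet the nitpick asks for; the module path, the `NewSuiteRunner`/`Register` API, and the `load_testing` package layout are assumptions based on the proposal's snippet, not final code:

```go
package tests

import (
	"github.com/kgateway-dev/kgateway/v2/test/kubernetes/e2e"
	"github.com/kgateway-dev/kgateway/v2/test/kubernetes/e2e/features/load_testing"
)

// KubeGatewaySuiteRunner registers the load testing suite alongside the
// existing feature suites. The load_testing import path is the one the
// nitpick asks to spell out (hypothetical until the package lands).
func KubeGatewaySuiteRunner() e2e.SuiteRunner {
	runner := e2e.NewSuiteRunner(false)
	runner.Register("LoadTesting", load_testing.NewTestingSuite)
	return runner
}
```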
design/11210.md:188
- [nitpick] The decision label for Alternative 2 is unbolded. For consistency, consider bolding both statuses (e.g., `**Decision**: **REJECTED**`) or matching the formatting.
* **Decision**: **PROPOSED** - Leverages proven patterns within existing infrastructure
@howardjohn will have good inputs for perf testing
### CI/CD Integration Strategy

1. **Pipeline Integration**:
   * Should load tests run in CI for every PR or only on specific triggers?
We should only run the tests on 1) releases, 2) nightlies.
1. **Framework Architecture**:
   * Should we implement a simulation framework similar to gateway-api-bench patterns?
   * What's the preferred approach: bulk operations vs. batched operations?
Let's define bulk vs. batched operations here (I'm guessing it's referring to applying all HTTPRoutes at once vs. applying groups of routes in batches?).
I would use batching (i.e., apply 100 routes at a time) for more fine-grained measurements and observability, as sketched below.
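A rough sketch of what that batched apply could look like; `applyRoute` is a hypothetical placeholder for whatever client wrapper the framework ends up using:

```go
package load_testing

import (
	"context"
	"time"

	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

const batchSize = 100

// applyInBatches applies routes in fixed-size batches and returns one
// duration per batch, giving the per-100-routes measurements suggested above.
func applyInBatches(
	ctx context.Context,
	routes []gwv1.HTTPRoute,
	applyRoute func(context.Context, *gwv1.HTTPRoute) error,
) ([]time.Duration, error) {
	var perBatch []time.Duration
	for start := 0; start < len(routes); start += batchSize {
		end := min(start+batchSize, len(routes))
		began := time.Now()
		for i := start; i < end; i++ {
			if err := applyRoute(ctx, &routes[i]); err != nil {
				return perBatch, err
			}
		}
		perBatch = append(perBatch, time.Since(began)) // one sample per batch
	}
	return perBatch, nil
}
```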
## Alternatives

### Alternative 1: Ginkgo E2E Test Integration (PROPOSED)
So all of our e2e tests are currently using testify instead of Ginkgo: https://github.com/kgateway-dev/kgateway/blob/main/test/kubernetes/e2e/suite.go#L7
What are the benefits of using Ginkgo here instead of testify?
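For comparison, the existing suites follow a testify shape roughly like the sketch below; the suite and test names here are illustrative only:

```go
package load_testing

import (
	"testing"

	"github.com/stretchr/testify/suite"
)

// loadTestingSuite mirrors the testify pattern used by the existing e2e
// suites; the scenario body is a placeholder.
type loadTestingSuite struct {
	suite.Suite
}

func TestLoadTesting(t *testing.T) {
	suite.Run(t, new(loadTestingSuite))
}

func (s *loadTestingSuite) TestAttachedRoutes() {
	// create routes, wait for attachedRoutes to converge, assert thresholds
	s.Require().True(true)
}
```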
#### Unit Tests

* Test simulation framework components (runner, metrics, watcher)
* Mock Gateway and HTTPRoute resource handling
* Metrics collection and reporting logic
* Performance baseline calculations

#### Integration Tests

* End-to-end load test scenarios within KGateway's existing e2e framework
* Integration with KGateway's test installation and cleanup
* Performance regression detection
* CI/CD pipeline integration testing
I think we just want to focus on the performance baselines. We can add unit tests as needed but there already should be tests for the metrics collection, and resource handling.
For the load testing e2e tests, how are they different from the performance baselines? We want to focus on testing the control plane, not the data plane here.
### Test Coverage and Scenarios

1. **Monitoring and Metrics Collection**:
   * What metrics should be collected during load tests?
There's some WIP documentation on the control plane metrics here: kgateway-dev/kgateway.dev#296
We probably want to look at the following (collection sketch below):
- `kgateway_collection_transforms_total`
- `kgateway_collection_transform_duration_seconds`
- `kgateway_collection_resources`
- `kgateway_status_syncer_resources`
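A minimal sketch of collecting those metrics by scraping the controller's Prometheus endpoint; the port-forwarded address is an assumption:

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// wanted holds the control-plane metric names listed above.
var wanted = map[string]bool{
	"kgateway_collection_transforms_total":           true,
	"kgateway_collection_transform_duration_seconds": true,
	"kgateway_collection_resources":                  true,
	"kgateway_status_syncer_resources":               true,
}

func main() {
	// Assumes the kgateway controller's metrics port has been forwarded locally.
	resp, err := http.Get("http://localhost:9092/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		panic(err)
	}
	for name, family := range families {
		if wanted[name] {
			fmt.Println(name, family.GetMetric()) // raw samples; aggregate as needed
		}
	}
}
```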
### Performance Thresholds and Baselines

1. **Setup and Teardown Time Targets**:
   * What should be the target setup time for 1000 routes?
Based on https://github.com/howardjohn/gateway-api-bench, it looks like kgateway had a setup time of 12s. We should measure the setup time, though, and also report if it's taking longer than expected.
1. **Setup and Teardown Time Targets**:
   * What should be the target setup time for 1000 routes?
   * Should teardown time have the same threshold as setup time?
The teardown in the blog for kgateway was about 16s, so I think it's reasonable to aim for under 30s for both the setup and the teardown. If it takes longer, we should trigger a regression alert and fail the test (see the sketch below).
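One way the setup-time check could look, as a sketch: poll the Gateway's `attachedRoutes` status until it matches the expected total, failing past the threshold. The helper name and polling interval are assumptions:

```go
package load_testing

import (
	"context"
	"fmt"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// waitForAttachedRoutes polls the Gateway until the summed attachedRoutes
// across listeners reaches want, failing once the threshold (e.g. 30s) is
// exceeded. The polling interval is an arbitrary choice.
func waitForAttachedRoutes(ctx context.Context, c client.Client, key client.ObjectKey, want int32, threshold time.Duration) (time.Duration, error) {
	start := time.Now()
	for {
		if time.Since(start) > threshold {
			return time.Since(start), fmt.Errorf("setup exceeded %s threshold", threshold)
		}
		var gw gwv1.Gateway
		if err := c.Get(ctx, key, &gw); err != nil {
			return time.Since(start), err
		}
		var total int32
		for _, l := range gw.Status.Listeners {
			total += l.AttachedRoutes
		}
		if total == want {
			return time.Since(start), nil
		}
		time.Sleep(250 * time.Millisecond)
	}
}
```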
   * What should be the target setup time for 1000 routes?
   * Should teardown time have the same threshold as setup time?
   * Should thresholds scale based on route count?
   * What's the acceptable performance degradation for multi-gateway scenarios?
We should also test the multiple gateway scenario, but the setup/teardown threshold may be higher. Let's initially build a single-gateway performance test in the first iteration, then add the multi-gateway setup in the follow-up.
1. **Setup and Teardown Time Targets**:
   * What should be the target setup time for 1000 routes?
   * Should teardown time have the same threshold as setup time?
   * Should thresholds scale based on route count?
Let's follow the blog and use the same scenario:
> To simulate a large cluster, we run with 50 namespaces with 100 routes each (5,000 routes total)
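A sketch of generating that scenario; the namespace/route names and the parent Gateway reference are illustrative:

```go
package load_testing

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// generateRoutes builds the blog's "large cluster" scenario: 50 namespaces
// with 100 HTTPRoutes each (5,000 routes total).
func generateRoutes() []gwv1.HTTPRoute {
	routes := make([]gwv1.HTTPRoute, 0, 50*100)
	for ns := 0; ns < 50; ns++ {
		for r := 0; r < 100; r++ {
			routes = append(routes, gwv1.HTTPRoute{
				ObjectMeta: metav1.ObjectMeta{
					Name:      fmt.Sprintf("route-%d", r),
					Namespace: fmt.Sprintf("bench-ns-%d", ns),
				},
				Spec: gwv1.HTTPRouteSpec{
					CommonRouteSpec: gwv1.CommonRouteSpec{
						// "bench-gateway" is a hypothetical shared Gateway.
						ParentRefs: []gwv1.ParentReference{{Name: "bench-gateway"}},
					},
					Hostnames: []gwv1.Hostname{gwv1.Hostname(fmt.Sprintf("r%d.ns%d.example.com", r, ns))},
				},
			})
		}
	}
	return routes
}
```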
If we go with the batch-apply approach, thresholds can be defined per batch of applied routes (per 100 routes, e.g. ≤ 6s per 100 routes; the actual ranges will have to be figured out with testing).
   * What's the acceptable performance degradation for multi-gateway scenarios?

2. **Production Scale Requirements**:
   * What are the target production scales we're optimizing for?
Let's follow the "large cluster" scenario and run with 50 namespaces with 100 routes each (5,000 routes total)
   * What are the target production scales we're optimizing for?
   * Maximum expected routes per gateway in production environments?
   * Should we test different route counts (100, 500, 1000, 5000)?
   * What's the expected number of gateways per cluster in production?
Let's do a single-gateway test and a multi-gateway test with separate thresholds.
1. **Monitoring and Metrics Collection**:
   * What metrics should be collected during load tests?
   * Should we monitor memory/CPU usage of kgateway components?
Yep! In addition to the control plane metrics, we should look at the control plane CPU and memory (a sampling sketch follows). Looking at the data plane CPU/memory usage can be a follow-up.
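A sketch of sampling the control plane's CPU/memory through the Kubernetes metrics API (requires metrics-server in the kind cluster); the label selector is an assumption about how the deployment is labeled:

```go
package load_testing

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	metricsv "k8s.io/metrics/pkg/client/clientset/versioned"
)

// samplePodUsage prints CPU/memory usage for the kgateway control plane pods.
func samplePodUsage(ctx context.Context, mc metricsv.Interface, namespace string) error {
	podMetrics, err := mc.MetricsV1beta1().PodMetricses(namespace).List(ctx, metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/name=kgateway", // assumed label
	})
	if err != nil {
		return err
	}
	for _, pm := range podMetrics.Items {
		for _, c := range pm.Containers {
			fmt.Printf("%s/%s cpu=%s mem=%s\n", pm.Name, c.Name, c.Usage.Cpu(), c.Usage.Memory())
		}
	}
	return nil
}
```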
1. **Monitoring and Metrics Collection**:
   * What metrics should be collected during load tests?
   * Should we monitor memory/CPU usage of kgateway components?
   * Do we need to track API server response times?
What do you mean by API server response time here? Response time for a request to go through the data plane?
   * What metrics should be collected during load tests?
   * Should we monitor memory/CPU usage of kgateway components?
   * Do we need to track API server response times?
   * Should we measure end-to-end latency (creation to traffic-ready)?
This is a good question: do we want to only look at control plane metrics, or track the full latency from HTTPRoute creation to the first successful probe (sketched below)? @jmunozro has some experience here and might know.
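If we do track it, the measurement could look roughly like this sketch; the gateway URL and host header are assumptions about the test environment:

```go
package load_testing

import (
	"context"
	"net/http"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/client"
	gwv1 "sigs.k8s.io/gateway-api/apis/v1"
)

// timeToFirstSuccess creates the route, then probes through the gateway until
// the first 200 response, returning the elapsed creation-to-traffic-ready
// latency.
func timeToFirstSuccess(ctx context.Context, c client.Client, route *gwv1.HTTPRoute, url, host string) (time.Duration, error) {
	start := time.Now()
	if err := c.Create(ctx, route); err != nil {
		return 0, err
	}
	for ctx.Err() == nil {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return 0, err
		}
		req.Host = host // route by Host header through the shared gateway address
		if resp, err := http.DefaultClient.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return time.Since(start), nil
			}
		}
		time.Sleep(100 * time.Millisecond)
	}
	return 0, ctx.Err()
}
```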
* Automated regression detection comparing current vs. baseline results
* Performance trend analysis over time
* Alert thresholds for CI/CD failures
Let's add a note that the performance tests will be run on a single-node kind cluster (exact specs TBD; John used a 16-core AMD 9950x CPU with 96GB RAM).
* Ginkgo test that generates continuous HTTP traffic using test helpers
* Apply route configuration changes during traffic
* Monitor request success rates and response times
* **Metrics**: Request success rate, latency distribution, error count during changes
I'm not sure if we want to measure request success rates/latency since we're focusing the performance testing on the control plane
Performance testing is essential for Gateway API implementations as they handle critical traffic routing decisions. Current testing focuses primarily on functional correctness, leaving performance characteristics largely unvalidated. A comprehensive load testing framework will help identify performance bottlenecks, ensure scalability, and prevent regressions in production deployments.

The proposed framework will provide standardized performance testing patterns that can be consistently applied across KGateway development cycles, enabling reliable performance validation and regression detection both locally and in CI/CD pipelines.
Let's add a reference to the benchmarking repo to cite the source and give some background: https://github.com/howardjohn/gateway-api-bench
It would also be good to call out some notes in this background section:
- A single HTTPRoute may be attached to many Gateways simultaneously.
- The Gateway object has a status message with `attachedRoutes`, which stores a count of the total successfully attached route objects.
- Setup time: the time after the last route is created until `attachedRoutes` is updated to the total route count.
- Teardown time: the time after the last route is deleted until the status is reset back to zero.
   * What's the optimal concurrency level to avoid API server overload?
   * Should concurrency scale with cluster size/resources?
   * What are the Kubernetes API server rate limits we need to respect?
   * Minimum cluster requirements for running load tests?
Let's use a single-node kind cluster to help minimize noise. John's blog used a 16-core AMD 9950x CPU with 96GB RAM. We can determine what makes sense in CI with some testing, but we can start with the standard GitHub runner we're using for the nightlies (ubuntu-22.04).
### Technical Implementation

1. **Framework Architecture**:
   * Should we implement a simulation framework similar to gateway-api-bench patterns?
Do you mean using https://github.com/howardjohn/pilot-load to simulate clusters, where no pods are scheduled onto any physical machine?
1. **Framework Architecture**:
   * Should we implement a simulation framework similar to gateway-api-bench patterns?
   * What's the preferred approach: bulk operations vs. batched operations?
   * How many concurrent workers should be used for route creation/deletion?
Let's start with a single worker for route creation, then add concurrent workers as needed.
   * Where should historical performance data be stored?

3. **Result Persistence and Reporting**:
   * How will performance metrics integrate with existing e2e test reporting?
These are great questions, but they might make sense as a set of follow-up tasks. When running the performance tests locally, users should be able to use a Grafana dashboard (something like https://github.com/howardjohn/pilot-load/blob/master/install/dashboard.json) to view the results.
   * Should we use server-side apply or standard create operations?

2. **Baseline Establishment and Regression Detection**:
   * How should initial performance baselines be established?
Let's base the initial baselines on the performance results from John's blog. For the patch applies, we'll need to experiment to find reasonable thresholds.
For the control plane status updates:

| Metric        | Cilium | Envoy Gateway | Istio | Kgateway | Nginx |
| ------------- | ------ | ------------- | ----- | -------- | ----- |
| Setup time    | 1s     | 10s           | 2s    | 17s      | 17s   |
| Teardown time | 27s    | 45s           | 27s   | 28s      | 28s   |
| Total Writes  | 243    | 1484          | 73    | 15       | 128   |

CPU/Memory threshold max:

| Gateway  | Relative CPU Consumption | Relative Memory Usage |
| -------- | ------------------------ | --------------------- |
| Kgateway | 6.6x                     | 6.2x                  |

For the load testing:

| DEST     | QPS    | CONS | DUR | PAYLOAD | THROUGHPUT   | P50     | P90     | P99     |
| -------- | ------ | ---- | --- | ------- | ------------ | ------- | ------- | ------- |
| kgateway | 0      | 1    | 30  | 0       | 36793.86qps  | 0.028ms | 0.030ms | 0.043ms |
| kgateway | 15000  | 1    | 10  | 0       | 14998.28qps  | 0.028ms | 0.036ms | 0.089ms |
| kgateway | 0      | 16   | 10  | 0       | 170469.20qps | 0.085ms | 0.144ms | 0.227ms |
| kgateway | 15000  | 16   | 10  | 0       | 14998.34qps  | 0.091ms | 0.148ms | 0.230ms |
| kgateway | 100000 | 16   | 10  | 0       | 99997.92qps  | 0.046ms | 0.119ms | 0.225ms |
| kgateway | 50000  | 16   | 10  | 4096    | 49998.15qps  | 0.043ms | 0.064ms | 0.148ms |
| kgateway | 0      | 512  | 10  | 0       | 400687.22qps | 1.215ms | 2.254ms | 3.575ms |