[controller] Account for max read capacity in quota change #2235
misyel wants to merge 12 commits into linkedin:main
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
| private final boolean disableParentRequestTopicForStreamPushes; | ||
| private final int defaultReadQuotaPerRouter; | ||
| private final long maxReadCapacityCu; |
Can you add a comment explaining what these two variables mean, and revisit the names so their purpose is clear from the name alone? Let's also add "router" to the name if it only applies to router quota.
Good point - added a comment to describe the usage of the two variables and renamed the new config to include router
...e-controller/src/main/java/com/linkedin/venice/controller/VeniceControllerClusterConfig.java
| long maxReadCapacityCu = clusterConfig.getMaxReadCapacityCu(); | ||
| long maxPerRouterCapacity = Math.max(defaultReadQuotaPerRouter, maxReadCapacityCu); | ||
| long totalClusterCapacity = maxPerRouterCapacity * routerCount; | ||
| if (Math.max(totalClusterCapacity, maxPerRouterCapacity) < readQuotaInCU.get()) { |
totalClusterCapacity will always be >= maxPerRouterCapacity
This isn't true for the parent: its router count is 0, so totalClusterCapacity will be 0. We need to take the max of totalClusterCapacity and maxPerRouterCapacity to handle this case correctly.
Can you add some comments? Otherwise it looks like a bug (that's what we thought when we encountered this if condition before your change 😆).
Done - added a comment to explain why we need to take the max of total cluster capacity and per router capacity
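For readers following this thread, the check and the parent edge case can be sketched as below. This is a minimal, self-contained illustration, not the controller's actual code: the class `QuotaCheckSketch`, the method `exceedsCapacity`, and the small numbers in `main` are hypothetical. Only the arithmetic mirrors the snippet under review.

```java
final class QuotaCheckSketch {
  /**
   * Returns true when the requested read quota is larger than the capacity
   * the check allows. Hypothetical helper; parameter names follow the
   * snippet under review.
   */
  static boolean exceedsCapacity(
      long defaultReadQuotaPerRouter,
      long maxReadCapacityCu,
      int routerCount,
      long requestedReadQuotaCu) {
    long maxPerRouterCapacity = Math.max(defaultReadQuotaPerRouter, maxReadCapacityCu);
    long totalClusterCapacity = maxPerRouterCapacity * routerCount;
    // On the parent, the live router count is 0, so totalClusterCapacity is 0.
    // Taking the max with the per-router capacity keeps the check meaningful there.
    return Math.max(totalClusterCapacity, maxPerRouterCapacity) < requestedReadQuotaCu;
  }

  public static void main(String[] args) {
    // Parent-like case: no live routers, request within one router's capacity.
    System.out.println(exceedsCapacity(20, 100, 0, 80));  // false
    // Parent-like case: request above one router's capacity.
    System.out.println(exceedsCapacity(20, 100, 0, 150)); // true
    // Three live routers: capacity is 3 * 100.
    System.out.println(exceedsCapacity(20, 100, 3, 250)); // false
  }
}
```

Without the outer `Math.max`, the first call would compare the request against 0 and report every positive quota as exceeding capacity.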
| props.getBoolean(CONTROLLER_DISABLE_PARENT_REQUEST_TOPIC_FOR_STREAM_PUSHES, false); | ||
| this.defaultReadQuotaPerRouter = | ||
| props.getInt(CONTROLLER_DEFAULT_READ_QUOTA_PER_ROUTER, DEFAULT_PER_ROUTER_READ_QUOTA); | ||
| this.maxRouterReadCapacityCu = props.getLong(MAX_READ_CAPACITY, MAX_ROUTER_READ_CAPACITY_CU); |
Does this config need to be set on both the router and the controller from now on?
Also, in the router code I see MAX_READ_CAPACITY with a default of 100k and ROUTER_MAX_READ_CAPACITY with a default of 6000. How are those different? Can we also use the same static variable in the router code, to be consistent?
Yes, it needs to be on both the controller and router from now on so that they share the same value.
ROUTER_MAX_READ_CAPACITY is used as an early throttler before any requests are processed: it rejects a request if the current number of requests across all stores is larger than the configured limit. I believe it's there to prevent the router from being overwhelmed by too many requests at once. MAX_READ_CAPACITY is used to distribute the router quota fairly across stores: it decreases each store's quota by a factor if the total store quota is larger than the MAX_READ_CAPACITY value.
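A rough sketch of the two mechanisms described in this reply, for illustration only. The class and method names are hypothetical, and the scaling factor used here (max capacity divided by the total store quota) is an assumption; the reply only says each store's quota is decreased "by a factor".

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReadThrottlingSketch {
  /** Early throttler: reject when the current request total exceeds the limit. */
  static boolean rejectEarly(long currentTotalRequests, long routerMaxReadCapacity) {
    return currentTotalRequests > routerMaxReadCapacity;
  }

  /**
   * Per-store distribution: if the sum of store quotas exceeds maxReadCapacity,
   * scale every store's quota down by the same factor (assumed: max / total).
   */
  static Map<String, Long> distributeQuota(Map<String, Long> storeQuotas, long maxReadCapacity) {
    long total = 0;
    for (long quota : storeQuotas.values()) {
      total += quota;
    }
    Map<String, Long> result = new LinkedHashMap<>(storeQuotas);
    if (total <= maxReadCapacity) {
      return result; // everything fits, quotas are left unchanged
    }
    double factor = (double) maxReadCapacity / total;
    for (Map.Entry<String, Long> entry : storeQuotas.entrySet()) {
      result.put(entry.getKey(), (long) (entry.getValue() * factor));
    }
    return result;
  }
}
```

With store quotas of 600 and 400 and a max capacity of 500, this sketch scales both by 0.5, giving 300 and 200. The early throttler is independent of that distribution and looks only at the aggregate request count.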
Hi there. This pull request has been inactive for 30 days. To keep our review queue healthy, we plan to close it in 7 days unless there is new activity. If you are still working on this, please push a commit, leave a comment, or convert it to draft to signal intent. Thank you for your time and contributions.
Closing this pull request due to 37 days of inactivity. This is not a judgment on the value of the work. If you would like to continue, please reopen or open a new PR and we will be happy to take another look. Thank you again for contributing.
Pull request overview
This pull request fixes quota validation logic in the controller to properly account for the maximum read capacity when approving or rejecting quota change requests. Previously, the controller only validated against a default router capacity, but the correct logic should validate against the maximum router read capacity, which is the same value used by routers for actual throttling.
Changes:
- Updated quota validation in VeniceHelixAdmin to use max router read capacity instead of default quota per router
- Consolidated separate controller and router capacity constants into a single shared constant (MAX_ROUTER_READ_CAPACITY_CU)
- Updated router config default from hardcoded 100,000 to 20,000,000 to align with deployed configurations
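The shared-constant idea in the list above can be sketched as follows. This is a simplified stand-in, not the Venice config classes: `java.util.Properties` replaces the project's own properties type, the key string `max.read.capacity` is a hypothetical placeholder, and only the default value (20,000,000) is taken from this overview.

```java
import java.util.Properties;

final class SharedCapacityConfigSketch {
  // Single shared default, as described in the overview above.
  static final long MAX_ROUTER_READ_CAPACITY_CU = 20_000_000L;
  // Hypothetical property key used only in this sketch.
  static final String MAX_READ_CAPACITY_KEY = "max.read.capacity";

  /**
   * Both the controller-side and router-side configs would call the same
   * lookup, so an unset property resolves to the same default on each.
   */
  static long readMaxCapacity(Properties props) {
    String raw = props.getProperty(MAX_READ_CAPACITY_KEY);
    return raw == null ? MAX_ROUTER_READ_CAPACITY_CU : Long.parseLong(raw.trim());
  }
}
```

The point of a single constant is that a deployment which overrides the value on only one side will still disagree, so the override has to be applied to both the controller and the router.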
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| VeniceHelixAdmin.java | Updated quota validation logic to use maxRouterReadCapacityCu and properly handle parent controller edge case (0 live routers) |
| VeniceControllerClusterConfig.java | Replaced defaultReadQuotaPerRouter field with maxRouterReadCapacityCu, reading from MAX_READ_CAPACITY config key |
| VeniceRouterConfig.java | Changed default value from hardcoded 100,000 to MAX_ROUTER_READ_CAPACITY_CU constant (20,000,000) |
| VeniceConstants.java | Renamed DEFAULT_PER_ROUTER_READ_QUOTA to MAX_ROUTER_READ_CAPACITY_CU, updated documentation |
| TestVeniceHelixAdmin.java | Added new test for quota validation with multiple test cases |
| Test utility files | Updated references from old constant to new constant |
| ZkRoutersClusterManager routersClusterManager = resources.getRoutersClusterManager(); | ||
| int routerCount = routersClusterManager.getLiveRoutersCount(); | ||
| VeniceControllerClusterConfig clusterConfig = resources.getConfig(); | ||
| long totalClusterReadCapacity = clusterConfig.getMaxRouterReadCapacityCu() * routerCount; |
Do we want to include servers here? Do we have a server read capacity estimation?
We can handle the server read capacity estimation as a separate change, because it will require ramping and rollout effort.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| * Configs to specify the default and max per router quota. This is used in {@link VeniceHelixAdmin} to determine whether | ||
| * a quota change should be approved or denied and for throttling reads. This value represents the maximum capacity a single | ||
| * router can handle |
The comment mentions "default and max per router quota" but after this PR there is only one config (maxRouterReadCapacityCu). The comment should be updated to say "max per router quota" or "per router read capacity" to accurately reflect that this is a single config value representing the maximum capacity.
| * Configs to specify the default and max per router quota. This is used in {@link VeniceHelixAdmin} to determine whether | |
| * a quota change should be approved or denied and for throttling reads. This value represents the maximum capacity a single | |
| * router can handle | |
| * Config to specify the max per router read capacity/quota. This is used in {@link VeniceHelixAdmin} to determine whether | |
| * a quota change should be approved or denied and for throttling reads. This value represents the maximum capacity a single | |
| * router can handle. |
Problem Statement
When a new quota request comes in to the controller, we only validate the value against the default router capacity before approving or rejecting the request. A separate max read capacity is used for router throttling, and the correct behavior is to approve the request if it is within either the default router capacity or the max read capacity.
Solution
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
- Synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
- Thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
How was this PR tested?
New unit test
Does this PR introduce any user-facing or breaking changes?