-
Notifications
You must be signed in to change notification settings - Fork 108
[controller] Account for max read capacity in quota change #2235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...s/venice-controller/src/main/java/com/linkedin/venice/controller/VeniceParentHelixAdmin.java
Outdated
Show resolved
Hide resolved
| private final boolean disableParentRequestTopicForStreamPushes; | ||
|
|
||
| private final int defaultReadQuotaPerRouter; | ||
| private final long maxReadCapacityCu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment on what these 2 variables mean and also revisit the name to make it more clear from the name? Lets also add router in the name if it's only for router quota
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - added a comment to describe the usage of the two variables and renamed the new config to include router
...e-controller/src/main/java/com/linkedin/venice/controller/VeniceControllerClusterConfig.java
Outdated
Show resolved
Hide resolved
| long maxReadCapacityCu = clusterConfig.getMaxReadCapacityCu(); | ||
| long maxPerRouterCapacity = Math.max(defaultReadQuotaPerRouter, maxReadCapacityCu); | ||
| long totalClusterCapacity = maxPerRouterCapacity * routerCount; | ||
| if (Math.max(totalClusterCapacity, maxPerRouterCapacity) < readQuotaInCU.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
totalClusterCapacity will always be >= maxPerRouterCapacity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't true for the parent because the router count is 0 and totalClusterCapacity will be 0. We need to take the max of totalClusterCapacity and maxPerRouterCapacity to correctly account for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments, if not it looks like a bug (thats what we though when we encountered this if condition before your change 😆 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - added a comment to explain why we need to take the max of total cluster capacity and per router capacity
| props.getBoolean(CONTROLLER_DISABLE_PARENT_REQUEST_TOPIC_FOR_STREAM_PUSHES, false); | ||
| this.defaultReadQuotaPerRouter = | ||
| props.getInt(CONTROLLER_DEFAULT_READ_QUOTA_PER_ROUTER, DEFAULT_PER_ROUTER_READ_QUOTA); | ||
| this.maxRouterReadCapacityCu = props.getLong(MAX_READ_CAPACITY, MAX_ROUTER_READ_CAPACITY_CU); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config needs to be done on both the router and controller from now on?
Also, from router code, I see MAX_READ_CAPACITY with default of 100k and ROUTER_MAX_READ_CAPACITY with default of 6000. how are those different? can we also use the same static variable in router code as well to be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it needs to be on both the controller and router from now on so that they share the same value.
ROUTER_MAX_READ_CAPACITY is used as an early throttler before any requests are processed and it will reject the request if the current number of requests for all stores is larger than the configured limit. I believe it's to prevent the router from being overwhelmed from too many requests at once. MAX_READ_CAPACITY is used to distribute the router quota fairly per store and it will decrease each store's quota by a factor if the total store quota is larger than the MAX_READ_CAPACITY value
| long maxReadCapacityCu = clusterConfig.getMaxReadCapacityCu(); | ||
| long maxPerRouterCapacity = Math.max(defaultReadQuotaPerRouter, maxReadCapacityCu); | ||
| long totalClusterCapacity = maxPerRouterCapacity * routerCount; | ||
| if (Math.max(totalClusterCapacity, maxPerRouterCapacity) < readQuotaInCU.get()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some comments, if not it looks like a bug (thats what we though when we encountered this if condition before your change 😆 )
Problem Statement
When a new quota request comes in to the controller, we only validate the value against the default router capacity before approving or rejecting the request. There is a separate max read capacity that is used for router throttling and the correct calculation would be to approve the request if it is within the default router capacity or the max read capacity.
Solution
Code changes
Concurrency-Specific Checks
Both reviewer and PR author to verify
synchronized,RWLock) are used where needed.ConcurrentHashMap,CopyOnWriteArrayList).How was this PR tested?
New unit test
Does this PR introduce any user-facing or breaking changes?