I am OOO until next week. Let me come back to this later (or ping me in case I forget about this thread).
-
@Stephen-X First of all, the actual rate limit configuration is done through EG's BackendTrafficPolicy, as you can see in the example. Envoy AI Gateway's role in rate limiting is to configure how the cost of each request is calculated, through llmRequestCosts.
Now the question I have for you is: why do you let users set their own request rate limit, and how do you enforce the rules? If each user (I assume they are distinguished by, say, a "user-id" header) really can have any dynamic rate limit value, I don't think any existing rate limit mechanism works for such a case; that is not a limitation of Envoy Gateway. How do you ensure that each user will send exactly the same
-
BTW, embedding endpoint support is definitely valuable, so I would love to see that happen here!
-
Hi there,
I'm inquiring if there's currently a way to set the global token rate limit dynamically using a request header.
We run a high-traffic service that hosts thousands of users, and we are working on enhancing some of its features with GenAI. We are currently exploring Envoy AI Gateway (EAG) with some customizations as an internal AI gateway in front of a few of our LLM endpoints. (We understand that an OAI-compatible embedding API is not currently supported by EAG, so that is something we could develop first and perhaps contribute back.) This is the high-level flow:
To support different configs for each user, EAG requires creating separate K8s custom resources, and building a dynamic config loading pipeline for EAG would be a lot of work for us. My understanding is that I would need to set up a separate xDS control plane to load configs dynamically, per "Configuration: Dynamic from control plane"? That sounds highly complex. We are hoping to simplify development by letting the frontend service control EAG behavior dynamically for each user.
Instead of having a separate AIGatewayRoute for each user, we're trying to see if it's possible for the frontend, which already has user configs loaded, to control EAG with request headers. For example, say a user sets a maximum rate limit of 11 TPS; our frontend could then send requests to the backends with the following 3 headers:
EAG could then compare the TPS limit against the counter value in Redis to decide whether the current request should be throttled.
Since this is not readily available, we are wondering if we could still use the existing rate limit functionality as much as possible with a bit of extra dev work.
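To make the header-driven check concrete, here is a rough sketch of the comparison we have in mind. Everything in it is an assumption for illustration: the header names `x-user-id` and `x-tps-limit` are hypothetical, TPS is taken to mean tokens per second, a plain dict stands in for the Redis counter, and none of this is something EAG or Envoy Gateway exposes today.

```python
import time
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical header names, chosen only for this illustration.
USER_HEADER = "x-user-id"
LIMIT_HEADER = "x-tps-limit"


class FixedWindowLimiter:
    """Per-user fixed one-second window; a dict stands in for Redis."""

    def __init__(self) -> None:
        # Maps (user, window start in whole seconds) -> tokens used so far.
        self._counters: Dict[Tuple[str, int], int] = {}

    def allow(
        self,
        headers: Mapping[str, str],
        tokens: int,
        now: Optional[float] = None,
    ) -> bool:
        """Return True if spending `tokens` keeps the user within the limit."""
        user = headers[USER_HEADER]
        limit = int(headers[LIMIT_HEADER])  # limit comes from the request itself
        window = int(time.time() if now is None else now)
        key = (user, window)
        used = self._counters.get(key, 0)
        if used + tokens > limit:
            return False  # throttle: this request would exceed the limit
        self._counters[key] = used + tokens
        return True


limiter = FixedWindowLimiter()
headers = {USER_HEADER: "user-42", LIMIT_HEADER: "11"}
first = limiter.allow(headers, tokens=6, now=100.0)
second = limiter.allow(headers, tokens=5, now=100.2)
third = limiter.allow(headers, tokens=1, now=100.9)
next_window = limiter.allow(headers, tokens=1, now=101.0)
```

With a real Redis backend the counter would presumably be an atomic increment on a key with a short expiry, and the token cost would only be known once the LLM response comes back, so the sketch simplifies both points.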