feat: add support for InferencePool #823
Conversation
Instead of creating EPP, how about adding extproc only to the specific route that routes to the InferencePool LB policy cluster? That way, other normal routes won't need to talk to EPP unnecessarily. This could be done in the extension server, I guess?
That's a very good point. We need to make sure normal routes do not go through EPP.
Yep, that is reasonable :)
Having said that, one concern about the per-route approach is that it might not work well with ClearRouteCache: true, which is set by our AI Gateway extproc. The EPP extproc must come after the AI Gateway extproc, since until then Envoy doesn't know the destination. However, per-route filter config might not work well with the deferred route calculation. Maybe that's not the case, but it's something I'm worried about for now...
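To make the per-route idea above concrete, here is a minimal Go sketch of the selection step only: deciding which routes would get the EPP extproc enabled. It does not use the Envoy xDS API or the extension server interfaces, and it does not address the ClearRouteCache concern. The `Route` type, its fields, and `markEPPRoutes` are hypothetical names introduced purely for illustration.

```go
package main

// Route is a hypothetical, simplified stand-in for a generated Envoy route.
// It is NOT a type from Envoy Gateway or go-control-plane.
type Route struct {
	Name        string
	BackendKind string // kind of the backend the route targets, e.g. "InferencePool"
	EPPEnabled  bool   // whether the EPP extproc would be enabled on this route
}

// inferencePoolKind is the backend kind that should be served through EPP.
const inferencePoolKind = "InferencePool"

// markEPPRoutes returns a copy of routes in which only the routes that
// target an InferencePool have the EPP extproc enabled. All other routes
// are left with EPP disabled, so they never talk to the endpoint picker.
func markEPPRoutes(routes []Route) []Route {
	out := make([]Route, len(routes))
	for i, r := range routes {
		r.EPPEnabled = r.BackendKind == inferencePoolKind
		out[i] = r
	}
	return out
}
```

In this sketch the input slice is left unmodified, so a caller could compare the routes before and after the pass. Whether the real per-route configuration interacts correctly with deferred route calculation is exactly the open question raised in the comment above.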
After envoyproxy/gateway#6524 lands, the new algorithm would be:
Looks good, and it would be much simpler.
The new approach based on envoyproxy/gateway#6524 landed in 03573fd, and the e2e test passed locally.
Signed-off-by: bitliu <[email protected]>
Description
This PR adds InferencePool support, which allows Envoy AI Gateway to integrate with any endpoint picker that supports InferencePool.
By integrating with an Endpoint Picker (EPP), such as the Gateway API Inference Extension (GIE) or a non-GIE EPP, Envoy AI Gateway's abilities can be expanded with advanced scheduling algorithms that optimize inference.
Related Issues/PRs (if applicable)
Fixes: #423
Fixes #604
Some follow-up: