
feat: add support for InferencePool #823


Draft, wants to merge 1 commit into base: main

@Xunzhuo (Member) commented Jul 4, 2025

Description

This PR adds support for InferencePool, which allows Envoy AI Gateway to integrate with any endpoint picker (EPP) that supports InferencePool.

By integrating with an endpoint picker, such as the Gateway API Inference Extension (GIE) EPP or a non-GIE EPP, Envoy AI Gateway can use advanced scheduling algorithms to optimize inference.
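To make the endpoint picker's role concrete, here is a minimal sketch of the kind of decision an EPP makes: from the ready endpoints of a pool, choose one using load signals and return it as a header. Everything here is an assumption for illustration: the Endpoint fields, the pick_endpoint function, and the least-queue-depth policy are hypothetical and are not the GIE scheduler or code from this PR. The header name follows the GIE endpoint picker protocol as we understand it.

```python
from dataclasses import dataclass

# Assumed header name: the GIE endpoint picker protocol returns the chosen
# endpoint in this header; treat it as an assumption for non-GIE EPPs.
DESTINATION_HEADER = "x-gateway-destination-endpoint"


@dataclass(frozen=True)
class Endpoint:
    address: str          # "ip:port" of a model-server pod in the pool
    queue_depth: int      # requests waiting on the model server
    kv_cache_util: float  # fraction of KV cache in use, 0.0 to 1.0


def pick_endpoint(endpoints: list[Endpoint], max_kv_util: float = 0.9) -> dict[str, str]:
    """Return a header mutation naming the least-loaded endpoint (hypothetical policy).

    Endpoints above max_kv_util are skipped unless all of them are above it.
    Ties on queue depth are broken by lower KV-cache utilization.
    """
    if not endpoints:
        raise ValueError("InferencePool has no ready endpoints")
    candidates = [e for e in endpoints if e.kv_cache_util <= max_kv_util] or endpoints
    best = min(candidates, key=lambda e: (e.queue_depth, e.kv_cache_util))
    return {DESTINATION_HEADER: best.address}


pool = [
    Endpoint("10.0.0.1:8000", queue_depth=4, kv_cache_util=0.50),
    Endpoint("10.0.0.2:8000", queue_depth=1, kv_cache_util=0.95),  # skipped: KV cache nearly full
    Endpoint("10.0.0.3:8000", queue_depth=2, kv_cache_util=0.30),
]
print(pick_endpoint(pool))  # {'x-gateway-destination-endpoint': '10.0.0.3:8000'}
```

The point of the design is that the gateway does not need to know the scheduling policy at all: it only forwards the request to whatever address the EPP puts in the header, so any EPP that speaks the same protocol can be swapped in.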

Related Issues/PRs (if applicable)

Fixes #423
Fixes #604

Follow-ups:

  1. Docs: Add User-Guide Docs & Update Blog
  2. Functionality: Support Host Override LbPolicy and fallback
  3. Testing: Support Upstream Conformance Test

@Xunzhuo changed the title from "feat: add support for InferencePool based endpoint picker" to "feat: add support for InferencePool" Jul 4, 2025
@Xunzhuo force-pushed the feat-epp-integration branch 11 times, most recently from 1e91fae to 8889ce9, July 4, 2025 10:14
@Xunzhuo (Member, Author) commented Jul 4, 2025

InferencePool and endpoint picker: [screenshot]

AIGatewayRoute: [screenshot]

HTTPRoute: [screenshot]

Auto-generated EnvoyExtensionPolicy (EEP), which targets the HTTPRoute and references the endpoint picker Backend: [screenshot]

Auto-generated Backend, which targets the endpoint picker Service: [screenshot]

The cluster is patched by the extension server (called by Envoy Gateway), which changes it to an ORIGINAL_DST cluster: [screenshot]
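The cluster patch can be pictured as a small transformation over Envoy cluster configs. The sketch below assumes clusters are JSON-shaped dicts and that the extension server already knows which cluster names back an InferencePool (pool_cluster_names is hypothetical; the PR does not show how they are identified). The Envoy fields used (type ORIGINAL_DST, lb_policy CLUSTER_PROVIDED, original_dst_lb_config with use_http_header and http_header_name) come from Envoy's cluster API; the header name is assumed to be the one the EPP writes.

```python
import copy

# Assumed header that the EPP writes its chosen "ip:port" into.
EPP_HEADER = "x-gateway-destination-endpoint"


def patch_pool_clusters(clusters: list[dict], pool_cluster_names: set[str]) -> list[dict]:
    """Return copies of Envoy cluster configs (JSON-shaped dicts), with the
    InferencePool clusters rewritten to ORIGINAL_DST.

    pool_cluster_names is hypothetical: how the extension server recognizes
    InferencePool-backed clusters is not shown in this PR.
    """
    patched = []
    for cluster in clusters:
        c = copy.deepcopy(cluster)  # never mutate the caller's config
        if c.get("name") in pool_cluster_names:
            c["type"] = "ORIGINAL_DST"
            # ORIGINAL_DST clusters use the cluster-provided load balancer.
            c["lb_policy"] = "CLUSTER_PROVIDED"
            # Read the upstream address from a request header rather than from
            # the downstream connection's original destination.
            c["original_dst_lb_config"] = {
                "use_http_header": True,
                "http_header_name": EPP_HEADER,
            }
            # Drop endpoint discovery: ORIGINAL_DST clusters may not carry a
            # static load assignment, and EDS settings no longer apply.
            c.pop("load_assignment", None)
            c.pop("eds_cluster_config", None)
        patched.append(c)
    return patched
```

With this shape the cluster holds no endpoint list of its own: the EPP decides per request, and Envoy connects to whatever address arrives in the header, which is why the cluster type has to change rather than just the load-balancing policy.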

@Xunzhuo force-pushed the feat-epp-integration branch from 8889ce9 to b24d435, July 4, 2025 13:23
@Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from c8dc839 to afed30f, July 4, 2025 14:34
@mathetake (Member) commented Jul 4, 2025

Instead of creating an EEP, how about adding the extproc to the specific route that routes to the InferencePool LB-policy cluster? That way, other normal routes won't need to talk to the EPP unnecessarily. I guess this could be done in the extension server?

@yuzisun (Contributor) commented Jul 4, 2025


That’s a very good point. We need to make sure normal routes do not go through the EPP.

@Xunzhuo (Member, Author) commented Jul 5, 2025

Yep, that is reasonable :)

@mathetake (Member)

Having said that, one concern with the per-route approach is that it might not work well with ClearRouteCache: true, which is set by our AI Gateway extproc. The EPP extproc must run after the AI Gateway extproc, since until then Envoy doesn't know the destination. However, per-route filter config might not work well with the deferred route calculation. Maybe that's not the case, but it's something I'm worried about now...
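The ordering constraint can be shown with a toy model. This is not Envoy's filter manager: a request is a dict, route selection is a hypothetical lookup on a model header, and recomputing the route stands in for the effect of ClearRouteCache: true. The x-ai-eg-model header and the route names are assumptions for illustration. The sketch only demonstrates the point above: the EPP filter cannot decide whether it applies until the AI Gateway filter has fixed the destination.

```python
from typing import Callable

# Toy model of the filter-ordering constraint; NOT Envoy's filter manager.
# A request is a dict with "body", "headers", and the currently selected "route".


def select_route(headers: dict) -> str:
    # Hypothetical route table keyed on the model header.
    return "inference-pool-route" if headers.get("x-ai-eg-model") == "llama" else "openai-route"


def ai_gateway_filter(req: dict) -> None:
    # The AI Gateway extproc derives the model from the body and sets a header,
    # then the route is recomputed (standing in for ClearRouteCache: true).
    req["headers"]["x-ai-eg-model"] = req["body"]["model"]
    req["route"] = select_route(req["headers"])


def epp_filter(req: dict) -> None:
    # The EPP extproc only knows whether it applies once the final route is set.
    if req.get("route") is None:
        raise RuntimeError("EPP ran before the destination route was known")
    if req["route"] == "inference-pool-route":
        req["headers"]["x-gateway-destination-endpoint"] = "10.0.0.3:8000"  # placeholder pick


def run_chain(filters: list[Callable[[dict], None]], body: dict) -> dict:
    req = {"body": body, "headers": {}, "route": None}
    for f in filters:
        f(req)
    return req


ok = run_chain([ai_gateway_filter, epp_filter], {"model": "llama"})
print(ok["headers"]["x-gateway-destination-endpoint"])  # 10.0.0.3:8000
```

Reversing the order, run_chain([epp_filter, ai_gateway_filter], ...), raises RuntimeError. In real Envoy the question is subtler, since a route is selected before any extproc runs and is then re-selected after the cache is cleared, which is exactly where the per-route config concern comes from.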

@Xunzhuo force-pushed the feat-epp-integration branch from 4943f6e to b676e5e, July 7, 2025 07:31
@Xunzhuo force-pushed the feat-epp-integration branch 6 times, most recently from 5684f11 to d9db2fe, July 7, 2025 12:51
@Xunzhuo (Member, Author) commented Jul 15, 2025

After envoyproxy/gateway#6524 lands, the new algorithm would be:

  1. Find the listeners relevant to the InferencePool.
  2. Insert the EPP extproc config into each relevant listener.
  3. Find the unrelated routes under each relevant listener.
  4. Insert a per-route extproc config that disables the EPP extproc for those routes.
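The four steps above can be sketched over a simplified listener/route model. The Listener and Route classes, the backend_kinds field, and the EPP filter name are hypothetical stand-ins for the extension server's real xDS types. The per-route override uses Envoy's ExtProcPerRoute message with disabled: true, and the EPP filter is inserted before the router filter because Envoy requires the router to be last in the HTTP filter chain.

```python
from dataclasses import dataclass, field

ROUTER = "envoy.filters.http.router"
EPP_FILTER = "envoy.filters.http.ext_proc/epp"  # hypothetical filter name
EXT_PROC_PER_ROUTE = "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExtProcPerRoute"


@dataclass
class Route:
    name: str
    backend_kinds: list[str]  # kinds of the route's backendRefs, e.g. "Service", "InferencePool"
    typed_per_filter_config: dict = field(default_factory=dict)


@dataclass
class Listener:
    name: str
    routes: list[Route]
    http_filters: list[str] = field(default_factory=list)


def uses_pool(route: Route) -> bool:
    return "InferencePool" in route.backend_kinds


def inject_epp(listeners: list[Listener]) -> None:
    for listener in listeners:
        # Step 1: only listeners with at least one InferencePool route are relevant.
        if not any(uses_pool(r) for r in listener.routes):
            continue
        # Step 2: insert the EPP ext_proc filter once, keeping the router filter last.
        if EPP_FILTER not in listener.http_filters:
            idx = listener.http_filters.index(ROUTER) if ROUTER in listener.http_filters else len(listener.http_filters)
            listener.http_filters.insert(idx, EPP_FILTER)
        # Steps 3 and 4: disable the filter on routes that don't target an InferencePool.
        for route in listener.routes:
            if not uses_pool(route):
                route.typed_per_filter_config[EPP_FILTER] = {"@type": EXT_PROC_PER_ROUTE, "disabled": True}
```

Compared with attaching the extproc per route, this enables the filter once per listener and opts the unrelated routes out, so routes that never target an InferencePool skip the EPP call entirely.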

@mathetake (Member)

Looks good, and it would be much simpler.

@Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from e0684ba to 1c83747, July 16, 2025 03:48
@Xunzhuo (Member, Author) commented Jul 16, 2025

The new approach based on envoyproxy/gateway#6524 landed in 03573fd, and the e2e test passed locally.


@Xunzhuo added this to the v0.3.0 milestone Jul 16, 2025
@Xunzhuo force-pushed the feat-epp-integration branch from 1c83747 to 03573fd, July 16, 2025 06:23
@Xunzhuo force-pushed the feat-epp-integration branch 4 times, most recently from a483876 to bfc78ab, July 18, 2025 07:35
@Xunzhuo force-pushed the feat-epp-integration branch from bfc78ab to 537087e, July 18, 2025 08:19
Successfully merging this pull request may close these issues:

  1. Support DynamicLoadBalancing beyond AIE (API inference extension)
  2. Support k8s Gateway API inference extensions
3 participants