feat: add support for InferencePool #823

Xunzhuo · 2025-07-04T04:24:12Z

Description

This PR adds support for InferencePool, which allows Envoy AI Gateway to integrate with any endpoint picker that supports InferencePool.

By integrating with an endpoint picker such as the Gateway API Inference Extension (GIE) or a non-GIE EPP, Envoy AI Gateway gains access to advanced scheduling algorithms that optimize inference.

Related Issues/PRs (if applicable)

Fixes: #423
Fixes #604

Some follow-up:

Docs: Add User-Guide Docs & Update Blog
Functionality: Support Host Override LbPolicy and fallback
Testing: Support Upstream Conformance Test

Xunzhuo · 2025-07-04T10:21:37Z

InferencePool and endpoint picker:

AIGatewayRoute:

HTTPRoute:

auto-generated EEP (targets the HTTPRoute and refers to the endpoint picker backend):

auto-generated Backend (targets the endpoint picker Service):

The cluster is patched by the extension server, which is called by Envoy Gateway and changes the cluster to an original dst cluster:

internal/controller/gateway.go

mathetake · 2025-07-04T18:19:15Z

instead of creating eep, how about adding extproc into the specific route which routes to the inference pool lb policy cluster? that way other normal routes won't need to talk to eep unnecessarily. This could be done in extension server i guess?

yuzisun · 2025-07-04T22:46:26Z

> instead of creating eep, how about adding extproc into the specific route which routes to the inference pool lb policy cluster? that way other normal routes won't need to talk to eep unnecessarily. This could be done in extension server i guess?

That’s a very good point. We need to make sure normal routes do not go through epp.

Xunzhuo · 2025-07-05T00:52:26Z

Yep, that is reasonable :)

mathetake · 2025-07-06T19:05:43Z

Having said that, one concern about the per-route approach is that it might not work well with ClearRouteCache: true, which is set by our AI Gateway extproc. The EPP extproc must come after the AI Gateway extproc, since until then Envoy doesn't know the destination. However, per-route filter config might not work well with the deferred route calculation. Maybe that's not the case, but it's something I am worried about now...
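The ordering constraint above (the EPP extproc must sit after the AI Gateway extproc) can be illustrated with a minimal Go sketch. It assumes a filter chain modelled as a plain slice of filter names; the names `ai-gateway-extproc`, `epp-extproc`, and `router`, as well as the helper `insertAfter`, are hypothetical and not taken from this PR or from Envoy's API.

```go
package main

import "fmt"

// insertAfter returns a copy of chain with filter placed immediately after
// the first occurrence of anchor. It reports false (and returns chain
// unchanged) when anchor is not present.
func insertAfter(chain []string, anchor, filter string) ([]string, bool) {
	for i, name := range chain {
		if name == anchor {
			out := make([]string, 0, len(chain)+1)
			out = append(out, chain[:i+1]...)
			out = append(out, filter)
			out = append(out, chain[i+1:]...)
			return out, true
		}
	}
	return chain, false
}

func main() {
	// Hypothetical filter names, for illustration only.
	chain := []string{"ai-gateway-extproc", "router"}
	patched, ok := insertAfter(chain, "ai-gateway-extproc", "epp-extproc")
	fmt.Println(patched, ok)
}
```

Copying into a fresh slice keeps the original chain untouched, so a failed lookup leaves the caller's configuration as it was.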

internal/extensionserver/post_cluster_modify.go

data.json

Xunzhuo · 2025-07-15T04:10:46Z

After envoyproxy/gateway#6524 lands, the new algorithm would be:

1. Find the listeners relevant to an InferencePool.
2. Insert the EPP extproc config into those listeners.
3. Find the routes under those listeners that are unrelated to the InferencePool.
4. Insert an extproc per-route config to disable the EPP extproc on those routes.
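The steps above can be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `Listener`, `Route`, `eppFilter`, and `patchListeners` are hypothetical stand-ins rather than the Envoy xDS types or the code in this PR, and the position of the filter within the chain is not modelled.

```go
package main

import "fmt"

// Route is a simplified, hypothetical stand-in for an Envoy route.
type Route struct {
	Name              string
	UsesInferencePool bool // true when the route targets an InferencePool
	EPPDisabled       bool // models a per-route override that disables the EPP extproc
}

// Listener is a simplified, hypothetical stand-in for an Envoy listener.
type Listener struct {
	Name    string
	Filters []string // HTTP filters ahead of the terminal router filter
	Routes  []*Route
}

// eppFilter is an assumed filter name used only in this sketch.
const eppFilter = "epp-extproc"

// hasInferencePoolRoute reports whether any route on l targets an InferencePool.
func hasInferencePoolRoute(l *Listener) bool {
	for _, r := range l.Routes {
		if r.UsesInferencePool {
			return true
		}
	}
	return false
}

// hasFilter reports whether name is already present in filters.
func hasFilter(filters []string, name string) bool {
	for _, f := range filters {
		if f == name {
			return true
		}
	}
	return false
}

// patchListeners adds the EPP filter to every listener that serves at least
// one InferencePool route, then disables it on that listener's other routes.
func patchListeners(listeners []*Listener) {
	for _, l := range listeners {
		if !hasInferencePoolRoute(l) {
			continue // steps 1-2: unrelated listeners are left alone
		}
		if !hasFilter(l.Filters, eppFilter) {
			l.Filters = append(l.Filters, eppFilter)
		}
		for _, r := range l.Routes {
			if !r.UsesInferencePool {
				r.EPPDisabled = true // steps 3-4: opt unrelated routes out
			}
		}
	}
}

func main() {
	l := &Listener{Name: "http", Routes: []*Route{
		{Name: "pool-route", UsesInferencePool: true},
		{Name: "normal-route"},
	}}
	patchListeners([]*Listener{l})
	fmt.Println(l.Filters, l.Routes[0].EPPDisabled, l.Routes[1].EPPDisabled)
}
```

Enabling the filter at the listener level and opting unrelated routes out keeps normal routes from calling the endpoint picker, which is the concern raised earlier in the thread.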

mathetake · 2025-07-15T16:13:33Z

looks good and it would be much simpler

Xunzhuo · 2025-07-16T03:50:06Z

The new approach based on envoyproxy/gateway#6524 landed in 03573fd, and the e2e test passed locally.

Signed-off-by: bitliu <[email protected]>

Xunzhuo changed the title ~~feat: add support for InferencePool based endpoint picker~~ feat: add support for InferencePool Jul 4, 2025

Xunzhuo force-pushed the feat-epp-integration branch 11 times, most recently from 1e91fae to 8889ce9 on July 4, 2025 10:14

Xunzhuo force-pushed the feat-epp-integration branch from 8889ce9 to b24d435 on July 4, 2025 13:23

yuzisun reviewed Jul 4, 2025

internal/controller/gateway.go (three review threads, outdated and resolved)

Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from c8dc839 to afed30f on July 4, 2025 14:34

Xunzhuo force-pushed the feat-epp-integration branch from 4943f6e to b676e5e on July 7, 2025 07:31

github-advanced-security bot found potential problems Jul 7, 2025

internal/extensionserver/post_cluster_modify.go (fixed)

Xunzhuo force-pushed the feat-epp-integration branch 6 times, most recently from 5684f11 to d9db2fe on July 7, 2025 12:51

Xunzhuo force-pushed the feat-epp-integration branch 8 times, most recently from a807100 to 994dff7 on July 14, 2025 12:45

Xunzhuo mentioned this pull request Jul 14, 2025

feat: support passing backend context on listener/vhost level envoyproxy/gateway#6518

Closed

Xunzhuo force-pushed the feat-epp-integration branch from 994dff7 to 077f519 on July 14, 2025 13:00

mathetake reviewed Jul 14, 2025

data.json (outdated, resolved)

mathetake mentioned this pull request Jul 14, 2025

site: organize blog section and add ref arch blog post #890

Merged

Xunzhuo mentioned this pull request Jul 15, 2025

Support Listeners/Routes at PostTranslateModifyHook envoyproxy/gateway#6523

Closed

Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from 5e319d4 to 6ea43c8 on July 15, 2025 03:46

Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from e0684ba to 1c83747 on July 16, 2025 03:48

Xunzhuo added this to the v0.3.0 milestone Jul 16, 2025

Xunzhuo force-pushed the feat-epp-integration branch from 1c83747 to 03573fd on July 16, 2025 06:23

mathetake mentioned this pull request Jul 16, 2025

docs: update epp design logics #895

Open

Xunzhuo force-pushed the feat-epp-integration branch 4 times, most recently from a483876 to bfc78ab on July 18, 2025 07:35

feat: add support for inferencePool

537087e

Signed-off-by: bitliu <[email protected]>

Xunzhuo force-pushed the feat-epp-integration branch from bfc78ab to 537087e on July 18, 2025 08:19
