[Feature Discussion] The ways to extend/custom functionalities for Envoy: WASM vs. Lua vs. External Processing vs. GO filter #5123

Closed
wilsonwu opened this issue Feb 27, 2023 · 24 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@wilsonwu
Member

wilsonwu commented Feb 27, 2023

This issue summarizes and discusses the ways to extend or customize Envoy functionality. Let's compare the four methods below. We won't go into implementation details here; the goal is to compare them and choose which ones we want Contour to support.

Comparison:

WASM:

ref. #4276
Security: Medium
Envoy provides a sandbox so that WASM runs in an isolated area, and the main Envoy process will not be affected.

Extendibility: High
Users can implement any logic in WASM and have it run at any stage of Envoy's processing.

Performance: Low
WASM performance is not good: according to an Istio expert, WASM delivers only 50% of the performance of Envoy's native logic.

Others: WASM will increase the number of NACKs. Loading WASM depends on a WASM file, so the path from configuration to file load adds more chances for a NACK.

Lua:

ref. #3006
Security: Low
If a user writes a bad script, Envoy will be affected.

Extendibility: High
As with WASM, users can implement any logic in Lua.

Performance: Medium / Low
From some Istio usage experience, Lua performs better than WASM, but for gateway use cases the performance is still not good.

External Processing

ref. #5038
Security: High
Like ExtAuthz and rate limiting, using external processing to extend Envoy functionality is a good approach, because no custom logic runs inside Envoy; it just gets a response from an external service (like an auth server). So if the external service has a problem, Envoy itself keeps working.

Extendibility: Low
External Processing must follow the design defined by the standard; if a user wants to do anything outside those rules, it may not work.

Performance: High
The external service can be deployed with more replicas and assigned more resources to improve performance.
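
To make the isolation argument concrete, here is a minimal Python sketch of the idea, not Envoy's or Contour's actual implementation. The names `Request`, `ProcessorUnavailable`, and `apply_external_processing` are hypothetical, and the `fail_open` switch is only loosely modelled on the ext_proc filter's `failure_mode_allow` option. Whether the proxy keeps serving traffic when the external service is down depends on that kind of setting.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    """Hypothetical stand-in for an HTTP request seen by the proxy."""
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


class ProcessorUnavailable(Exception):
    """Raised when the (hypothetical) external service cannot answer."""


# A processor takes a request and returns header mutations to apply.
Processor = Callable[[Request], Dict[str, str]]


def apply_external_processing(
    request: Request, processor: Processor, fail_open: bool = True
) -> Request:
    """Ask the external service for header mutations and apply them.

    Assumption: with fail_open=True an unavailable processor is skipped and
    the request continues unchanged; with fail_open=False the error
    propagates and the request is rejected.
    """
    try:
        mutations = processor(request)
    except ProcessorUnavailable:
        if fail_open:
            return request
        raise
    merged = {**request.headers, **mutations}
    return Request(path=request.path, headers=merged)


def tagging_processor(request: Request) -> Dict[str, str]:
    """Example processor: tags every request with a tenant header."""
    return {"x-tenant": "demo"}


def broken_processor(request: Request) -> Dict[str, str]:
    """Example processor that simulates an outage."""
    raise ProcessorUnavailable("external service is down")
```

In this sketch the proxy-side code never runs user logic itself; it only merges whatever the external service returns, which is the property the "Security: High" rating relies on.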

GO filter

Security: Low
As I understand it, an extended filter is injected into the Envoy process, so if there is any problem, it will affect Envoy.

Extendibility: High
As with WASM and Lua, we can code anything we want.

Performance: High
Go performance should be higher than WASM and Lua, but this still needs to be tested.

@wilsonwu wilsonwu added kind/feature Categorizes issue or PR as related to a new feature. lifecycle/needs-triage Indicates that an issue needs to be triaged by a project contributor. labels Feb 27, 2023
@wilsonwu
Member Author

Vote from me:
WASM +1
External Processing +1

@clayton-gonsalves
Contributor

Thanks for adding this, a few comments:

Adding the design and implementation efforts for each of these would also be nice.

For example,

  • WASM requires significant research and handling of the NACKs before we can move forward.
  • ExtProc has similar patterns with rate limiting and ExtAuthz. The design effort won't be as much.
  • The Go filter is still in alpha and under development.

Also, for external processing performance, we should consider the added latency the network call adds.
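
A rough back-of-the-envelope model of that point, as a Python sketch. The function names and all numbers are illustrative assumptions, not measurements: adding replicas to the external service raises the throughput ceiling, but every request still pays the network round trip.

```python
def per_request_latency_ms(proxy_ms: float, extension_ms: float,
                           network_rtt_ms: float = 0.0) -> float:
    """Latency seen by one request: proxy work + extension work + round trip.

    For an in-process extension (WASM, Lua, Go) network_rtt_ms is 0.
    """
    return proxy_ms + extension_ms + network_rtt_ms


def throughput_ceiling_rps(extension_ms: float, replicas: int) -> float:
    """Upper bound on requests/second if each replica handles one call at a time."""
    return replicas * 1000.0 / extension_ms


# Illustrative numbers only (assumed, not benchmarked).
in_process = per_request_latency_ms(proxy_ms=1.0, extension_ms=0.5)
external = per_request_latency_ms(proxy_ms=1.0, extension_ms=0.5, network_rtt_ms=2.0)

print(f"in-process: {in_process} ms, external: {external} ms")
print(f"ceiling with 1 replica:  {throughput_ceiling_rps(0.5, 1)} rps")
print(f"ceiling with 4 replicas: {throughput_ceiling_rps(0.5, 4)} rps")
```

In this toy model, scaling replicas changes the ceiling but not `external`, which is why latency and throughput probably deserve separate ratings in the comparison.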

@davinci26
Contributor

Also, for external processing performance, we should consider the added latency the network call adds.

Well, I am surprised that something involving an external call is rated higher on performance than in-process solutions.

That being said I think ext_proc is the most straightforward way to add envoy extensibility given that we already have the design patterns for it.

The comparison between WASM and Go is harder for me. Both filters have limited use cases and are not "stable". I wonder if we can take the approach of going forward with ext_proc filter and then graduate to WASM/Go assuming that we have more datapoints from the Envoy community and also better understanding of the latency guarantees that we need. If people adopt it and find it to be slow we can start designing and thinking about in process methods.

@skriss / @sunjayBhatia what are we looking for to make a decision here?

@wilsonwu assuming we knew which way we decided to go forward with, would you be volunteering for the implementation? Is this a high priority item for you/your team?

@wilsonwu
Member Author

Yes, ext_proc is currently the better way to build Contour extensions, although it still has extensibility limitations. My team and I would like to contribute this once a decision has been made.

@skriss
Member

skriss commented Apr 12, 2023

I'm +1 to all of the above comments re: external processing (well-established design pattern, avoids thorny NACK issues, etc). My biggest concern there is that the filter is not yet stable:

This API feature is currently work-in-progress. API features marked as work-in-progress are not considered stable, are not covered by the threat model, are not supported by the security team, and are subject to breaking changes. Do not use this feature without understanding each of the previous points.

(ref. External Processor docs)

Does anyone have more information on the timeline for the filter moving to a more stable state?

I would think at a minimum, we could start to make progress on design, including thinking through interaction with other existing features and any security considerations, and a spike on functionality. We could also consider whether this is something that we could release in an experimental state behind a feature flag, until the upstream functionality is more stabilized.
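
One way to picture the feature-flag idea, as a Python sketch rather than Contour code. The flag name `useExtProc`, the function `build_http_filters`, and the filter ordering are all assumptions made up for illustration; only the Envoy filter names are real.

```python
from typing import FrozenSet, List

# Hypothetical feature-flag name; not an existing Contour flag.
EXT_PROC_FLAG = "useExtProc"


def build_http_filters(feature_flags: FrozenSet[str],
                       ext_proc_configured: bool) -> List[str]:
    """Return the ordered HTTP filter names for a listener.

    The experimental ext_proc filter is added only when the operator has
    both opted in via the flag and supplied an ext_proc configuration.
    The placement before the router is an assumption of this sketch.
    """
    filters = [
        "envoy.filters.http.ext_authz",
        "envoy.filters.http.ratelimit",
    ]
    if EXT_PROC_FLAG in feature_flags and ext_proc_configured:
        filters.append("envoy.filters.http.ext_proc")
    # The router filter terminates the chain, so it always goes last.
    filters.append("envoy.filters.http.router")
    return filters
```

With the flag off, the generated chain is identical to today's, so users who never opt in would not be exposed to upstream breaking changes.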

If folks are attending KubeCon EU, please come to the Contour ContribFest - would love to have more discussions there!

@davinci26
Contributor

Does anyone have more information on the timeline for the filter moving to a more stable state?

I will reach out in the Envoy slack channel and talk with the maintainers there to get more info

@sunjayBhatia
Member

+1 for exploring the ext_proc filter, as there seems to be a bit more in the way of guardrails and the operational mechanisms seem simpler

@davinci26
Contributor

Does anyone have more information on the timeline for the filter moving to a more stable state?

Talking with the current codeowner:

Yeah, the ext_proc API is fairly stable. The implementation is currently in alpha state. More fuzzer work needs to be done before it can be moved to a stable state.

@wilsonwu
Member Author

Yeah, the ext_proc API is fairly stable. The implementation is currently in alpha state. More fuzzer work needs to be done before it can be moved to a stable state.

Thanks for this info. If we build the extension feature in Contour on the current version of ext_proc, do you think there is any risk?

@github-actions

The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 14, 2023
@wilsonwu
Member Author

In Envoy v1.27.0, the unstable warning has been removed: https://www.envoyproxy.io/docs/envoy/v1.27.0/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto

I think we can move forward with this.

@wilsonwu wilsonwu reopened this Jul 31, 2023
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 1, 2023
@stevesloka
Member

I have a need for something like this too, and I'd like to help jump-start some additional work on it. @wilsonwu are you still up for working on some of this? I'm happy to jump in as well; I've been way too quiet on this project and this seems like a (selfishly) useful place I can help out.

@skriss @sunjayBhatia any hesitation on that approach? Or allow a feature-gate to enable like we've done in the past?

@skriss
Member

skriss commented Aug 22, 2023

Thanks @wilsonwu for that update, that's great to see.

Hey @stevesloka! Personally I'm happy to move ahead with ExtProc support along the lines of how we have ExtAuthz and Global Rate Limiting implemented, given that it seems to be pretty stable on the Envoy side and we have a well-defined pattern for integrating these auxiliary services. Seems like most interested parties would be happy with that as a step forward as well. If you are willing to drive it, that would definitely help get it done sooner 😀

@wilsonwu
Member Author

Thanks @stevesloka and @skriss, happy to hear that. After an internal discussion, @izturn and I will keep contributing to this feature. I think we will have a draft design next month; let's keep an eye on it.

@izturn izturn self-assigned this Sep 22, 2023
@izturn
Member

izturn commented Oct 13, 2023

@stevesloka @skriss @sunjayBhatia and others, I will put up a draft design and implementation next week

@izturn
Member

izturn commented Oct 19, 2023

@stevesloka @skriss @sunjayBhatia PTAL

@stevesloka
Member

@izturn is there a link or branch to look at? Thanks!

@izturn
Member

izturn commented Oct 25, 2023

@stevesloka #5866 #5867 #5868

@SamMHD
Contributor

SamMHD commented Dec 5, 2023

Is there any plan to review these changes? My team would really appreciate this feature for our billing purposes in Contour.
We can also help @izturn if needed.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2024
@izturn izturn removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2024
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 6, 2024

8 participants