Conversation
|
Code Review
This pull request introduces a new Architecture Decision Record (ADR) for a unified flag evaluator. The document thoroughly outlines the problem statement, requirements, considered options, and a detailed proposal for a Rust-based evaluator with multiple compilation targets (WASM, native bindings). It also covers performance considerations, migration strategies, and potential consequences.
My review focused on clarity, consistency, and accuracy within the document. I've identified several minor typos and formatting inconsistencies, particularly in technical terms and code examples, which could improve the document's professionalism and readability. Additionally, some strong claims have been softened to reflect more realistic expectations. Overall, this is a well-structured and comprehensive ADR, and addressing these minor points will further enhance its quality.
> **Pros**:
> - Native performance
Generated code might not always be as efficient and fast as it could be with real native code
> | Scenario | Overhead | Assessment |
> |----------|----------|------------|
> | Simple flags (no targeting) | ~3-4µs baseline | Negligible |
> | Moderate targeting | 3-5x slower | Acceptable for most use cases |
> | Complex targeting (large context) | 4-6x slower | Serialization dominates |
I think we might be able to reduce the serialization overhead with some caching inside the WASM module
Could you share the prompts that you used to generate this doc instead? I would suggest having a proposal to support a bring-your-own JSONLogic engine / evaluator. Having a mandatory WASM-based evaluation engine is NOT an option for our 1P use cases. We'll have to drop flagd, or maintain a significantly different branch of flagd internally, if we move forward with this proposal.
My prompt was based on #1842, and I tweaked it a little with Copilot to be more generic and less implementation-focused, so I do not think that will be helpful. This is an exploration with native bindings, and WASM only where WASM is needed; e.g. Java is precompiled with Chicory, and Python uses PyO3 to generate bindings. It is more of a showcase to trigger the discussion and reduce the effort.
> - JsonLogic evaluation
> - Targeting rule processing
> - Fractional evaluation algorithms
> - Default variant resolution
There is perhaps a case to be made that some evaluation logic could be left to the SDK, for instance default variant resolution when there is no targeting logic.
It does contradict some of the core principles laid out below, but this would allow for language-specific performance optimizations in specific cases and ease migration costs.
This relates to one big application we maintain, where some code paths can sometimes evaluate many flags, and paying the FFI overhead per flag is not an option.
The document does mention bulk evaluation, although not in the target API. Perhaps a possible mitigation would be to include it in this proposal, and to consider it as an evolution of the OpenFeature specification?
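For concreteness, the "fractional evaluation" responsibility in the quoted scope can be sketched as deterministic bucketing of a targeting key into weighted variants. This is a hedged illustration only: flagd's actual implementation uses MurmurHash3, and the function names here are made up; FNV-1a is used purely to keep the sketch self-contained.

```rust
// Illustrative sketch, NOT flagd's algorithm: deterministically bucket a
// targeting key into one of several weighted variants, so the same user
// always receives the same variant for a given flag.

/// 64-bit FNV-1a hash of a string (stand-in for flagd's MurmurHash3).
fn fnv1a(data: &str) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for b in data.bytes() {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Buckets `targeting_key` into 0..100 and returns the variant whose
/// cumulative weight range contains the bucket. Weights should sum to 100.
fn fractional(flag_key: &str, targeting_key: &str, variants: &[(&str, u64)]) -> String {
    let bucket = fnv1a(&format!("{flag_key}{targeting_key}")) % 100;
    let mut cumulative = 0;
    for (name, weight) in variants {
        cumulative += weight;
        if bucket < cumulative {
            return (*name).to_string();
        }
    }
    // Fall back to the last variant if weights do not cover the full range.
    variants.last().map(|(n, _)| (*n).to_string()).unwrap_or_default()
}
```

Because the bucket is derived only from the flag key and targeting key, this is exactly the kind of logic that must behave identically across SDKs, which is the motivation for a single shared implementation.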
We are actually also tinkering with the idea of returning a set of flags which are pre-evaluated, because they do not have a targeting rule. So when populating the store, we would return those pre-evaluated ones and not need to cross the boundary again. But this is an additional optimization, which I think is not needed for the initial draft.
open-feature-forking/flagd-evaluator#60
//EDIT: adapted for clarity, and removed wrong term
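The pre-evaluation idea above can be sketched as a partition of the flag store. This is a hypothetical sketch, not flagd's data model: the `Flag` shape and function names are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

// Hypothetical sketch: flags without a targeting rule can be resolved once
// while populating the store, so only flags with targeting rules need a
// WASM/FFI boundary crossing per evaluation. `Flag` is illustrative only.
struct Flag {
    default_variant: String,
    has_targeting: bool,
}

/// Splits the store into (pre-evaluated variants, keys that still need the engine).
fn partition(store: &HashMap<String, Flag>) -> (HashMap<String, String>, Vec<String>) {
    let mut pre_evaluated = HashMap::new();
    let mut needs_engine = Vec::new();
    for (key, flag) in store {
        if flag.has_targeting {
            needs_engine.push(key.clone());
        } else {
            pre_evaluated.insert(key.clone(), flag.default_variant.clone());
        }
    }
    (pre_evaluated, needs_engine)
}
```

The SDK could then serve the pre-evaluated map from plain host-language memory and reserve boundary crossings for the `needs_engine` keys.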
@tangenti can you share why? Is it something fundamental to WASM, or just the performance / serialization latency? Personally, my main goal here is to write the evaluation engine once instead of building the same thing 6-7 times. I think this is something you've advocated for in the past as well. With Rust, we can also produce native binary targets which we could interface with using things like JNI (in Java) to make native C calls. This avoids the serialization latency, though it creates some portability challenges.
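The native-binding route mentioned here boils down to exposing a C ABI from the Rust core. The sketch below is hedged: `evaluate_bool` and its inlined rule are hypothetical, not an agreed flagd API; the point is only that JNI (Java), ctypes (Python), and similar mechanisms can all bind the same exported symbol, avoiding per-call JSON serialization for simple evaluations.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical C ABI surface for a shared Rust core. Each language's binding
// layer (JNI, ctypes, ...) would call this exported symbol directly.
#[no_mangle]
pub extern "C" fn evaluate_bool(flag_key: *const c_char, default_value: bool) -> bool {
    // Safety: the caller must pass a valid NUL-terminated string.
    let key = unsafe { CStr::from_ptr(flag_key) }.to_string_lossy();
    // Placeholder logic standing in for the real rule engine.
    match key.as_ref() {
        "new-ui" => true,
        _ => default_value,
    }
}
```

Only primitives and a pointer cross the boundary here, which is where the latency win over serialized-context WASM calls would come from, at the cost of shipping per-OS/per-arch artifacts.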
@tangenti I can understand that this might not fit your use case, and your suggested approach sounds interesting.
Google does not allow WASM sandboxes/engines for most languages. This is a hard no that we have no influence to change. I'm fully supportive of a unified evaluation engine, and from what I understand from Simon's proposal and the discussions, having a mandatory WASM-based engine is not necessary here - native bindings from Rust are one approach, and bring-your-own evaluator is another option we could explore.
I am not sure bring-your-own evaluator needs to be part of this. This ADR focuses on reducing the effort for maintainers. We should still support a default evaluator, and the current state is less than ideal: slightly different behaviour, slightly different JsonLogic evaluation. This ADR focuses on unifying that, and I do not think we should add more to it.

The discussion around a BYOE (bring your own evaluator) might influence this ADR, but it is not the solution to the problem I want to tackle here: simplifying the effort of maintaining the flagd providers, and making it easy to roll out changes to all of the SDKs with one basic change in one repository. BYOE focuses on flexibility - but instead of reducing the maintenance effort it increases it, hence I would propose a separate ADR for that.
DevCycle was heavily aligned to our WASM core, and we found it generally very handy/helpful, but we had some challenges from doing it the way we did. Maintaining and building this core was one of the things I was responsible for, along with @jonathannorris.
Overall, we've become less and less enthused with a WASM core, for a number of reasons.
As a result, I heavily investigated using a Rust core instead of our AssemblyScript base. I initially started scoping this trying to use WASM as a single output from a Rust core library, using as many of the new Rust/rust-bindgen WASM features as possible, such as using multiple memories to ideally parallelize access to the core based on the frequency of access. I've made my investigation doc public here: https://www.notion.so/taplytics/Core-Bucketing-ReWrite-Scoping-278931ab956180c7b65efe292df9d4f5 - but my TL;DR is that instead of a WASM core, I would suggest using a C library as the default target, and selecting WASM for languages that are a better fit (i.e. NodeJS, due to its native support and the fact that the language itself is still "single threaded").

One key thing to note, in my opinion: the difficulty of implementing WASM in a given language is very comparable to the difficulty of implementing it via C bindings. The Rust core is open source here: https://github.com/devcyclehq/bucketing-rs - including some of the example implementations.

TL;DR: a Rust core is a great idea; outputting to WASM as a default would be an option, but you will have significant performance overhead when running in concurrent workloads.
I share the same sentiment with Jamie, and I'm going to play devil's advocate a bit, because I have doubts whether WASM will work well at scale due to the tradeoff of its single-thread limitation. I have been thinking and researching about this for a while, and I personally think the priority should be on providing FFI support. For Java, for example, the FFM API provides improvements to call bindings. Technically speaking, you can take the existing Rust flagd provider and extend it to other languages through FFI bindings (I tested this for Perl in sync mode; I can confirm it works). Also, I would expect FFI to have less overhead compared to WASM. A feature flag client would be the last thing I'd expect to be a bottleneck of a system.
@erenatas As a note: WASM is one library; FFI is several libraries (OS * arch == 10?).
Yes, you are right. AFAIK GitHub Actions provides runners for distribution - correct me if I'm wrong.
And e.g. for Java we are still supporting 11. In Java we have Chicory, which allows for build-time compilation, which reduces the costs. The benefit of WASM/Chicory is that we are staying within the JVM boundaries for security.
jni-rs uses the Invocation API. I understand the convenience WASM provides, as interpreters make it agnostic, but for sync mode, being single-threaded sounds like an important limitation, considering the potential consumers of such a feature flag client. We have cases that will serve 100k+ RPS, for example, which will use these clients. If you have the chance to validate that this is not an issue, it would be very valuable in my opinion. And I want to state that I see value in WASM for OpenFeature (also in providing feature flag capabilities for frontends).
There are ways to build this for sure, but FFI has its own limitations. Some folks aren't happy with the library size once all FFI assets are included, and FFI won't run in a browser. WASM isn't a silver bullet either - browser vs. non-browser environments and TLS add their own challenges. @erenatas
Browser won't be an issue in this case, since we recommend bulk evaluation on the edge to avoid shipping the ruleset to the browser. We could (and probably should) ensure the shared engine can work on edge functions/workers. I agree with the ADR's overall goal of aligning on a single, unified evaluation engine. Whether that's FFI, WASM, or something else may depend on the language we're integrating with. That's why Rust is such a compelling option to me.
@erenatas - I found that your memory layout and how you're serializing and deserializing the data matter heavily too. In some cases WASM/FFI had very comparable performance, simply due to my ineptitude (and laziness) in building out a rough hack for a test. I think something should be heavily considered around the data model and the serialization between host and guest. Something I've wanted to do as a test is to use Cap'n Proto as the serialization handler for this (https://capnproto.org/) - as we saw benefits with the move from JSON -> Protobuf, I would imagine that we'd gain similarly from Protobuf -> Cap'n Proto, but that's just a hunch. I will heavily caveat that all my testing was done with Wasmtime as the client-language runtime across all languages, simply because that's what I was working with from building out all the other systems. It also is as close to the WASM spec as you can get, in my opinion.
I would personally never really consider the use of this kind of centralized implementation for a client-side library. Using something like OFREP with an edge worker (Cloudflare Workers, AWS Lambda, etc) has worked out extremely well for us with DevCycle - mostly because like you said, library size is critical for front-end deployments. Using this as the backend evaluator from some HTTP request could be entirely viable with something like CF Workers.
@aepfli - I'm interested in the concept of just implementing a standard interface across all the languages. In theory, this interface could allow for injection regardless of the backing engine. FFI would output a header file anyway, which is the primary consumption interface, ideally allowing for swapping out the library (LGPL-esque). I wouldn't specifically design for it though, and would allow a user to implement the function interface on their own, very similar to how the OF provider spec works for FF vendors.
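The standard-interface idea can be sketched as a small trait that provider code depends on, with the backing engine injected behind it. This is a hedged sketch: the trait name and method are hypothetical, mirroring how the OpenFeature provider interface lets vendors swap implementations.

```rust
// Sketch of a language-level evaluator interface that could be backed by
// WASM, FFI, a native implementation, or a user-supplied engine.
trait Evaluator {
    fn evaluate_bool(&self, flag_key: &str, default_value: bool) -> bool;
}

/// Stand-in for a WASM- or FFI-backed engine with hard-coded rules.
struct StaticEvaluator;

impl Evaluator for StaticEvaluator {
    fn evaluate_bool(&self, flag_key: &str, default_value: bool) -> bool {
        match flag_key {
            "new-ui" => true,
            _ => default_value,
        }
    }
}

/// Provider code depends only on the trait, so the backing engine is swappable.
fn resolve(engine: &dyn Evaluator, flag_key: &str) -> bool {
    engine.evaluate_bool(flag_key, false)
}
```

With an interface like this, a default shared engine and a bring-your-own evaluator are not mutually exclusive: both are just implementations of the same trait.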
Commenting on this separately since it's a bit more nuanced, and it heavily matters what you prioritize. This is a bit of a story about how we ended up with 3 implementations of our bucketing logic: one in WASM, one in Go, and another in JS (CF Workers mostly - but that's not relevant to this).

I will be the first to admit that WASM is extremely limited in the performance you can squeeze from it when using it within a guest language. About a year or so after we built the WASM engine, we had some issues with how WASM scaled in languages that are natively concurrent (Go specifically). Our first attempt was to mess around with how the WASM allocations were set up (and used them as a pool concept): DevCycleHQ/go-server-sdk#102 (this is a stacked PR with a few that you can click through). This kinda worked? It improved performance quite a bit, but still had some issues with memory performance and usage (as you now have your full allocation * number of pool members). We messed around with this quite a bit more and started to hit the point of diminishing returns - yet we still hadn't hit the performance goal that was required, so we made the change to native Go for that SDK. This made our per-variable evaluation timing drop from several hundred microseconds to 100-200 nanoseconds - a literal 1000x improvement. It also had the knock-on effect of removing the CGO dependency from our build process, allowing for a much cleaner and easier build for downstream consumers who can't/don't want to have CGO enabled (Wasmtime required CGO to be enabled).

Now - this isn't meant to dunk on WASM. Key note here: we only made this change for Go, and kept WASM in all our other language implementations, for one major reason - that degree of performance wasn't needed. We rarely if ever heard complaints from customers about the performance of the WASM engine (ironically, because of WASM, our Ruby SDK was faster than a natively implemented Ruby evaluation engine, same with Python).
However - it was entirely single-thread limited, simply due to the nature of the IO between the host memory and the guest memory (which is why most of our benchmarks are set up in two ways: serial and parallel evaluations). TL;DR: WASM can be set up and implemented in a way that makes it less single-threaded - but I would argue that the complexity of doing so is even higher than implementing an FFI properly and natively (this is from experience), where you benefit from the efficiency of the evaluator flow actually being concurrent, not limited to
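The instance-pool workaround described above can be sketched in a few lines. This is illustrative only: `Instance` is a stand-in for a real WASM runtime instance, and the types are hypothetical. It also makes the memory tradeoff visible: the pool holds one full instance allocation per member.

```rust
use std::sync::Mutex;

// Sketch of pooling single-threaded WASM instances so concurrent callers
// check one out instead of serializing on a single instance.
struct Instance {
    #[allow(dead_code)]
    id: usize, // stand-in for a real WASM instance handle
}

struct Pool {
    free: Mutex<Vec<Instance>>,
}

impl Pool {
    /// Pre-allocates `size` instances; memory cost is one allocation each.
    fn new(size: usize) -> Self {
        Pool {
            free: Mutex::new((0..size).map(|id| Instance { id }).collect()),
        }
    }

    /// Checks an instance out; `None` means all instances are busy.
    fn acquire(&self) -> Option<Instance> {
        self.free.lock().unwrap().pop()
    }

    /// Returns an instance to the pool when the caller is done.
    fn release(&self, instance: Instance) {
        self.free.lock().unwrap().push(instance);
    }
}
```

A real pool would block or grow instead of returning `None`, but the core tradeoff is the same one the comment describes: parallelism bought with full per-instance memory.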
I've made some small amendments in this PR to this branch.
@erenatas I agree with your takes here.

@erka what are your thoughts, most specifically for Go? Do you think finding a flexible and performant target that will work with Go is realistic? Is there a way to leverage native code without CGO at all? I'm OK with maintaining a Go implementation alongside a Rust one long-term if necessary; it would still be an improvement over the current situation.

@tangenti This proposal is not conclusive, but I'd like to agree on goals. I understand WASM is a "hard no" for you folks - is that true in all runtimes, including JS? Or does this rule mostly apply to Go artifacts?
I haven’t followed the WASM world for a while, so take what I’m saying here with a grain of salt. Is it possible to run WASM in Go without CGO? Yes - Wazero is there, and I’m sure someone is already pushing an LLM to add WIT support as I write this. Performance-wise, it would likely be the slowest option compared to native code and FFI. One note about FFI: it introduces distribution challenges for Go. When building a Go (single) binary, OS- and architecture-specific FFI artifacts must be shipped (copied) alongside it, which some users find frustrating, as it adds complexity to their pipeline. @toddbaert
@erka it was actually the FFI that made me worry about CGO. We could hypothetically ship multiple artifacts for various architectures, but I think to use FFI we would generally have to enable CGO, and that would be transitively forcing it on our consumers, if I understand correctly. |
Usually, Go + FFI means CGO. There is the purego project trying to break this, but I've never tried it.
My biggest reason for pushing this is actually not WASM, nor FFI, nor anything like that. My big goal is to have one repository for the evaluator across all languages, so that if we implement a new feature, we do it for all at once, ideally with the same version. This way we can at least determine which features each SDK supports, and in which language.

Starting with WASM makes the migration easy, but it does not limit us from later improving on that: using FFI where it is really feasible, or going language-native. The easiest starting point would still be WASM, set up as an experimental feature activated via a flag during flagd initialization. This allows for experimentation and would lay the groundwork for a unified monorepo for the evaluator - a foundation to iterate and improve. Ideally handled via artifacts uploaded to the respective package managers for the languages, versioned and ready to be updated with Renovate etc.
toddbaert
left a comment
I'm approving, and leaving this in proposed state, since it doesn't have a conclusion or implementation tickets.
This ADR is, at this point, just a transparent statement of a goal, and not an implementation proposal.
I would like to merge this. Merging it does not necessarily mean any particular implementation has been decided on, just that some community members will be working and researching in this direction.
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Amending current ADR. Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
I'm not familiar with the Go landscape - is it possible to share any docs/articles that I can read to get better context? 🙏 I also want to thank everyone for their efforts and input; it's invaluable being able to share opinions freely.
Re #1848 (comment): AFAIK, it's only allowed for the frontend usage of TypeScript. Re #1848 (review): Should we instead check in a shorter version of the ADR, which emphasizes the agreed goal of unifying the evaluation engine and does not enforce mandatory usage of WASM? If not, let's add a warning at the beginning of the doc that this is not an approved design.
I'll add something at the top of the doc. I still think the proposals/discussion are worth keeping, so I don't want to remove anything. I'll make it clear we haven't decided on a solution yet, and that we will seek to find alternatives (such as "Bring Your Own Evaluator") if we cannot come to consensus on targets/runtimes.
This PR
This pull request introduces a comprehensive Architecture Decision Record (ADR) detailing the strategic move towards a unified flag evaluation engine within the OpenFeature ecosystem. The core objective is to consolidate the currently fragmented, language-specific evaluation implementations into a single, robust solution. By leveraging Rust for its performance and cross-compilation capabilities (WASM and native bindings), the proposal aims to significantly reduce maintenance burden, ensure consistent behavioral outcomes across all SDKs, and accelerate feature development. The ADR meticulously outlines the scope, technical considerations, and a phased adoption plan, emphasizing a gradual, opt-in migration to ensure stability and community validation.
Related Issues
Fixes #1842