[Discuss] Feast Enhancements #6
Thanks for kicking this issue off @aaraujo-sfdc!
In an attempt to play devil's advocate: would it be possible (even simpler) to emit events (maybe using OpenTracing) from Feast Core whenever there is a state change?
I've not thought deeply about this, but would this be specific to your use case? I don't really see other folks needing this functionality. I guess the question just becomes how this can be implemented to serve your use case only.
We call this a Push API at Tecton. We are trying to be more specific about our data contracts, which should help make it easy to add new components.
Seems reasonable. FYI: long term we are planning to slowly move towards Go for Serving. We probably won't pick that work up in the next couple of months, but we think we can build a simpler and easier-to-maintain serving API. I raise this because we have two options here. The pragmatic approach is to extend the Serving code base and reuse the storage connectors there (or SPI implementations). Another approach is to build the Delete/Push API as a Go service and migrate functionality over to it over time. Happy to hear your thoughts.
Thanks for looking over this @woop, it's quite a lot to digest.
Likely possible, but not sure if this simplifies things for us. A hook we could implement within Feast Core would allow us to synchronously notify existing systems using their APIs without introducing additional components or re-working those systems. Since it would be synchronous, we also don't have to implement state management to make sure external systems successfully processed these events. End users would immediately know that the overall system could not process their request.
Reader and writer, yes. Initially we'd do both through serving (via the Get/Push APIs). Writing through serving initially would allow us to support pipeline ingestion and direct user writes with minimal work. Down the road we'd likely look into bypassing serving for pipeline ingestion by writing directly from Spark -> HBase. Phrased differently, initially we only need SPIs for the storage operations in Feast Serving. Later we'd likely want to do similar work for Feast ingestion (historical and streaming).
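For concreteness, here is a minimal sketch of what such storage SPIs in Serving might look like, discovered through the standard java.util.ServiceLoader mechanism. All type names below are hypothetical (they do not exist in Feast), and the real contracts would presumably use Feast's proto types; treat this only as an illustration of the shape:

```java
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

// Placeholder value types standing in for the real Feast proto messages.
record EntityKey(Map<String, Object> fields) {}
record FeatureRow(EntityKey key, Map<String, Object> values) {}

// Hypothetical SPI covering the storage operations Serving would delegate.
interface OnlineStoreSpi {
  List<FeatureRow> read(String project, String table, List<EntityKey> keys); // Get path
  void write(String project, String table, List<FeatureRow> rows);           // Push path
  void delete(String project, String table, List<EntityKey> keys);           // Delete path
}

class OnlineStores {
  // Discover the configured implementation (e.g., Redis, or Phoenix/HBase)
  // via the standard Java ServiceLoader mechanism.
  static OnlineStoreSpi load() {
    return ServiceLoader.load(OnlineStoreSpi.class)
        .findFirst()
        .orElseThrow(() -> new IllegalStateException("no OnlineStoreSpi on classpath"));
  }
}
```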
I would think storing data for multiple tenants is fairly common. An entity in Feast could belong to different tenants, and a TenantID would disambiguate between one tenant's EntityID=123 and another's. Implementing multi-tenancy would require support for "multi-tenant" feature tables (specifying a tenant attribute in the feature table spec).
Push/Write/etc. sounds good. I was using the existing convention, which as you pointed out, makes more sense for schemas/specs.
Is that a hard data contract or more of a storage specification for the Redis implementation? Our storage implementation would take the API proto types and map them to SQL types (we use Apache Phoenix on top of HBase). This would allow us to view/audit data with standard SQL tools. We would also have one HBase table for each (project, table_name) tuple as opposed to one table with a (project, entity_name, table_name, feature_name) hierarchy.
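To illustrate the kind of mapping described here, a hypothetical sketch using Phoenix's standard JDBC interface, with one table per (project, table_name) and the tenant field leading the primary key. Every table, column, and value name below is invented:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PhoenixOnlineStoreSketch {
  public static void main(String[] args) throws Exception {
    // Phoenix exposes a standard JDBC interface over HBase.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
      // One table per (project, table_name); API proto types map to SQL
      // types (e.g., a proto DoubleValue could map to SQL DOUBLE).
      conn.createStatement().execute(
          "CREATE TABLE IF NOT EXISTS DRIVER_PROJECT.DRIVER_HOURLY_STATS ("
          + " TENANT_ID VARCHAR NOT NULL,"
          + " ENTITY_ID VARCHAR NOT NULL,"
          + " EVENT_TS  TIMESTAMP,"
          + " CONV_RATE DOUBLE,"
          + " ACC_RATE  DOUBLE,"
          + " CONSTRAINT PK PRIMARY KEY (TENANT_ID, ENTITY_ID))");

      // Phoenix uses UPSERT rather than separate INSERT/UPDATE statements.
      try (PreparedStatement ps = conn.prepareStatement(
          "UPSERT INTO DRIVER_PROJECT.DRIVER_HOURLY_STATS VALUES (?, ?, ?, ?, ?)")) {
        ps.setString(1, "tenant-123");
        ps.setString(2, "driver-456");
        ps.setTimestamp(3, new java.sql.Timestamp(System.currentTimeMillis()));
        ps.setDouble(4, 0.91);
        ps.setDouble(5, 0.87);
        ps.executeUpdate();
      }
      conn.commit(); // Phoenix connections default to autoCommit=false
    }
  }
}
```

Because the data lands in ordinary SQL tables, it can then be viewed and audited with standard SQL tools, as described above.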
Interesting. Same API delivered as a Go service instead of Java? Our users are strictly against doing any sort of service migration for their apps, so it would have to be a simple drop-in replacement for us.
How would you reuse Java storage connectors (or SPI implementations) in Go? It seems we'd need to reimplement them.
Wouldn't that mean running two Serving services, plus routing each call to whichever one supports it, until the Go service has everything?
I think the system can still be synchronous. What I am trying to optimize for is purely maintainability. We have limited experience with SPI so we'd be relying on your experience in implementing it.
Do you think there is a way to leverage labels for the TenantID without requiring a TenantSpec? It seems like the only thing that would need to be added here is a way to affect storage through the feature table specification.
Both: a storage specification for the Redis implementation as well as a specification for general K/V storage. It would not cover RDBMS storage.
Correct, drop in.
I meant either/or: either take the SPI route or start with Go. If we start with the SPI route then moving to Go would require a reimplementation.
I actually think there is a strong case for deploying these services separately in any case (even if they share a code base). The life cycle is different for delete, write, and read, and I think in most cases the capacity required would be different as well. I also think it's easier to reason about a single writer to a store than multiple. My intuition is to have one deployment for services that mutate state (update, insert, delete) and one for reading.
If we define a generic interface for this and the default implementation is a no-op, the maintenance overhead should be minimal to none. Probably good to carve this out into a separate issue where we can propose a design and continue there.
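A minimal sketch of such a generic hook interface with no-op defaults, assuming hypothetical names and synchronous invocation from Feast Core:

```java
// Hypothetical hook for synchronous state-change notifications from Feast
// Core. With no-op defaults, deployments that don't register an
// implementation pay essentially no maintenance or runtime cost.
interface CoreStateChangeListener {
  /** Called synchronously before Core commits a spec change; throwing aborts it. */
  default void onApply(String project, String resourceKind, String resourceName) {
    // no-op by default
  }

  /** Called synchronously when a resource is deleted. */
  default void onDelete(String project, String resourceKind, String resourceName) {
    // no-op by default
  }
}
```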
We might be able to define the feature table specification using labels, but since a feature table would store data for all tenants, the tenant attribute would still need to affect the underlying storage (e.g., be part of the key), not just the table's metadata.
In that case I think it needs to be reworked a bit to separate the "applies to all K/V storage" portions from the "applies only to Redis" portions, since parts of the current specification are specific to Redis.
Our online provider implementation would be built using Apache Phoenix, which is closer to an RDBMS API than a K/V store API. So perhaps a higher level specification would be needed to cover both.
Given our timeline + Feast's timeline for the Go API we'd need to take the SPI route to begin with.
Sounds like we have a few options for the new APIs: extend the existing Java Serving service (the SPI route), stand up an additional Java micro-service, or build them as a new Go service.
They all seem reasonable, but given that Java serving/ingestion will be deprecated at some point, we'd prefer an option that does not require an additional Java micro-service. Thoughts?
Sounds good.
Yeah, I am not 100% sure what that ingest/serving API would look like, to be honest. If you could create a basic sketch of it, that would be great.
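One possible starting point for such a sketch, mirroring the semantics discussed in this thread. All names here are hypothetical, and the real API would presumably be defined as protobuf/gRPC rather than plain Java:

```java
import java.util.List;
import java.util.Map;

// A rough, hypothetical sketch of a combined ingest/serving surface,
// mirroring the shape of GetOnlineFeaturesV2. Plain Java collections
// stand in for the real proto request/response types.
interface OnlineFeatureService {
  /** Read feature values for specific entity rows (existing Get path). */
  List<Map<String, Object>> getOnlineFeatures(
      String project, List<String> featureRefs, List<Map<String, Object>> entityRows);

  /** Write feature table values for specific entity rows (proposed Push/Apply path). */
  void applyOnlineFeatures(
      String project, String featureTable, List<Map<String, Object>> featureRows);

  /** Delete feature table values for specific entity rows (proposed Delete path). */
  void deleteOnlineFeatures(
      String project, String featureTable, List<Map<String, Object>> entityRows);
}
```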
Yes, that's correct.
Fair enough.
It seems like the most pragmatic course is then the one you originally laid out: evolve the Serving service to provide the functionality you described. This is probably the least amount of work. From a purely selfish perspective I'd want us to build out this functionality in a Go service, because I do see us needing that pretty soon (but probably not for 2-3 months). We'll need both the reading and writing functionality, but I am not sure about deletion yet. I understand the reasons not to take this course, though.
My team has been evaluating Feast for adoption and has identified a few improvements we'd like to contribute for our use cases. @woop suggested starting a thread for discussion to make sure they are a good fit.

Support for SPI extensions
We'd like to add extension points for our existing infrastructure for things like:

Multi-tenancy support
Our infrastructure is entirely multi-tenant. We authorize API calls and store data on a per-tenant basis. On the surface, we'd need something like the following in Feast:
- A TenantSpec tenant attribute in FeatureTableSpec that defines the name + type of a tenant field for feature table keys (e.g., name = 'TenantId'; type = 'string')
- A tenant attribute added to the ingestion and serving APIs (+ SDK methods)

Direct online write support
We have use cases that would like to write to the feature store directly. Adding an ApplyOnlineFeatures API (+ SDK support) would satisfy these use cases. The API would semantically resemble the GetOnlineFeaturesV2 API (write feature table values for specific entity rows).

Delete online feature support
We maintain GDPR compliance by propagating delete-record signals from upstream data systems. Essentially we'd need a DeleteOnlineFeatures API (+ SDK support) that resembles GetOnlineFeaturesV2 and ApplyOnlineFeatures (delete feature table values for specific entity rows).

If these seem like reasonable enhancements to Feast, we can file individual issues and PRs to contribute these incrementally.
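For illustration, a toy in-memory model of the proposed Apply/Delete semantics. Everything here is invented and only meant to pin down the intended behavior, not to suggest an implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: feature values keyed by (project, feature table, entity row).
public class OnlineStoreModel {
  private final Map<String, Map<String, Object>> store = new HashMap<>();

  private static String key(String project, String table, Object entityId) {
    return project + "/" + table + "/" + entityId;
  }

  /** ApplyOnlineFeatures: write feature table values for specific entity rows. */
  public void apply(String project, String table, Object entityId, Map<String, Object> values) {
    store.put(key(project, table, entityId), values);
  }

  /** DeleteOnlineFeatures: delete feature table values for specific entity rows. */
  public void delete(String project, String table, Object entityId) {
    store.remove(key(project, table, entityId)); // e.g., on a GDPR delete signal
  }

  public static void main(String[] args) {
    OnlineStoreModel model = new OnlineStoreModel();
    model.apply("driver_project", "driver_hourly_stats", 1001,
        Map.of("conv_rate", 0.85, "acc_rate", 0.91));
    model.delete("driver_project", "driver_hourly_stats", 1001);
    System.out.println(model.store); // {} -> the row is gone after the delete
  }
}
```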