feat: Add OnlineStore for MongoDB #6025
Open
caseyclements wants to merge 29 commits into feast-dev:master from
Signed-off-by: Casey Clements
… pymongo is found as extra
…lineWrites with default args of MongoDBOnlineStoreConfig. Lots to do.
…ule pass
…r now. simply and transforming
…uming one is running
…odb server.
…eStore
… to proto, removed the naive one. It was outcompeted 3X across dimensions
5a815be to 6581159
Collaborator: @caseyclements please check the errors in the tests :)
What this PR does / why we need it:
Adds a first-class MongoDB online store integration at `feast.infra.online_stores.mongodb_online_store`.

Schema
Each entity is stored as a single MongoDB document keyed by its serialized entity key:
```json
{
  "_id": "<serialized_entity_key>",
  "features": {
    "<feature_view>": { "<feature>": <value> }
  },
  "event_timestamps": { "<feature_view>": "<datetime>" },
  "created_timestamp": "<datetime>"
}
```

Because MongoDB has a loose schema and supports upsert semantics natively, `update()` requires no pre-creation; it only removes the feature views named in `tables_to_delete` via `$unset`. Multiple feature views for the same entity share one document.
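The write and delete paths implied by this schema can be sketched as plain pymongo-style update documents. This is a minimal sketch, assuming dotted `$set` paths per feature view; `build_upsert` and `build_unset` are hypothetical helpers, not the PR's code, and the serialized key is passed in as opaque bytes rather than produced by Feast. No server is needed to build the dicts; with pymongo the upsert pair would go to `collection.update_one(filt, update, upsert=True)` and the unset to `collection.update_many({}, ...)`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Tuple


def build_upsert(
    entity_key: bytes,
    feature_view: str,
    values: Dict[str, Any],
    event_ts: datetime,
    created_ts: datetime,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (filter, update) for upserting one feature view of one entity.

    Hypothetical helper: dotted paths under $set touch only this feature view,
    so other feature views stored in the same document are left untouched.
    """
    filt = {"_id": entity_key}
    set_fields: Dict[str, Any] = {
        f"features.{feature_view}.{name}": value for name, value in values.items()
    }
    set_fields[f"event_timestamps.{feature_view}"] = event_ts
    set_fields["created_timestamp"] = created_ts
    return filt, {"$set": set_fields}


def build_unset(feature_views: Iterable[str]) -> Dict[str, Any]:
    """Return an update that drops the named feature views from a document.

    Hypothetical helper mirroring the described update() behaviour: nothing is
    pre-created; deleted feature views are simply removed with $unset.
    """
    unset: Dict[str, str] = {}
    for fv in feature_views:
        unset[f"features.{fv}"] = ""
        unset[f"event_timestamps.{fv}"] = ""
    return {"$unset": unset}


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
filt, update = build_upsert(b"\x01key", "driver_stats", {"trips": 7}, ts, ts)
print(update["$set"]["features.driver_stats.trips"])
print(build_unset(["driver_stats"]))
```

Using dotted paths rather than replacing the whole `features` sub-document is what allows several feature views to share one document without overwriting each other.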
Implementation highlights
The store implements `online_write_batch`, `online_read`, `online_write_batch_async`, `online_read_async`, and `async def close()`; `async_supported` returns `read=True, write=True`.

`requested_features` projection: both read paths build a MongoDB field projection so only the requested feature columns are returned from the server.
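A minimal sketch of how such a projection could be built, assuming the document layout from the schema and assuming the view's event timestamp is fetched alongside the features. `build_projection` is a hypothetical helper, not the PR's code. MongoDB returns `_id` by default, which is what lets results be matched back to entity keys; with pymongo the dict would be the second argument to `collection.find({"_id": {"$in": keys}}, projection)`.

```python
from typing import Dict, List, Optional


def build_projection(
    feature_view: str, requested_features: Optional[List[str]]
) -> Dict[str, int]:
    """Build a MongoDB inclusion projection for one feature view (hypothetical).

    With requested_features, only those feature fields are included; without
    it, the whole feature-view sub-document is included. The view's event
    timestamp is always included (an assumption of this sketch).
    """
    projection: Dict[str, int] = {f"event_timestamps.{feature_view}": 1}
    if requested_features:
        for name in requested_features:
            projection[f"features.{feature_view}.{name}"] = 1
    else:
        projection[f"features.{feature_view}"] = 1
    return projection


print(build_projection("driver_stats", ["trips", "rating"]))
```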
`_convert_raw_docs_to_proto` transforms documents column-wise to minimise calls to `python_values_to_proto_values` (one call per feature across all entities, rather than one call per entity × feature). Benchmarking confirmed this is ~3–4× faster than a naïve row-wise approach across all scaling dimensions (entities, features, feature views).
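The call-count argument can be illustrated without Feast. In this sketch `batch_convert` is a hypothetical stand-in for a batch converter like `python_values_to_proto_values` (it returns strings, not protos). Both strategies produce identical output; only the number of converter calls differs. It does not reproduce the PR's benchmark.

```python
from typing import Any, Dict, List

# Counts how many times the hypothetical converter is invoked.
CALLS = {"n": 0}


def batch_convert(values: List[Any]) -> List[str]:
    """Hypothetical batch converter: one output per input value."""
    CALLS["n"] += 1
    return [f"proto({v!r})" for v in values]


def convert_row_wise(docs: List[Dict[str, Any]], features: List[str]) -> List[Dict[str, str]]:
    # One converter call per entity x feature.
    return [{f: batch_convert([doc[f]])[0] for f in features} for doc in docs]


def convert_column_wise(docs: List[Dict[str, Any]], features: List[str]) -> List[Dict[str, str]]:
    # One converter call per feature: gather each column across all entities first.
    columns = {f: batch_convert([doc[f] for doc in docs]) for f in features}
    return [{f: columns[f][i] for f in features} for i in range(len(docs))]


docs = [{"trips": i, "rating": i / 2} for i in range(100)]
features = ["trips", "rating"]

CALLS["n"] = 0
row = convert_row_wise(docs, features)
row_calls = CALLS["n"]

CALLS["n"] = 0
col = convert_column_wise(docs, features)
col_calls = CALLS["n"]

print(row == col, row_calls, col_calls)  # True 200 2
```

With 100 entities and 2 features, the row-wise version makes 200 calls and the column-wise version makes 2, which is where per-call overhead savings would come from.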
Entity keys are serialized with `serialize_entity_key`, so composite keys (e.g. `customer_id` + `driver_id`) are handled without schema changes.

Tests
- Integration test (`test_mongodb_online_features`): spins up a real MongoDB container via `testcontainers`, writes to three feature views (single int key, single string key, composite key), and asserts correct retrieval, including type coercion and missing-entity handling.
- `test_convert_raw_docs_missing_entity`: entity absent from query results → `(None, None)`.
- `test_convert_raw_docs_partial_doc`: entity present but a feature key missing → empty `ValueProto` (schema-migration safety).
- `test_convert_raw_docs_ordering`: result order follows the requested `ids` list regardless of MongoDB cursor order.
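A pure-Python sketch of the three behaviours the unit tests check, assuming raw documents shaped like the schema above. `align_results` and `EMPTY_VALUE` are hypothetical: the string placeholder stands in for an empty `ValueProto`, and the real `_convert_raw_docs_to_proto` may differ in signature and output types.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical placeholder standing in for an empty ValueProto.
EMPTY_VALUE = "<empty ValueProto>"


def align_results(
    ids: List[bytes],
    docs: List[Dict[str, Any]],
    feature_view: str,
    features: List[str],
) -> List[Tuple[Optional[datetime], Optional[Dict[str, Any]]]]:
    # Index documents by _id so the cursor's return order does not matter.
    by_id = {doc["_id"]: doc for doc in docs}
    results: List[Tuple[Optional[datetime], Optional[Dict[str, Any]]]] = []
    for key in ids:  # output order follows the requested ids
        doc = by_id.get(key)
        if doc is None:
            results.append((None, None))  # entity absent from query results
            continue
        fv_values = doc.get("features", {}).get(feature_view, {})
        ts = doc.get("event_timestamps", {}).get(feature_view)
        # A feature missing from the document becomes the empty placeholder.
        results.append((ts, {f: fv_values.get(f, EMPTY_VALUE) for f in features}))
    return results


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
raw_docs = [
    {"_id": b"b", "features": {"fv": {"x": 2}}, "event_timestamps": {"fv": ts}},
    {"_id": b"a", "features": {"fv": {"x": 1, "y": 10}}, "event_timestamps": {"fv": ts}},
]
out = align_results([b"a", b"missing", b"b"], raw_docs, "fv", ["x", "y"])
print(out)
```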
All tests pass. Code is clean under `mypy`, `ruff check`, and `ruff format`.