-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: global_id mapping serialization #6475
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds serialization and deserialization support for search index global_id mappings in the RDB format. The feature enables replication of vector search indices by preserving document ID mappings across master-replica synchronization.
Changes:
- Introduces
RDB_OPCODE_GLOBAL_ID(221) to store index_name and global_id pairs before key entries - Adds serialization logic in
SliceSnapshot::SerializeEntryto save global_ids for HASH/JSON keys indexed by search - Implements deserialization in
RdbLoaderto restore global_id mappings on the replica - Adds infrastructure methods
SetMasterDocId,GetMasterDocId, andClearMasterMappingstoShardDocIndex
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/server/rdb_extensions.h | Defines new RDB opcode RDB_OPCODE_GLOBAL_ID (221) for search index global_id storage |
| src/server/rdb_save.h | Declares SaveGlobalId method for serializing global_id entries |
| src/server/rdb_save.cc | Implements SaveGlobalId to write opcode, index name, and 8-byte global_id |
| src/server/rdb_load.h | Adds global_ids vector to Item struct for storing loaded mappings |
| src/server/rdb_load.cc | Parses RDB_OPCODE_GLOBAL_ID and stores mappings; transfers them to search indices |
| src/server/snapshot.cc | Serializes global_ids for indexed HASH/JSON keys during snapshot creation |
| src/server/search/doc_index.h | Adds master_doc_ids_ map and related methods to ShardDocIndex |
| src/server/search/doc_index.cc | Implements ForEachGlobalDocId callback iterator and master mapping methods |
🤖 Augment PR SummarySummary: Adds RDB-level serialization for search index Changes:
Technical Notes: Global ids are stored as little-endian 🤖 Was this summary useful? React with 👍 or 👎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| // Store master doc_id mappings for search indices | ||
| if (!item->global_ids.empty()) { | ||
| if (auto* search_indices = db_slice->shard_owner()->search_indices(); search_indices) { | ||
| for (const auto& [index_name, global_id] : item->global_ids) { | ||
| search_indices->SetMasterDocId(index_name, item->key, global_id); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the indices haven't been created yet. They're loaded from an aux field on shard 0 snapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next PR I will create index automatically if we have such fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it is better to add some proper utilities for loading indices rather than abusing the indices themself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't get your point
| uint32_t mc_flags, DbIndex dbid); | ||
|
|
||
| // Write a single global_id entry for search-indexed keys. | ||
| // Format: RDB_OPCODE_GLOBAL_ID + index_name (string) + global_id (8 bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imagine how wasteful it's to send the index name each time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is why I wanted to have global_id for all indexes
| if (type == RDB_OPCODE_GLOBAL_ID) { | ||
| /* GLOBAL_ID: search index global document id (index_name + global_id) */ | ||
| string index_name; | ||
| SET_OR_RETURN(FetchGenericString(), index_name); | ||
| uint64_t global_id; | ||
| SET_OR_RETURN(FetchInt<uint64_t>(), global_id); | ||
| settings.global_ids.emplace_back(std::move(index_name), global_id); | ||
| continue; /* Read next opcode. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand it from a consistency point of view. If we plan to support writes in the future we have to plan ahead. I'll message in a private group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to implement the same mechanism as we use for replication.
Part of HNSW index replication task: added save/load for hash/ json globalId mapping