
@BorysTheDev BorysTheDev commented Jan 26, 2026

Part of the HNSW index replication task: adds save/load for the HASH/JSON global_id mapping.

Copilot AI review requested due to automatic review settings January 26, 2026 15:53
@BorysTheDev BorysTheDev changed the title feat: global_id mapping serialization feat: global_id mapping serialization NOT READY FOR REVIEW Jan 26, 2026

Copilot AI left a comment

Pull request overview

This PR adds serialization and deserialization support for search index global_id mappings in the RDB format. The feature enables replication of vector search indices by preserving document ID mappings across master-replica synchronization.

Changes:

  • Introduces RDB_OPCODE_GLOBAL_ID (221) to store index_name and global_id pairs before key entries
  • Adds serialization logic in SliceSnapshot::SerializeEntry to save global_ids for HASH/JSON keys indexed by search
  • Implements deserialization in RdbLoader to restore global_id mappings on the replica
  • Adds infrastructure methods SetMasterDocId, GetMasterDocId, and ClearMasterMappings to ShardDocIndex
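
To make the last bullet concrete, here is a minimal sketch of what a per-index master mapping store could look like. Only the three method names come from the overview above; the class name `MasterDocIdMap`, the signatures, and the choice of `std::unordered_map` are assumptions for illustration, not the code in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for the mapping state kept by one index.
// The real ShardDocIndex members may differ in both shape and signature.
class MasterDocIdMap {
 public:
  // Remember the id that the master assigned to `key`; overwrites any old value.
  void SetMasterDocId(std::string_view key, uint64_t global_id) {
    ids_[std::string(key)] = global_id;
  }

  // Returns the stored id, or std::nullopt if the key was never recorded.
  std::optional<uint64_t> GetMasterDocId(std::string_view key) const {
    auto it = ids_.find(std::string(key));
    if (it == ids_.end()) return std::nullopt;
    return it->second;
  }

  // Drop all mappings, e.g. before a fresh full sync.
  void ClearMasterMappings() { ids_.clear(); }

  std::size_t size() const { return ids_.size(); }

 private:
  std::unordered_map<std::string, uint64_t> ids_;
};
```

In this sketch a lookup for an unknown key yields an empty optional rather than a sentinel id, so callers cannot confuse "not mapped" with a valid id of 0.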

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/server/rdb_extensions.h | Defines new RDB opcode RDB_OPCODE_GLOBAL_ID (221) for search index global_id storage |
| src/server/rdb_save.h | Declares SaveGlobalId method for serializing global_id entries |
| src/server/rdb_save.cc | Implements SaveGlobalId to write opcode, index name, and 8-byte global_id |
| src/server/rdb_load.h | Adds global_ids vector to Item struct for storing loaded mappings |
| src/server/rdb_load.cc | Parses RDB_OPCODE_GLOBAL_ID and stores mappings; transfers them to search indices |
| src/server/snapshot.cc | Serializes global_ids for indexed HASH/JSON keys during snapshot creation |
| src/server/search/doc_index.h | Adds master_doc_ids_ map and related methods to ShardDocIndex |
| src/server/search/doc_index.cc | Implements ForEachGlobalDocId callback iterator and master mapping methods |


augmentcode bot commented Jan 26, 2026

🤖 Augment PR Summary

Summary: Adds RDB-level serialization for search index global_id mappings so replicas can restore master document id relationships.

Changes:

  • Introduced a new DF RDB opcode RDB_OPCODE_GLOBAL_ID (221) to store (index_name, global_id) pairs before a key entry
  • Extended RDB load/save paths to emit and parse these global-id records and attach them to loaded items
  • During snapshot serialization, emits global-id records for HASH/JSON keys that are indexed by search
  • Added search-side helpers to iterate per-key global ids and to store “master doc id” mappings during RDB load

Technical Notes: Global ids are stored as little-endian uint64_t and may appear multiple times per key (one per matching index).
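
The record layout described here (opcode, index name, 8-byte little-endian id) can be sketched as a small encode/decode pair. This is an illustration under stated assumptions: the opcode value 221 comes from the summaries above, but the 4-byte little-endian length prefix for the name is a simplification and is not the RDB string encoding; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Value quoted in the PR summaries; everything else below is illustrative.
constexpr uint8_t kOpcodeGlobalId = 221;

// Appends one record: opcode, 4-byte LE name length (assumed framing),
// the name bytes, then the id as 8 little-endian bytes.
inline void AppendGlobalIdRecord(std::string_view index_name, uint64_t global_id,
                                 std::string* out) {
  out->push_back(static_cast<char>(kOpcodeGlobalId));
  const uint32_t len = static_cast<uint32_t>(index_name.size());
  for (int i = 0; i < 4; ++i)
    out->push_back(static_cast<char>((len >> (8 * i)) & 0xFF));
  out->append(index_name);
  for (int i = 0; i < 8; ++i)
    out->push_back(static_cast<char>((global_id >> (8 * i)) & 0xFF));
}

// Parses one record starting at *pos. On success advances *pos past the record;
// on a wrong opcode or truncated input returns std::nullopt and leaves *pos alone.
inline std::optional<std::pair<std::string, uint64_t>> ParseGlobalIdRecord(
    std::string_view buf, std::size_t* pos) {
  std::size_t p = *pos;
  if (p >= buf.size() || static_cast<uint8_t>(buf[p]) != kOpcodeGlobalId)
    return std::nullopt;
  ++p;
  if (buf.size() - p < 4) return std::nullopt;
  uint32_t len = 0;
  for (int i = 0; i < 4; ++i)
    len |= static_cast<uint32_t>(static_cast<uint8_t>(buf[p + i])) << (8 * i);
  p += 4;
  if (buf.size() - p < static_cast<std::size_t>(len) + 8) return std::nullopt;
  std::string name(buf.substr(p, len));
  p += len;
  uint64_t id = 0;
  for (int i = 0; i < 8; ++i)
    id |= static_cast<uint64_t>(static_cast<uint8_t>(buf[p + i])) << (8 * i);
  p += 8;
  *pos = p;
  return std::make_pair(std::move(name), id);
}
```

Because a key may match several indices, a loader following this layout would call the parser repeatedly until the next opcode is no longer the global-id opcode, collecting one pair per call.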

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.


Copilot AI review requested due to automatic review settings January 27, 2026 09:21
@BorysTheDev BorysTheDev changed the title feat: global_id mapping serialization NOT READY FOR REVIEW feat: global_id mapping serialization Jan 27, 2026
@BorysTheDev BorysTheDev requested a review from dranikpg January 27, 2026 09:22
@BorysTheDev BorysTheDev requested a review from mkaruza January 27, 2026 09:22

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Comment on lines +2736 to +2743
// Store master doc_id mappings for search indices
if (!item->global_ids.empty()) {
  if (auto* search_indices = db_slice->shard_owner()->search_indices(); search_indices) {
    for (const auto& [index_name, global_id] : item->global_ids) {
      search_indices->SetMasterDocId(index_name, item->key, global_id);
    }
  }
}
Contributor

What if the indices haven't been created yet? They're loaded from an aux field in the shard 0 snapshot.

Contributor Author

In the next PR I will create the index automatically if we have such fields.

Contributor

Then it is better to add some proper utilities for loading indices rather than abusing the indices themselves.

Contributor Author

I didn't get your point.
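
One possible reading of the "proper utilities for loading indices" suggestion in this thread is a loader-side buffer that holds mappings until the index definition has actually been loaded, instead of writing into index objects that may not exist yet. The sketch below is an assumption about what such a utility might look like; `PendingGlobalIds`, `Add`, and `Flush` are hypothetical names and nothing here is taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical loader helper: groups (key, global_id) pairs by index name
// until the corresponding index has been created.
class PendingGlobalIds {
 public:
  struct Entry {
    std::string key;
    uint64_t global_id;
  };

  // Record a mapping read from the snapshot; the index may not exist yet.
  void Add(std::string index_name, std::string key, uint64_t global_id) {
    pending_[std::move(index_name)].push_back(Entry{std::move(key), global_id});
  }

  // Call once `index_name` exists. Invokes `apply` for each buffered entry,
  // forgets them, and returns how many were applied.
  std::size_t Flush(const std::string& index_name,
                    const std::function<void(const std::string&, uint64_t)>& apply) {
    auto it = pending_.find(index_name);
    if (it == pending_.end()) return 0;
    const std::size_t n = it->second.size();
    for (const Entry& e : it->second) apply(e.key, e.global_id);
    pending_.erase(it);
    return n;
  }

  bool Empty() const { return pending_.empty(); }

 private:
  std::unordered_map<std::string, std::vector<Entry>> pending_;
};
```

With a buffer like this, the load order of key entries and index definitions stops mattering: whichever arrives second triggers the flush.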

uint32_t mc_flags, DbIndex dbid);

// Write a single global_id entry for search-indexed keys.
// Format: RDB_OPCODE_GLOBAL_ID + index_name (string) + global_id (8 bytes).
Contributor

Imagine how wasteful it is to send the index name each time.

Contributor Author

This is why I wanted to have a single global_id for all indexes.
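
Neither comment in this thread settles on a fix. Besides sharing one id across indexes, a common way to avoid repeating a string in every record is to intern it: send each index name once, then refer to it by a small integer. The sketch below illustrates that general technique only; `IndexNameTable` and its methods are hypothetical and are not proposed anywhere in the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name table: maps each index name to a dense numeric id so that
// per-key records could carry the id instead of the full name.
class IndexNameTable {
 public:
  // Returns the id already assigned to `name`, or assigns the next free one.
  uint32_t Intern(const std::string& name) {
    auto [it, inserted] =
        ids_.try_emplace(name, static_cast<uint32_t>(names_.size()));
    if (inserted) names_.push_back(name);
    return it->second;
  }

  // Reverse lookup used by the reader; throws std::out_of_range on a bad id.
  const std::string& Name(uint32_t id) const { return names_.at(id); }

  std::size_t size() const { return names_.size(); }

 private:
  std::unordered_map<std::string, uint32_t> ids_;
  std::vector<std::string> names_;
};
```

The trade-off is that the table itself has to be written before the first record that uses it, which adds one more ordering requirement to the snapshot format.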

Comment on lines +2082 to +2089
if (type == RDB_OPCODE_GLOBAL_ID) {
  /* GLOBAL_ID: search index global document id (index_name + global_id) */
  string index_name;
  SET_OR_RETURN(FetchGenericString(), index_name);
  uint64_t global_id;
  SET_OR_RETURN(FetchInt<uint64_t>(), global_id);
  settings.global_ids.emplace_back(std::move(index_name), global_id);
  continue; /* Read next opcode. */
Contributor

I don't understand it from a consistency point of view. If we plan to support writes in the future, we have to plan ahead. I'll message in a private group.

Contributor Author

I plan to implement the same mechanism as we use for replication.
