fix(server): Async index builder, part 1 #6504

dranikpg · 2026-01-29T15:35:18Z

Relevant to #5752

Adds a small IndexBuilder to serve as the base for asynchronous index construction operations.
For now it is used synchronously, so there are no functional changes.

dranikpg · 2026-01-29T18:48:26Z

augment review

augmentcode · 2026-01-29T18:52:39Z

🤖 Augment PR Summary

Summary: Begins refactoring search indexing to support asynchronous index builds (per #5752) while preserving current behavior by running the new builder synchronously.

Changes:

- Add new search::IndexBuilder that scans the DB table in a worker fiber and feeds docs into ShardDocIndex.
- Wire ShardDocIndex::Rebuild to start the builder and wait for completion before finalizing initialization.
- Add an out-of-line ShardDocIndex destructor to allow a forward-declared IndexBuilder member, and update CMake to compile the new source.
- Prevent duplicate inserts into the shard index by short-circuiting AddDoc when the key is already present.
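
The last item, short-circuiting AddDoc on an already-present key, can be illustrated with a minimal sketch. `ToyIndex` and its members are hypothetical stand-ins, not the real ShardDocIndex; the sketch only assumes that the index keeps a key-to-id map and that a repeated AddDoc for the same key should be a no-op.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical stand-in for a shard index; not the Dragonfly class.
class ToyIndex {
 public:
  // Returns true if the document was newly indexed,
  // false if the key was already present (the short-circuit path).
  bool AddDoc(std::string_view key) {
    auto [it, inserted] = ids_.try_emplace(std::string{key}, next_id_);
    if (!inserted)
      return false;  // already indexed: skip all further work
    ++next_id_;      // consume the id only for new documents
    return true;
  }

  std::size_t Size() const { return ids_.size(); }

 private:
  std::unordered_map<std::string, std::uint32_t> ids_;
  std::uint32_t next_id_ = 0;
};
```

Making the insert idempotent presumably matters once the builder runs in the background, since a key could then reach the index both through the traversal and through a regular write.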

augmentcode

Review completed. 4 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-01-29T18:52:40Z

src/server/search/index_builder.cc

+
+    MainLoopFb(prime_table.get(), op_args.db_cntx);
+
+    fiber_.Detach();  // Detach self to be safely deletable


Calling fiber_.Detach() from inside the running fiber can drop the last reference to the active FiberInterface while it’s executing (the fb2 implementation expects the final release/resume to happen from another fiber). This looks like it can lead to UB/crash during fiber teardown.

It's safe: a fiber keeps itself alive until its execution finishes. It can be detached from the very start, and even from within itself.

augmentcode · 2026-01-29T18:52:40Z

src/server/search/doc_index.cc

-  TraverseAllMatching(*base_, op_args, cb);
+  // This PR is limited to using the builder synchronously
+  while (builder_)
+    util::ThisFiber::SleepFor(100us);


This synchronous wait relies on on_complete always running to reset builder_; if the builder fiber exits early/fails before invoking it, Rebuild() can block indefinitely here.

Currently there's no way for the indexer to fail; failure handling will be added in the future.

augmentcode

Review completed. 3 suggestions posted.

augmentcode · 2026-01-29T18:53:01Z

src/server/search/index_builder.cc

+
+    MainLoopFb(prime_table.get(), op_args.db_cntx);
+
+    fiber_.Detach();  // Detach self to be safely deletable


Calling on_complete() (which currently deletes the IndexBuilder via builder_.reset()) from inside the worker fiber is a fragile self-destruction pattern: any future access to members after this point would become UB. It may be safer if completion is signaled back to the caller fiber/thread rather than deleting the builder from within itself.

I agree it's hacky, but it's simple. No context really owns the index, but to join the final fiber we have to run it in a suspendable context (read: a fiber). Use ProactorBase::me()->Dispatch({}) to create another fiber just to stop a fiber?
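
For comparison, the alternative the reviewer describes (the worker only signals completion and the owner joins and destroys the builder) might look roughly like the sketch below. It uses `std::thread` in place of fibers, and `ToyBuilder` and `BuildAndRelease` are hypothetical names. It also assumes there is an owner able to wait, which is exactly what the reply above questions.

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Hypothetical builder that never deletes itself; it only publishes a "done" flag.
class ToyBuilder {
 public:
  ~ToyBuilder() { Join(); }

  void Start() {
    worker_ = std::thread([this] {
      steps_ = 3;                                    // pretend work
      done_.store(true, std::memory_order_release);  // signal the owner, touch nothing after
    });
  }

  bool Done() const { return done_.load(std::memory_order_acquire); }

  void Join() {
    if (worker_.joinable())
      worker_.join();
  }

  int steps() const { return steps_; }

 private:
  std::thread worker_;
  std::atomic<bool> done_{false};
  int steps_ = 0;
};

// The owner waits for the signal, joins the worker, and only then destroys the builder.
inline int BuildAndRelease() {
  auto builder = std::make_unique<ToyBuilder>();
  builder->Start();
  while (!builder->Done())
    std::this_thread::yield();
  builder->Join();
  int steps = builder->steps();
  builder.reset();  // destruction happens on the owner's side
  return steps;
}
```

The trade-off is the one raised in the thread: this shape needs some context that outlives the worker and is allowed to block or suspend while joining it.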

augmentcode · 2026-01-29T18:53:01Z

src/server/search/doc_index.cc


-  TraverseAllMatching(*base_, op_args, cb);
+  // This PR is limited to using the builder synchronously
+  while (builder_)


Rebuild() now waits in a loop until builder_ is reset by the worker fiber; if the worker fiber exits early or on_complete is skipped, this becomes an infinite wait on the shard fiber. Consider adding a guard/failure path so rebuild can’t block forever on completion.
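
One way to read the suggested guard is a scope guard that reports completion on every exit path of the worker. The sketch below is hypothetical and is not code from the PR: `CompletionGuard`, `Outcome`, and `RunBuilder` are illustrative names, and the `building` flag stands in for the state that the waiting loop polls.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Runs a callback on scope exit, so completion is reported on every path.
class CompletionGuard {
 public:
  explicit CompletionGuard(std::function<void()> on_exit) : on_exit_(std::move(on_exit)) {}
  ~CompletionGuard() {
    if (on_exit_)
      on_exit_();
  }
  CompletionGuard(const CompletionGuard&) = delete;
  CompletionGuard& operator=(const CompletionGuard&) = delete;

 private:
  std::function<void()> on_exit_;
};

enum class Outcome { kOk, kEarlyReturn, kThrow };

// Hypothetical worker body. `building` plays the role of the state the waiter polls.
// Returns true only if the (pretend) traversal ran to the end.
inline bool RunBuilder(Outcome outcome, bool& building) {
  building = true;
  CompletionGuard guard([&building] { building = false; });
  if (outcome == Outcome::kEarlyReturn)
    return false;  // the guard still clears the flag
  if (outcome == Outcome::kThrow)
    throw std::runtime_error("indexing failed");  // the guard clears the flag during unwinding
  return true;
}
```

With such a guard the waiter's exit condition no longer depends on the worker reaching its last statement.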

Copilot

Pull request overview

Introduces an IndexBuilder abstraction intended to support asynchronous index construction (per #5752), while keeping current behavior functionally synchronous.

Changes:

- Added search::IndexBuilder (fiber-based) to traverse the shard table and add matching documents to an index, yielding periodically.
- Wired ShardDocIndex::Rebuild() to use IndexBuilder and wait for completion (sync usage in this PR).
- Added plumbing for ownership/forward-declarations (ShardDocIndex destructor + builder_ member) and updated search CMake target.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/server/search/index_builder.h | Declares new `IndexBuilder` helper and its fiber-based entrypoints. |
| src/server/search/index_builder.cc | Implements traversal loop with periodic yields and completion callback. |
| src/server/search/doc_index.cc | Switches `Rebuild()` to use `IndexBuilder` and adds a synchronous wait loop. |
| src/server/search/doc_index.h | Adds forward declaration + `builder_` member and explicit ctor/dtor. |
| src/server/search/doc_index_fallback.cc | Adds `ShardDocIndex` dtor in non-search build. |
| src/server/search/CMakeLists.txt | Adds `index_builder.cc` to `dfly_search_server` when `WITH_SEARCH` is enabled. |

Copilot · 2026-01-29T18:58:55Z

src/server/search/doc_index_fallback.cc

+ShardDocIndex::~ShardDocIndex() {
+}


ShardDocIndex now owns std::unique_ptr<search::IndexBuilder> builder_, but this fallback TU defines ShardDocIndex::~ShardDocIndex() without including the full IndexBuilder definition. That will fail to compile because std::unique_ptr<T> requires T to be complete when the containing class destructor is instantiated. Include server/search/index_builder.h here, or conditionally exclude builder_ from ShardDocIndex when WITH_SEARCH is off.
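
The rule behind this comment can be shown in a single file. The sketch below uses hypothetical names (`Owner`, `Builder`) and mimics the header/source split with comments: the destructor is declared while `Builder` is only forward-declared, and defined after `Builder` is complete.

```cpp
#include <memory>

// --- what would live in the header ---
class Builder;  // forward declaration only

class Owner {
 public:
  Owner();
  ~Owner();  // declared here, defined where Builder is a complete type
  bool HasBuilder() const { return static_cast<bool>(builder_); }

 private:
  std::unique_ptr<Builder> builder_;  // fine as a member with an incomplete type
};

// --- what would live in the source file ---
class Builder {
 public:
  int value = 42;
};

// Both definitions need the complete type: the constructor because it creates
// the object, the destructor because unique_ptr's deleter calls `delete`.
Owner::Owner() : builder_(std::make_unique<Builder>()) {}
Owner::~Owner() = default;
```

If `Owner::~Owner()` were instead defined in a translation unit that only sees the forward declaration, the default deleter of `std::unique_ptr` would be instantiated for an incomplete type and compilation would fail, which is the situation described for the fallback file.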

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Copilot · 2026-01-29T21:18:12Z

src/server/search/index_builder.cc

+void IndexBuilder::MainLoopFb(dfly::DbTable* table, DbContext db_cntx) {
+  const auto doc_index = index_->GetInfo().base_index;
+
+  auto cb = [this, doc_index, db_cntx, scratch = std::string{}](PrimeTable::iterator it) mutable {
+    PrimeValue& pv = it->second;
+    std::string_view key = it->first.GetSlice(&scratch);
+
+    if (doc_index.Matches(key, pv.ObjType()))
+      index_->AddDoc(key, db_cntx, pv);
+  };
+
+  PrimeTable::Cursor cursor;
+  do {
+    cursor = table->prime.Traverse(cursor, cb);
+    if (base::CycleClock::ToUsec(util::ThisFiber::GetRunningTimeCycles()) > 500)
+      util::ThisFiber::Yield();
+  } while (cursor);


IndexBuilder::MainLoopFb duplicates the traversal/yielding logic that already exists as TraverseAllMatching in doc_index.cc. Having two implementations increases the risk they diverge (e.g., yield thresholds, match conditions). Consider factoring the traversal into a shared helper and reuse it here, keeping IndexBuilder focused on async orchestration.

Yes, the goal is for IndexBuilder to replace TraverseAllMatching.
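
The loop under discussion follows a common pattern: process one batch per cursor step, then yield if the current time slice has exceeded a budget. A generic sketch of that pattern is shown below, with a `std::vector` in place of the prime table and an injected `yield` callback in place of `util::ThisFiber::Yield()`. `TraverseWithBudget` and its parameters are hypothetical and are neither the shared helper proposed in the review nor Dragonfly code.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Visits `items` in batches of `batch` elements. After each batch, calls `yield`
// if more than `budget` has elapsed since the last yield. Returns the yield count.
template <typename T, typename Visit, typename Yield>
std::size_t TraverseWithBudget(const std::vector<T>& items, std::size_t batch,
                               std::chrono::microseconds budget, Visit visit, Yield yield) {
  using Clock = std::chrono::steady_clock;
  if (batch == 0)
    batch = 1;  // guarantee forward progress
  std::size_t yields = 0;
  auto slice_start = Clock::now();

  for (std::size_t cursor = 0; cursor < items.size();) {
    const std::size_t end = std::min(cursor + batch, items.size());
    for (; cursor < end; ++cursor)
      visit(items[cursor]);

    if (Clock::now() - slice_start > budget) {
      yield();
      ++yields;
      slice_start = Clock::now();  // a new time slice starts after resuming
    }
  }
  return yields;
}
```

In the diff above the elapsed time comes from the fiber's running-time counter, so no manual reset is needed there; the explicit `slice_start` reset is an artifact of using a wall clock in this sketch.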

fix(server): Async index builder, part 1

1232d3f

dranikpg marked this pull request as ready for review January 29, 2026 18:48

Copilot AI review requested due to automatic review settings January 29, 2026 18:48

Copilot started reviewing on behalf of dranikpg January 29, 2026 18:48 View session

augmentcode bot reviewed Jan 29, 2026

Copilot AI reviewed Jan 29, 2026

dranikpg added 2 commits January 30, 2026 00:00

fixes

fca2947

more fixes

9f2b8c5

Copilot AI review requested due to automatic review settings January 29, 2026 21:10

Copilot started reviewing on behalf of dranikpg January 29, 2026 21:10 View session

Copilot AI reviewed Jan 29, 2026

small fixes

d1f073a

dranikpg requested review from BorysTheDev and mkaruza January 30, 2026 08:28

