Skip to content

Conversation

@dranikpg
Copy link
Contributor

Relevant to #5752

  1. Adds small IndexBuilder to be the base for async index construction operations
  2. Use it synchronously to not have any functional changes

@dranikpg dranikpg marked this pull request as ready for review January 29, 2026 18:48
Copilot AI review requested due to automatic review settings January 29, 2026 18:48
@dranikpg
Copy link
Contributor Author

augment review

@augmentcode
Copy link

augmentcode bot commented Jan 29, 2026

🤖 Augment PR Summary

Summary: Begins refactoring search indexing to support asynchronous index builds (per #5752) while preserving current behavior by running the new builder synchronously.

Changes:

  • Add new search::IndexBuilder that scans the DB table in a worker fiber and feeds docs into ShardDocIndex.
  • Wire ShardDocIndex::Rebuild to start the builder and wait for completion before finalizing initialization.
  • Add an out-of-line ShardDocIndex destructor to allow a forward-declared IndexBuilder member, and update CMake to compile the new source.
  • Prevent duplicate inserts into the shard index by short-circuiting AddDoc when the key is already present.

🤖 Was this summary useful? React with 👍 or 👎

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 4 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.


MainLoopFb(prime_table.get(), op_args.db_cntx);

fiber_.Detach(); // Detach self to be safely deletable
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling fiber_.Detach() from inside the running fiber can drop the last reference to the active FiberInterface while it’s executing (the fb2 implementation expects the final release/resume to happen from another fiber). This looks like it can lead to UB/crash during fiber teardown.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its safe, a fiber keeps itself alive until its execution finishes. It can be detached from the very start and even from within itself

TraverseAllMatching(*base_, op_args, cb);
// This PR is limited to using the builder synchronously
while (builder_)
util::ThisFiber::SleepFor(100us);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This synchronous wait relies on on_complete always running to reset builder_; if the builder fiber exits early/fails before invoking it, Rebuild() can block indefinitely here.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there's no way for the indexer to fail, it will be handled in the future

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 3 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.


MainLoopFb(prime_table.get(), op_args.db_cntx);

fiber_.Detach(); // Detach self to be safely deletable
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling on_complete() (which currently deletes the IndexBuilder via builder_.reset()) from inside the worker fiber is a fragile self-destruction pattern: any future access to members after this point would become UB. It may be safer if completion is signaled back to the caller fiber/thread rather than deleting the builder from within itself.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its hacky but it's simple. No context really owns the index, but to join the final fiber we have to run it in a suspendable context (read fiber). With ProactorBase::me()->Dispatch({}) to create another fiber to stop a fiber?


TraverseAllMatching(*base_, op_args, cb);
// This PR is limited to using the builder synchronously
while (builder_)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebuild() now waits in a loop until builder_ is reset by the worker fiber; if the worker fiber exits early or on_complete is skipped, this becomes an infinite wait on the shard fiber. Consider adding a guard/failure path so rebuild can’t block forever on completion.

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces an IndexBuilder abstraction intended to support asynchronous index construction (per #5752), while keeping current behavior functionally synchronous.

Changes:

  • Added search::IndexBuilder (fiber-based) to traverse the shard table and add matching documents to an index, yielding periodically.
  • Wired ShardDocIndex::Rebuild() to use IndexBuilder and wait for completion (sync usage in this PR).
  • Added plumbing for ownership/forward-declarations (ShardDocIndex destructor + builder_ member) and updated search CMake target.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/server/search/index_builder.h Declares new IndexBuilder helper and its fiber-based entrypoints.
src/server/search/index_builder.cc Implements traversal loop with periodic yields and completion callback.
src/server/search/doc_index.cc Switches Rebuild() to use IndexBuilder and adds a synchronous wait loop.
src/server/search/doc_index.h Adds forward declaration + builder_ member and explicit ctor/dtor.
src/server/search/doc_index_fallback.cc Adds ShardDocIndex dtor in non-search build.
src/server/search/CMakeLists.txt Adds index_builder.cc to dfly_search_server when WITH_SEARCH is enabled.

Comment on lines +39 to +40
ShardDocIndex::~ShardDocIndex() {
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShardDocIndex now owns std::unique_ptr<search::IndexBuilder> builder_, but this fallback TU defines ShardDocIndex::~ShardDocIndex() without including the full IndexBuilder definition. That will fail to compile because std::unique_ptr<T> requires T to be complete when the containing class destructor is instantiated. Include server/search/index_builder.h here, or conditionally exclude builder_ from ShardDocIndex when WITH_SEARCH is off.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings January 29, 2026 21:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment on lines 27 to 43
void IndexBuilder::MainLoopFb(dfly::DbTable* table, DbContext db_cntx) {
const auto doc_index = index_->GetInfo().base_index;

auto cb = [this, doc_index, db_cntx, scratch = std::string{}](PrimeTable::iterator it) mutable {
PrimeValue& pv = it->second;
std::string_view key = it->first.GetSlice(&scratch);

if (doc_index.Matches(key, pv.ObjType()))
index_->AddDoc(key, db_cntx, pv);
};

PrimeTable::Cursor cursor;
do {
cursor = table->prime.Traverse(cursor, cb);
if (base::CycleClock::ToUsec(util::ThisFiber::GetRunningTimeCycles()) > 500)
util::ThisFiber::Yield();
} while (cursor);
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndexBuilder::MainLoopFb duplicates the traversal/yielding logic that already exists as TraverseAllMatching in doc_index.cc. Having two implementations increases the risk they diverge (e.g., yield thresholds, match conditions). Consider factoring the traversal into a shared helper and reuse it here, keeping IndexBuilder focused on async orchestration.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the goal is to replace it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant