refactor(mcp): async snapshot lock and workspace-scoped state #274

Draft
defin85 wants to merge 2 commits into zilliztech:master from defin85:feature/mcp-snapshot-async-workspace

defin85 commented Feb 20, 2026

Supersedes #272 to align with branch naming convention in CONTRIBUTING.md (feature/...).

Problem

Snapshot persistence retried the lock with a synchronous busy-wait backoff, which caused high CPU usage under lock contention. Snapshot state was also global, so multiple workspaces and sessions contended on a single lock file.

Changes

  • Replaced busy-spin lock retry with async retry + jitter.
  • Added lock wait/retry telemetry logs.
  • Added serialized snapshot save queue and scheduled/coalesced saves.
  • Switched the default snapshot scope to a workspace-specific file (keyed by a hash of process.cwd()).
  • Added best-effort migration from legacy global snapshot into workspace snapshot.
  • Updated MCP handlers to await snapshot saves.
  • Reduced indexing progress write frequency via coalesced scheduling.
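
For reviewers who want a concrete picture of the first item, here is a minimal sketch of an async lock retry with jitter. It is not the code in this PR: `withFileLock`, `backoffDelay`, `LockOptions`, and the default values are hypothetical names and numbers chosen for illustration, and the sketch assumes the lock is an exclusively created file.

```typescript
import { open, unlink } from "node:fs/promises";

// Hypothetical options; the PR description does not state the real defaults.
interface LockOptions {
  retries: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Exponential backoff with full jitter: a random delay in [0, cap).
function backoffDelay(
  attempt: number,
  opts: LockOptions,
  rand: () => number = Math.random,
): number {
  const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

async function withFileLock<T>(
  lockPath: string,
  fn: () => Promise<T>,
  opts: LockOptions = { retries: 20, baseDelayMs: 10, maxDelayMs: 500 },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      // "wx" fails with EEXIST when the lock file already exists.
      const handle = await open(lockPath, "wx");
      await handle.close();
      break;
    } catch (err) {
      const code = (err as { code?: string }).code;
      if (code !== "EEXIST" || attempt >= opts.retries) throw err;
      // Yield to the event loop instead of spinning on the CPU.
      await sleep(backoffDelay(attempt, opts));
    }
  }
  try {
    return await fn();
  } finally {
    // Best-effort release; a missing lock file is ignored.
    await unlink(lockPath).catch(() => undefined);
  }
}
```

The sketch does not detect lock files left behind by a crashed process, which a real implementation would need to handle.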

Breaking Changes

  • SnapshotManager.saveCodebaseSnapshot() is now async and must be awaited.
  • Default snapshot scope changed from global to workspace.
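
To illustrate why call sites now need `await`, here is a rough sketch of a serialized save queue. Only the method name `saveCodebaseSnapshot()` and its `Promise<void>` return type come from this PR; `SnapshotManagerSketch`, `handleIndexCodebase`, and the in-memory `written` log are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for the real SnapshotManager.
class SnapshotManagerSketch {
  private tail: Promise<void> = Promise.resolve();
  private version = 0;
  public readonly written: number[] = [];

  saveCodebaseSnapshot(): Promise<void> {
    const version = ++this.version; // state captured at call time
    // Chain onto the previous save so writes never overlap.
    const run = this.tail.then(() => this.write(version));
    // Keep the queue usable even if this save fails.
    this.tail = run.catch(() => undefined);
    return run;
  }

  private async write(version: number): Promise<void> {
    // Stand-in for the real file write.
    await new Promise<void>((resolve) => setTimeout(resolve, 1));
    this.written.push(version);
  }
}

// A handler must await the save before reporting success.
async function handleIndexCodebase(
  manager: SnapshotManagerSketch,
): Promise<string> {
  await manager.saveCodebaseSnapshot();
  return "indexing started";
}
```

A caller that drops the returned promise would report success before the snapshot is on disk, and a failed save would surface as an unhandled rejection.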

Validation

  • packages/mcp/node_modules/.bin/tsc -p packages/mcp/tsconfig.json --noEmit
  • pnpm --filter @zilliz/claude-context-mcp build
  • bash scripts/build-local-mcp.sh
  • pnpm build

Additional Fixes (this update)

  • Recover stale, interrupted indexing entries in the snapshot so they do not block future index_codebase runs.
  • Allow a force=true reindex after an MCP restart by checking the in-process indexing state.
  • Refresh snapshot metadata after incremental reindexByChange updates so get_indexing_status shows the current "Last updated" value.
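
The first two fixes can be pictured with the sketch below. The entry shape, the "interrupted" status, and the function names are assumptions made for illustration, not the data model used in packages/mcp/src/snapshot.ts.

```typescript
// Hypothetical entry shape; the real snapshot format may differ.
type IndexStatus = "indexing" | "indexed" | "interrupted";

interface SnapshotEntry {
  status: IndexStatus;
  lastUpdated: string; // ISO timestamp
}

// Marks entries left in "indexing" by an earlier process as "interrupted".
// Entries that this process is actively indexing are left alone.
function recoverStaleEntries(
  snapshot: Map<string, SnapshotEntry>,
  activeInProcess: Set<string>,
  now: Date,
): string[] {
  const recovered: string[] = [];
  snapshot.forEach((entry, path) => {
    if (entry.status === "indexing" && !activeInProcess.has(path)) {
      snapshot.set(path, {
        status: "interrupted",
        lastUpdated: now.toISOString(),
      });
      recovered.push(path);
    }
  });
  return recovered;
}

// Only work running in this process blocks a new run; a persisted
// "indexing" entry from before a restart does not.
function canStartIndexing(
  path: string,
  force: boolean,
  snapshot: Map<string, SnapshotEntry>,
  activeInProcess: Set<string>,
): boolean {
  if (activeInProcess.has(path)) return false;
  const entry = snapshot.get(path);
  if (entry === undefined) return true;
  return force || entry.status !== "indexed";
}
```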

Verification for additional fixes

  • pnpm --filter @zilliz/claude-context-mcp typecheck
  • pnpm --filter @zilliz/claude-context-mcp build
  • Manual repro: a stale indexing entry no longer blocks a force reindex.
  • Manual repro: an incremental sync updates the index and advances the status timestamp.

defin85 (Author) commented Feb 20, 2026

Heads-up for reviewers: this PR contains intentional breaking changes.

Why this change

This refactor addresses high CPU usage and lock contention in MCP snapshot persistence:

  • busy-spin lock backoff was replaced with async retry+jitter;
  • snapshot state is now workspace-scoped by default, avoiding cross-repo lock contention.

Breaking changes

  • SnapshotManager.saveCodebaseSnapshot() is now async (Promise<void>) and must be awaited.
  • Default snapshot scope changed from global to workspace (~/.context/mcp/<workspaceHash>/...).
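
As a rough illustration of the new default location, the sketch below derives a per-workspace directory from the working directory. The hash algorithm (SHA-256), the 16-character truncation, and the function names are assumptions; the PR only states that the path contains a hash of process.cwd().

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join, resolve } from "node:path";

// Assumption: SHA-256 truncated to 16 hex characters.
function workspaceHash(cwd: string): string {
  return createHash("sha256").update(resolve(cwd)).digest("hex").slice(0, 16);
}

// Returns ~/.context/mcp/<workspaceHash>; the file name inside that
// directory is not specified here.
function workspaceSnapshotDir(
  cwd: string = process.cwd(),
  home: string = homedir(),
): string {
  return join(home, ".context", "mcp", workspaceHash(cwd));
}
```

Two MCP servers started in different repositories resolve to different directories under this scheme, so they no longer share a lock file.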

Compatibility / migration

  • Legacy global snapshot is migrated on startup in best-effort mode, but only for entries belonging to the current workspace.
  • If old global behavior is required, set:
    • MCP_SNAPSHOT_SCOPE=global
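
A sketch of how the scope switch and the migration filter might fit together follows. `MCP_SNAPSHOT_SCOPE` and its `global` value come from this PR; the function names and the rule that an entry "belongs" to a workspace when its path equals or sits under the workspace root are assumptions, and the path check handles POSIX separators only.

```typescript
type SnapshotScope = "workspace" | "global";

// Anything other than the explicit opt-out falls back to the new default.
function resolveScope(env: Record<string, string | undefined>): SnapshotScope {
  return env.MCP_SNAPSHOT_SCOPE === "global" ? "global" : "workspace";
}

// Assumption: a legacy entry belongs to the workspace when its path is the
// workspace root or a descendant of it.
function entriesForWorkspace(
  legacyPaths: string[],
  workspaceRoot: string,
): string[] {
  const base = workspaceRoot.replace(/\/+$/, "");
  return legacyPaths.filter(
    (p) => p === base || p.startsWith(base + "/"),
  );
}
```

Note that the trailing-slash check keeps a sibling such as /repo/ab from being migrated into the /repo/a workspace.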

Main touched areas

  • packages/mcp/src/snapshot.ts
  • packages/mcp/src/handlers.ts

Validation performed

  • packages/mcp/node_modules/.bin/tsc -p packages/mcp/tsconfig.json --noEmit
  • pnpm --filter @zilliz/claude-context-mcp build
  • bash scripts/build-local-mcp.sh
  • pnpm build

Review is especially welcome on:

  • lock correctness under contention;
  • workspace migration behavior;
  • async save call-sites and ordering guarantees.

zc277584121 (Collaborator) commented

Hi @defin85, thanks for this thorough refactoring work — the async lock with jitter, coalesced saves, and workspace-scoped snapshot are all solid improvements.

We've been tackling some related snapshot consistency issues recently (PR #283, v0.1.6), specifically around the read-merge-write pattern re-adding removed entries and the lack of VectorDB fallback when the snapshot loses track. Your workspace-scoped approach would actually complement those fixes nicely by reducing cross-session contention at the source.

A few thoughts:

  • The workspace-scoped snapshot is an interesting direction — it avoids the multi-process race condition entirely by giving each workspace its own state file. We'd want to think carefully about the migration path and how it interacts with the cloud sync logic.
  • The coalesced/queued saves should help with the high-frequency write storms during indexing progress updates.
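
For context on the write-storm point, a coalescing scheduler can be sketched as below. `CoalescedSaver` and its delay parameter are hypothetical and not taken from the PR; the idea shown is that every call made while a save is pending shares that one save.

```typescript
class CoalescedSaver {
  private pending: Promise<void> | undefined;

  constructor(
    private readonly save: () => Promise<void>,
    private readonly delayMs: number,
  ) {}

  // Returns a promise that settles when the shared save finishes.
  schedule(): Promise<void> {
    if (this.pending === undefined) {
      this.pending = new Promise<void>((resolve, reject) => {
        setTimeout(() => {
          // Calls made from now on start a new batch.
          this.pending = undefined;
          this.save().then(resolve, reject);
        }, this.delayMs);
      });
    }
    return this.pending;
  }
}
```

With a delay of a few hundred milliseconds, a burst of progress updates during indexing would produce one write per window instead of one write per update.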

We'll review this more carefully when we have bandwidth for the larger refactor. Thanks for the contribution — really appreciate the detailed PR description and the thought put into the design! 🙏
