
fix: batch recovery to avoid 16MB read limit#174

Open
sethconvex wants to merge 8 commits into main from fix/recovery-batch-main-loop

Conversation


@sethconvex sethconvex commented Mar 3, 2026

Summary

  • Recovery was sending all stale running jobs to a single recover mutation, which could exceed Convex's 16MB read limit when many jobs needed recovery simultaneously (e.g. high maxParallelism + server restart)
  • Batch recovery jobs with RECOVERY_BATCH_SIZE = 10 in both the scheduled recover call and the handleRecovery reads inside main
  • Keep the 1-minute recovery cadence; remaining old jobs are picked up in subsequent cycles rather than triggering immediate re-runs
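
The batching described in the summary can be sketched as a plain chunking step. This is an illustrative sketch, not the PR's actual code: `RECOVERY_BATCH_SIZE` is the constant this PR introduces, while the `StaleJob` type and `toRecoveryBatches` helper are assumed names.

```typescript
// Illustrative sketch of the batching idea (not the PR's actual code).
// RECOVERY_BATCH_SIZE matches the constant this PR introduces; the job
// type and helper name are assumptions for the example.
const RECOVERY_BATCH_SIZE = 10;

interface StaleJob {
  workId: string;
}

// Split stale running jobs into fixed-size batches so no single
// `recover` call reads an unbounded number of work documents.
function toRecoveryBatches(jobs: StaleJob[]): StaleJob[][] {
  const batches: StaleJob[][] = [];
  for (let i = 0; i < jobs.length; i += RECOVERY_BATCH_SIZE) {
    batches.push(jobs.slice(i, i + RECOVERY_BATCH_SIZE));
  }
  return batches;
}
```

Each batch would then be handed to its own scheduled `recover` call, so the per-call read volume stays bounded no matter how many jobs went stale at once.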

Test plan

  • All 80 tests pass
  • Verify with a high maxParallelism deployment that recovery no longer hits the 16MB limit

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Optimized recovery processing to handle jobs in smaller batches per iteration, reducing resource consumption and improving overall performance during recovery operations.

sethconvex and others added 8 commits March 3, 2026 09:51
Recovery was sending all stale running jobs to a single `recover`
mutation, which could exceed Convex's 16MB read limit when many jobs
needed recovery at once (e.g. high maxParallelism + server restart).
Batch into chunks of 50, matching the pattern used for cancellations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The initial batch fix only batched the scheduled `recover` call, but
`handleRecovery` inside `main` was still reading work docs for every
old running job unbounded. Now it processes at most RECOVERY_BATCH_SIZE
candidates per iteration and signals `main` to re-run recovery
immediately if more remain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
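
The "bounded batch plus re-run signal" pattern the commit above describes can be sketched as follows. Everything except `RECOVERY_BATCH_SIZE` is an assumed name for illustration; the real implementation lives in src/component/loop.ts.

```typescript
// Sketch of processing at most RECOVERY_BATCH_SIZE candidates per
// iteration and telling the caller whether more remain. All names
// except RECOVERY_BATCH_SIZE are assumptions for this example.
const RECOVERY_BATCH_SIZE = 10;

interface RecoveryResult {
  processed: number;
  // true when candidates remain, signaling the main loop to run
  // recovery again immediately instead of waiting a full cadence.
  moreToRecover: boolean;
}

function handleRecoveryBatch<T>(
  candidates: T[],
  recoverOne: (job: T) => void
): RecoveryResult {
  const batch = candidates.slice(0, RECOVERY_BATCH_SIZE);
  for (const job of batch) {
    recoverOne(job);
  }
  return {
    processed: batch.length,
    moreToRecover: candidates.length > batch.length,
  };
}
```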
Work documents can store arbitrarily large fnArgs, so use a
conservative batch size to stay well under the 16MB read limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
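
A quick back-of-the-envelope check motivates the conservative size the commit above mentions. The arithmetic is an assumption for illustration, not taken from the PR:

```typescript
// Back-of-the-envelope check (an assumption for illustration, not PR
// code): how much room each work document gets, on average, under a
// given batch size and Convex's 16MiB read limit.
const READ_LIMIT_BYTES = 16 * 1024 * 1024; // 16MiB
const RECOVERY_BATCH_SIZE = 10;

// Average per-document budget before a full batch nears the limit.
const perDocBudgetBytes = Math.floor(READ_LIMIT_BYTES / RECOVERY_BATCH_SIZE);
```

Since fnArgs can be arbitrarily large, a small batch leaves generous headroom (roughly 1.6MiB per document on average) rather than relying on documents staying small.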
handleRecovery was reading work docs for every old running job
unbounded inside the main loop mutation. Now it processes at most
RECOVERY_BATCH_SIZE candidates per iteration and signals main to
re-run recovery immediately if more remain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Processing only a batch per cycle is sufficient: recovery is a
health check, not an emergency. Remaining old jobs will be picked
up in subsequent recovery cycles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot commented Mar 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1627d4 and b90fbb6.

📒 Files selected for processing (1)
  • src/component/loop.ts

📝 Walkthrough


A single-file change to src/component/loop.ts introduces a RECOVERY_BATCH_SIZE constant that limits recovery work to 10 jobs per iteration. The handleRecovery function now processes only a bounded batch of old running jobs instead of scanning all running items, reducing per-iteration processing scope.

Changes

  • Batch Recovery Optimization (src/component/loop.ts): Introduces a RECOVERY_BATCH_SIZE constant set to 10 and refactors handleRecovery to process only a limited batch of old running jobs per iteration instead of all running items, with added clarifying comments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Hopping through recovery, ten at a time,
No more endless loops in the recovery climb,
Batch by batch we bound, efficient and fleet,
Making recovery loops oh-so-sweet!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'fix: batch recovery to avoid 16MB read limit' directly and clearly summarizes the main change: introducing batching for recovery to prevent exceeding Convex's 16MB read limit.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



pkg-pr-new bot commented Mar 3, 2026


npm i https://pkg.pr.new/get-convex/workpool/@convex-dev/workpool@174

commit: b90fbb6
