fix: batch recovery jobs to avoid 16MB read limit#172

Closed
sethconvex wants to merge 5 commits into main from fix/recovery-batch-size

Conversation

sethconvex (Contributor) commented Mar 3, 2026

Summary

  • Recovery was sending all stale running jobs to a single recover mutation, which could exceed Convex's 16MB read limit when many jobs needed recovery simultaneously (e.g. high maxParallelism + server restart)
  • Batch recovery jobs into chunks of 50, following the existing batching pattern used for cancellations (CANCELLATION_BATCH_SIZE = 64)
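The batching described above can be sketched as a simple chunking step. This is an illustrative sketch, not the PR's actual code: the `chunk` helper is hypothetical, and only `RECOVERY_BATCH_SIZE` and the batch size of 50 come from the PR description.

```typescript
// Batch size taken from the PR description; kept conservative because
// each job's work document can be large.
const RECOVERY_BATCH_SIZE = 50;

// Hypothetical helper: split a list of stale job ids into fixed-size
// batches, so each `recover` mutation reads a bounded amount of data
// instead of all stale jobs at once.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be passed to its own scheduled `recover` call, so no single mutation's reads scale with the total number of stale jobs.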

Test plan

  • Existing loop and recovery tests pass (31 tests)
  • Verify with a high maxParallelism deployment that recovery no longer hits the 16MB limit

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Recovery now processes jobs in discrete batches for more efficient, incremental recovery.
    • Recovery flow now correctly continues when more old jobs remain, preventing premature completion of recovery.

Recovery was sending all stale running jobs to a single `recover`
mutation, which could exceed Convex's 16MB read limit when many jobs
needed recovery at once (e.g. high maxParallelism + server restart).
Batch into chunks of 50, matching the pattern used for cancellations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
coderabbitai bot commented Mar 3, 2026

Warning

Rate limit exceeded

@sethconvex has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 15 seconds before requesting another review.


📥 Commits

Reviewing files that changed from the base of the PR and between 96380fe and 639c510.

📒 Files selected for processing (1)
  • src/component/loop.ts
📝 Walkthrough


Updates recovery to process old jobs in sequential batches of 50. The recovery handler now returns whether more candidates remain; the main flow sets state.lastRecovery to 0n when additional batches exist, otherwise to the current segment.

Changes

  • src/component/loop.ts (Recovery Batch Processing): Introduce RECOVERY_BATCH_SIZE = 50; change handleRecovery to process up to 50 recovery jobs per invocation and return a boolean indicating whether more candidates remain; update main to set state.lastRecovery to 0n when more batches remain, otherwise to the current segment.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Fifty hops in tidy rows,
Old tasks find new paths where wind blows,
Quiet batches, one by one,
Recovery’s work now neatly done,
I nibble carrots and hum a fun tune. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'fix: batch recovery jobs to avoid 16MB read limit' directly and clearly describes the main change: batching recovery jobs to prevent exceeding Convex's 16MB read limit. It is concise, specific, and accurately represents the changeset.


pkg-pr-new bot commented Mar 3, 2026


npm i https://pkg.pr.new/get-convex/workpool/@convex-dev/workpool@172

commit: 639c510

@sethconvex sethconvex requested a review from ianmacartney March 3, 2026 17:52
sethconvex and others added 2 commits March 3, 2026 09:56
The initial batch fix only batched the scheduled `recover` calls, but
`handleRecovery` inside `main` still read the work document for every
old running job, unbounded. It now processes at most RECOVERY_BATCH_SIZE
candidates per iteration and signals `main` to re-run recovery
immediately if more remain.
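The bounded per-iteration processing this commit message describes might look roughly like the following. It is a sketch, not the component's code: `processOne` and the in-memory candidate list are hypothetical stand-ins for the real database reads, and only `RECOVERY_BATCH_SIZE` comes from the PR.

```typescript
const RECOVERY_BATCH_SIZE = 50;

// Process at most one batch of recovery candidates, then report whether
// more work remains so the caller (main) can schedule another recovery
// pass right away rather than declaring recovery complete.
function recoverBatch<T>(
  candidates: T[],
  processOne: (candidate: T) => void
): boolean {
  const batch = candidates.slice(0, RECOVERY_BATCH_SIZE);
  for (const candidate of batch) {
    processOne(candidate);
  }
  return candidates.length > RECOVERY_BATCH_SIZE;
}
```

The key property is that the per-invocation read volume is bounded by the batch size, regardless of how many jobs went stale.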

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```ts
if (r.started >= oldEnoughToConsider) {
  return null;
}
const work = await ctx.db.get(r.workId);
```
Member

this is still loading the old work - which may be big

sethconvex and others added 2 commits March 3, 2026 10:11
Work documents can store arbitrarily large fnArgs, so use a
conservative batch size to stay well under the 16MB read limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sethconvex sethconvex closed this Mar 3, 2026

2 participants