-
Notifications
You must be signed in to change notification settings - Fork 993
fix: Improve background agent stability and completion detection #697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
fix: Improve background agent stability and completion detection #697
Conversation
- Add max_steps limit to explore (25) and librarian (30) agents - Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning - Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS) - Simplify session.idle handler - remove validateSessionHasOutput/checkSessionTodos guards - Add JSONC config file support (.jsonc checked before .json) - Fix categoryModel passing in sisyphus_task sync mode Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 6 files
Confidence score: 2/5
- Double-releasing the concurrency key in
src/features/background-agent/manager.tsrisks oversubscribing background tasks and breaking the intended queue limits, making the change high risk. - Resumed tasks failing to reset
startedAtinsrc/features/background-agent/manager.tsmeans long-running work can be wrongly marked complete on idle, potentially cutting off user sessions. - Global task timeouts in
src/features/background-agent/manager.tsare left pending after completion, keeping long-lived timers around and threatening overall process stability. - Pay close attention to
src/features/background-agent/manager.ts- concurrency handling, resume logic, and timeout cleanup all need fixes.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/features/background-agent/manager.ts">
<violation number="1" location="src/features/background-agent/manager.ts:189">
P2: Global task timeout is created but never cleared/unref’d on most completion paths, leaving 15‑minute timers alive after tasks finish and potentially keeping the event loop running.</violation>
<violation number="2" location="src/features/background-agent/manager.ts:197">
P1: Concurrency key is released twice on timeout (timeout handler and cleanup), but `release` is not idempotent—double-release can oversubscribe queued tasks</violation>
<violation number="3" location="src/features/background-agent/manager.ts:423">
P2: Resumed tasks can be prematurely completed: startedAt isn’t reset on resume, so the simplified session.idle handler will immediately complete any long-lived resumed task on the first idle event without verifying output/todos.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks) - Previous commit already fixed timeout.unref() and double-release prevention
- Set concurrencyKey = undefined after every release() call to prevent double-release when multiple code paths try to release the same key - Add 15-minute timeout timer for resumed tasks (was missing) - Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/features/background-agent/manager.ts">
<violation number="1" location="src/features/background-agent/manager.ts:361">
P2: Resume error path does not clear the newly created timeout or release the concurrency key, holding resources until retention cleanup and keeping the timer scheduled</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Address P2 feedback: resume() error handler now properly: - Clears the timeout timer created for the resumed task - Releases concurrency key to unblock queued tasks
|
Love this PR ❤️ I hope early merge. |
|
@Gladdonilli i once modified in this way before but kinda worried about double-releasing as cubit bot, do you have any opinions regarding this |
Summary
This PR improves background agent reliability by adding safety guards and simplifying completion detection to prevent stuck tasks.
Changes
1. Agent Safety Guards
max_stepslimit: Addedmax_steps: 25to explore agent andmax_steps: 30to librarian agent to prevent infinite loopssisyphus_task,call_omo_agent, andtasktools to prevent unintended child spawning2. Background Agent Completion Detection (PR #655 Implementation)
MAX_RUN_TIME_MS(15 minutes) to prevent tasks from running forevervalidateSessionHasOutput()andcheckSessionTodos()guards that were causing stuck tasksMIN_IDLE_TIME_MS(5 seconds) prevents premature completion from earlysession.idleevents3. Configuration Improvements
.jsoncfiles before.json, enabling comments in config filessisyphus_tasksync mode now correctly passescategoryModelfor category-based tasksCompletion Detection Flow
Why Guards Were Removed
The previous guards (
validateSessionHasOutput,checkSessionTodos) caused tasks to get stuck:validateSessionHasOutputcheckSessionTodosThe new approach: Fail fast, surface errors, don't hide them. An empty result is a visible signal of a problem the user can debug.
Safety Nets
MIN_IDLE_TIME_MS(5s minimum)max_steps: 25/30MAX_RUN_TIME_MS(15 min timeout)Testing
bun run typecheckpassesbun run buildsucceedsReferences
Summary by cubic
Improves background agent stability and completion detection to prevent stuck tasks and runaway sessions. Adds safety guards, a 15‑minute timeout, simpler idle-based completion, and JSONC config support.
Bug Fixes
New Features
Written for commit 4833ebb. Summary will update on new commits.