fix: Improve background agent stability and completion detection #697

Gladdonilli · 2026-01-11T17:43:21Z

Summary

This PR improves background agent reliability by adding safety guards and simplifying completion detection to prevent stuck tasks.

Changes

1. Agent Safety Guards

max_steps limit: Added max_steps: 25 to explore agent and max_steps: 30 to librarian agent to prevent infinite loops
Tool blocking: Librarian now blocks sisyphus_task, call_omo_agent, and task tools to prevent unintended child spawning
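For illustration, the two guards could be expressed roughly as below. This is a minimal sketch: the `AgentConfig` shape, the `tools` map of disabled tools, and the helpers `isToolAllowed` and `canTakeStep` are assumptions, not the project's actual types. Only the `max_steps` values and the blocked tool names come from this PR.

```typescript
// Hypothetical agent config shape (assumption, not the real type).
interface AgentConfig {
  max_steps: number;
  // Tools mapped to false are blocked; anything not listed is allowed.
  tools?: Record<string, boolean>;
}

const exploreAgent: AgentConfig = { max_steps: 25 };

const librarianAgent: AgentConfig = {
  max_steps: 30,
  tools: { sisyphus_task: false, call_omo_agent: false, task: false },
};

// Returns false only when the tool is explicitly blocked for this agent.
function isToolAllowed(agent: AgentConfig, tool: string): boolean {
  return agent.tools?.[tool] !== false;
}

// Returns false once the agent has used up its step budget.
function canTakeStep(agent: AgentConfig, stepsTaken: number): boolean {
  return stepsTaken < agent.max_steps;
}
```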

2. Background Agent Completion Detection (PR #655 Implementation)

Global timeout: Added MAX_RUN_TIME_MS (15 minutes) to prevent tasks from running forever
Simplified idle handler: Removed validateSessionHasOutput() and checkSessionTodos() guards that were causing stuck tasks
Minimum idle time: MIN_IDLE_TIME_MS (5 seconds) prevents premature completion from early session.idle events
Timeout cleanup: Timer is properly cleared on task completion to prevent memory leaks
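A rough sketch of the timeout lifecycle described above, not the actual manager.ts code. The task shape, the `"error"` status, and the helper names `armTaskTimeout`, `clearTaskTimeout`, and `completeTask` are assumptions for illustration; only `MAX_RUN_TIME_MS` and its 15-minute value come from this PR.

```typescript
const MAX_RUN_TIME_MS = 15 * 60 * 1000; // 15 minutes

// Hypothetical task shape (assumption).
interface BackgroundTask {
  id: string;
  status: "running" | "completed" | "error";
  error?: string;
  timeoutTimer?: ReturnType<typeof setTimeout>;
}

// Cancel the pending timer, if any, so it does not outlive the task.
function clearTaskTimeout(task: BackgroundTask): void {
  if (task.timeoutTimer !== undefined) {
    clearTimeout(task.timeoutTimer);
    task.timeoutTimer = undefined;
  }
}

// Arm the global timeout; a still-running task is marked as failed when it fires.
function armTaskTimeout(task: BackgroundTask, maxRunTimeMs: number = MAX_RUN_TIME_MS): void {
  clearTaskTimeout(task);
  const timer = setTimeout(() => {
    task.timeoutTimer = undefined;
    if (task.status !== "running") return;
    task.status = "error";
    task.error = `Task exceeded ${maxRunTimeMs} ms`;
  }, maxRunTimeMs);
  // In Node-style runtimes, unref() keeps a pending timer from holding the process open.
  (timer as unknown as { unref?: () => void }).unref?.();
  task.timeoutTimer = timer;
}

// Every completion path clears the timer before changing the status.
function completeTask(task: BackgroundTask): void {
  clearTaskTimeout(task);
  task.status = "completed";
}
```

Keeping the clear step inside `completeTask` means no completion path can forget it.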

3. Configuration Improvements

JSONC support: Config paths now check for .jsonc files before .json, enabling comments in config files
Category model fix: sisyphus_task sync mode now correctly passes categoryModel for category-based tasks
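The `.jsonc`-before-`.json` lookup can be sketched as follows. The function name `resolveConfigPath` and the injected `exists` predicate are assumptions made to keep the example self-contained; the real code presumably checks the filesystem directly.

```typescript
// Returns the first existing candidate, preferring .jsonc over .json.
// `exists` is injected so the sketch does not depend on a filesystem API.
function resolveConfigPath(
  basePath: string,
  exists: (path: string) => boolean,
): string | undefined {
  for (const ext of [".jsonc", ".json"]) {
    const candidate = basePath + ext;
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

With both files present, the `.jsonc` variant wins; with only `.json` present, behavior is unchanged from before.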

Completion Detection Flow

┌─────────────────────────────────────────────────────────────┐
│                    COMPLETION FLOW                          │
├─────────────────────────────────────────────────────────────┤
│  1. Agent runs, does work (tool calls, thinking, etc.)      │
│  2. Agent goes idle (no more tool calls/responses)          │
│  3. OpenCode SDK fires session.idle event                   │
│  4. GUARD: Check if elapsed time >= MIN_IDLE_TIME_MS (5s)   │
│     - If < 5s: IGNORE (too early, agent still starting)     │
│     - If >= 5s: ACCEPT as complete                          │
│  5. Clear timeout timer (prevent memory leak)               │
│  6. Mark task.status = "completed"                          │
│  7. Notify parent via noReply batching pattern              │
└─────────────────────────────────────────────────────────────┘
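Steps 4 to 6 of the flow above can be sketched as a small handler. This is an illustration under assumptions: `startedAt` is taken to be a millisecond timestamp, and `handleSessionIdle` and the task shape are hypothetical names. Timer cleanup and parent notification (steps 5 and 7) are left out.

```typescript
const MIN_IDLE_TIME_MS = 5_000; // 5 seconds

// Hypothetical task shape (assumption).
interface IdleTask {
  status: "running" | "completed";
  startedAt: number; // epoch milliseconds
}

// Returns true when the idle event is accepted and the task is marked completed.
function handleSessionIdle(task: IdleTask, now: number = Date.now()): boolean {
  if (task.status !== "running") return false;
  const elapsedMs = now - task.startedAt;
  // Too early: the agent is probably still starting, so ignore this event.
  if (elapsedMs < MIN_IDLE_TIME_MS) return false;
  task.status = "completed";
  return true;
}
```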

Why Guards Were Removed

The previous guards (validateSessionHasOutput, checkSessionTodos) caused tasks to get stuck:

| Guard | Problem |
| --- | --- |
| `validateSessionHasOutput` | If model returns empty (config issue), waits forever |
| `checkSessionTodos` | If agent creates todos but never completes them, waits forever |

The new approach: Fail fast, surface errors, don't hide them. An empty result is a visible signal of a problem the user can debug.

Safety Nets

| Edge Case | Protection |
| --- | --- |
| Early completion | `MIN_IDLE_TIME_MS` (5s minimum) |
| Infinite loop | `max_steps: 25/30` |
| Total hang | `MAX_RUN_TIME_MS` (15 min timeout) |
| Bad model config | Empty result surfaces issue |
| Unintended spawning | Tool blocking in librarian |
| Memory leak | Timeout timer cleared on completion |

Testing

✅ bun run typecheck passes
✅ bun run build succeeds
✅ All 45 background-agent tests pass
✅ Manual testing: explore (background), librarian (background), explore (sync), librarian (sync) all complete successfully

References

Follows pattern from kdcokenny/opencode-background-agents
Implements the simplified completion approach from PR #655 (fix(sisyphus-task): complete overhaul of background agent model handling and sync mode)

Summary by cubic

Improves background agent stability and completion detection to prevent stuck tasks and runaway sessions. Adds safety guards, a 15‑minute timeout, simpler idle-based completion, and JSONC config support.

Bug Fixes
- Completion: mark complete on session.idle after 5s min idle; add 15‑minute global timeout (including on resume) and clear/unref the timer on completion; remove output/todo guards that caused stalls; prevent double-release of concurrency keys; reset startedAt on resume to avoid premature completion.
- Agent safety: set max_steps (explore 25, librarian 30) and block spawning tools in librarian (task, sisyphus_task, call_omo_agent).
- Correctly pass categoryModel for category-based sisyphus_task in sync mode.
New Features
- Config paths now prefer .jsonc over .json to enable comments in config files.

Written for commit 4833ebb. Summary will update on new commits.

- Add max_steps limit to explore (25) and librarian (30) agents
- Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning
- Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS)
- Simplify session.idle handler - remove validateSessionHasOutput/checkSessionTodos guards
- Add JSONC config file support (.jsonc checked before .json)
- Fix categoryModel passing in sisyphus_task sync mode

Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents

cubic-dev-ai

3 issues found across 6 files

Confidence score: 2/5

Double-releasing the concurrency key in src/features/background-agent/manager.ts risks oversubscribing background tasks and breaking the intended queue limits, making the change high risk.
Resumed tasks failing to reset startedAt in src/features/background-agent/manager.ts means long-running work can be wrongly marked complete on idle, potentially cutting off user sessions.
Global task timeouts in src/features/background-agent/manager.ts are left pending after completion, keeping long-lived timers around and threatening overall process stability.
Pay close attention to src/features/background-agent/manager.ts - concurrency handling, resume logic, and timeout cleanup all need fixes.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:189">
P2: Global task timeout is created but never cleared/unref’d on most completion paths, leaving 15‑minute timers alive after tasks finish and potentially keeping the event loop running.</violation>

<violation number="2" location="src/features/background-agent/manager.ts:197">
P1: Concurrency key is released twice on timeout (timeout handler and cleanup), but `release` is not idempotent, so a double-release can oversubscribe queued tasks.</violation>

<violation number="3" location="src/features/background-agent/manager.ts:423">
P2: Resumed tasks can be prematurely completed: startedAt isn’t reset on resume, so the simplified session.idle handler will immediately complete any long-lived resumed task on the first idle event without verifying output/todos.</violation>
</file>

src/features/background-agent/manager.ts

- Reset startedAt when resuming tasks to prevent immediate completion (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks)
- Previous commit already fixed timeout.unref() and double-release prevention
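To illustrate why the reset matters: if `startedAt` keeps the original launch time, the elapsed-time check is already satisfied the moment a resumed task sees its first idle event. The sketch below assumes millisecond timestamps and uses hypothetical names (`resumeTask`, `isIdleAccepted`); it is not the manager.ts implementation.

```typescript
const MIN_IDLE_TIME_MS = 5_000;

// Hypothetical task shape (assumption).
interface ResumableTask {
  status: "running" | "completed";
  startedAt: number; // epoch milliseconds
}

// Resuming restarts the clock so the idle guard measures from the resume point.
function resumeTask(task: ResumableTask, now: number = Date.now()): void {
  task.status = "running";
  task.startedAt = now;
}

// The same elapsed-time guard the idle handler applies.
function isIdleAccepted(task: ResumableTask, now: number = Date.now()): boolean {
  return task.status === "running" && now - task.startedAt >= MIN_IDLE_TIME_MS;
}
```

For a task launched at t = 0 and resumed ten minutes later, an idle event one second after the resume is ignored, while one five seconds after the resume is accepted.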

- Set concurrencyKey = undefined after every release() call to prevent double-release when multiple code paths try to release the same key
- Add 15-minute timeout timer for resumed tasks (was missing)
- Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications
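A minimal sketch of the release-once pattern described in this commit. `ConcurrencyCounter` is a hypothetical stand-in for the real concurrency manager, with a deliberately non-idempotent `release`, and `releaseOnce` is an assumed helper name.

```typescript
// Stand-in for the concurrency manager (assumption). release() is not idempotent:
// each call frees one slot, whether or not the caller still holds one.
class ConcurrencyCounter {
  private readonly active = new Map<string, number>();

  acquire(key: string): void {
    this.active.set(key, this.count(key) + 1);
  }

  release(key: string): void {
    this.active.set(key, Math.max(0, this.count(key) - 1));
  }

  count(key: string): number {
    return this.active.get(key) ?? 0;
  }
}

// Hypothetical task shape (assumption).
interface KeyedTask {
  concurrencyKey?: string;
}

// Clearing the key after the first release turns later calls into no-ops.
function releaseOnce(counter: ConcurrencyCounter, task: KeyedTask): void {
  if (task.concurrencyKey === undefined) return;
  counter.release(task.concurrencyKey);
  task.concurrencyKey = undefined;
}
```

If two tasks hold the same key and one of them hits both the timeout handler and the cleanup path, the counter drops by one instead of two, so the second task's slot is not handed to a queued task.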

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:361">
P2: Resume error path does not clear the newly created timeout or release the concurrency key, holding resources until retention cleanup and keeping the timer scheduled.</violation>
</file>

src/features/background-agent/manager.ts

Address P2 feedback: resume() error handler now properly:
- Clears the timeout timer created for the resumed task
- Releases concurrency key to unblock queued tasks
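The cleanup in the resume error path might look roughly like this. The task shape, the `"error"` status, the `handleResumeError` name, and the injected `release` callback are assumptions for illustration only.

```typescript
// Hypothetical task shape (assumption).
interface ResumedTask {
  status: "running" | "error";
  error?: string;
  concurrencyKey?: string;
  timeoutTimer?: ReturnType<typeof setTimeout>;
}

// Called when sending the resume prompt fails.
function handleResumeError(
  task: ResumedTask,
  err: unknown,
  release: (key: string) => void,
): void {
  // Clear the timeout timer created for the resumed task.
  if (task.timeoutTimer !== undefined) {
    clearTimeout(task.timeoutTimer);
    task.timeoutTimer = undefined;
  }
  // Release the concurrency key exactly once so queued tasks can proceed.
  if (task.concurrencyKey !== undefined) {
    release(task.concurrencyKey);
    task.concurrencyKey = undefined;
  }
  task.status = "error";
  task.error = err instanceof Error ? err.message : String(err);
}
```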

motonari728 · 2026-01-11T22:12:21Z

Love this PR ❤️ I hope it gets merged soon.

code-yeongyu · 2026-01-12T04:44:09Z

@Gladdonilli I modified it this way once before, but I'm kind of worried about double-releasing, as the cubic bot pointed out. Do you have any opinions on this?

Gladdonilli added 2 commits January 12, 2026 00:20

fix: clear timeout timer on task completion to prevent memory leak

9bf9319

greptile-apps bot reviewed Jan 11, 2026

cubic-dev-ai bot reviewed Jan 11, 2026

Gladdonilli added 2 commits January 12, 2026 01:57

fix: address PR review issues - reset startedAt on resume

f83f143

cubic-dev-ai bot reviewed Jan 11, 2026

fix: clean up timeout and concurrency on resume error

4833ebb
