

@Gladdonilli (Contributor) commented Jan 11, 2026

Summary

This PR improves background agent reliability by adding safety guards and simplifying completion detection to prevent stuck tasks.

Changes

1. Agent Safety Guards

  • max_steps limit: Added max_steps: 25 to the explore agent and max_steps: 30 to the librarian agent to prevent infinite loops
  • Tool blocking: Librarian now blocks sisyphus_task, call_omo_agent, and task tools to prevent unintended child spawning
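A minimal sketch of the two agent guards, assuming a config object with a `max_steps` number and a `tools` map in which `false` disables a tool. The PR description does not show the real agent definitions, so the `AgentSafetyConfig` type and the `isToolAllowed`/`stepLimitReached` helpers are hypothetical.

```typescript
// Hypothetical shape modeled on the PR description, not a verified OpenCode schema.
interface AgentSafetyConfig {
  max_steps: number;
  // Tools mapped to false are unavailable to the agent; anything else is allowed.
  tools?: Record<string, boolean>;
}

const explore: AgentSafetyConfig = { max_steps: 25 };

const librarian: AgentSafetyConfig = {
  max_steps: 30,
  // Block every tool that could spawn a child agent.
  tools: { task: false, sisyphus_task: false, call_omo_agent: false },
};

// Allow-by-default: a tool is usable unless it is explicitly disabled.
function isToolAllowed(agent: AgentSafetyConfig, tool: string): boolean {
  return agent.tools?.[tool] !== false;
}

// True once the agent has used up its step budget.
function stepLimitReached(agent: AgentSafetyConfig, stepsTaken: number): boolean {
  return stepsTaken >= agent.max_steps;
}

console.log(isToolAllowed(librarian, "sisyphus_task")); // false
console.log(isToolAllowed(librarian, "read")); // true
console.log(stepLimitReached(explore, 25)); // true
```

An allow-by-default map keeps the librarian's other tools available while denying only the spawning tools.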

2. Background Agent Completion Detection (PR #655 Implementation)

  • Global timeout: Added MAX_RUN_TIME_MS (15 minutes) to prevent tasks from running forever
  • Simplified idle handler: Removed validateSessionHasOutput() and checkSessionTodos() guards that were causing stuck tasks
  • Minimum idle time: MIN_IDLE_TIME_MS (5 seconds) ignores session.idle events that arrive within 5 seconds of the task starting, preventing premature completion
  • Timeout cleanup: Timer is properly cleared on task completion to prevent memory leaks
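The global timeout and its cleanup could look roughly like this. `MAX_RUN_TIME_MS` comes from the PR; the `BackgroundTask` shape and the `startTimeout`/`clearTaskTimeout`/`completeTask` helpers are hypothetical stand-ins for the manager's internals. `unref()` exists on Node and Bun timers but not on browser timers, so the sketch calls it only when present.

```typescript
// Sketch only: BackgroundTask and these helpers are hypothetical.
const MAX_RUN_TIME_MS = 15 * 60 * 1000; // 15 minutes

interface BackgroundTask {
  id: string;
  status: "running" | "completed" | "error";
  timeout?: ReturnType<typeof setTimeout>;
}

// Node and Bun timers expose unref(); browser timers are plain numbers.
function unrefTimer(handle: unknown): void {
  if (typeof handle === "object" && handle !== null && "unref" in handle) {
    (handle as { unref: () => void }).unref();
  }
}

function startTimeout(task: BackgroundTask, onTimeout: (task: BackgroundTask) => void): void {
  task.timeout = setTimeout(() => {
    task.timeout = undefined;
    if (task.status === "running") {
      task.status = "error";
      onTimeout(task);
    }
  }, MAX_RUN_TIME_MS);
  // An unref'd timer does not keep the process alive on its own.
  unrefTimer(task.timeout);
}

function clearTaskTimeout(task: BackgroundTask): void {
  if (task.timeout !== undefined) {
    clearTimeout(task.timeout);
    task.timeout = undefined;
  }
}

function completeTask(task: BackgroundTask): void {
  clearTaskTimeout(task); // otherwise the 15-minute timer outlives the task
  task.status = "completed";
}

const task: BackgroundTask = { id: "bg_1", status: "running" };
startTimeout(task, (t) => console.log(`${t.id} timed out`));
completeTask(task);
console.log(task.status, task.timeout); // completed undefined
```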

3. Configuration Improvements

  • JSONC support: Config paths now check for .jsonc files before .json, enabling comments in config files
  • Category model fix: sisyphus_task sync mode now correctly passes categoryModel for category-based tasks
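A sketch of the `.jsonc`-before-`.json` lookup. The helper name `resolveConfigPath` and the `example` base name are hypothetical; the PR does not show its actual config file names. Only path selection is covered here: reading the file still requires a comment-tolerant parser, since `JSON.parse` rejects comments.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: return the first config file that exists,
// checking `<baseName>.jsonc` before `<baseName>.json`.
function resolveConfigPath(dir: string, baseName: string): string | undefined {
  for (const ext of [".jsonc", ".json"]) {
    const candidate = join(dir, baseName + ext);
    if (existsSync(candidate)) return candidate;
  }
  return undefined;
}

// Demo in a throwaway directory: with both files present, .jsonc wins.
const dir = mkdtempSync(join(tmpdir(), "cfg-"));
writeFileSync(join(dir, "example.json"), "{}");
writeFileSync(join(dir, "example.jsonc"), "// comments allowed\n{}");
console.log(resolveConfigPath(dir, "example") === join(dir, "example.jsonc")); // true
rmSync(dir, { recursive: true, force: true });
```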

Completion Detection Flow

┌─────────────────────────────────────────────────────────────┐
│                    COMPLETION FLOW                          │
├─────────────────────────────────────────────────────────────┤
│  1. Agent runs, does work (tool calls, thinking, etc.)      │
│  2. Agent goes idle (no more tool calls/responses)          │
│  3. OpenCode SDK fires session.idle event                   │
│  4. GUARD: Check if elapsed time >= MIN_IDLE_TIME_MS (5s)   │
│     - If < 5s: IGNORE (too early, agent still starting)     │
│     - If >= 5s: ACCEPT as complete                          │
│  5. Clear timeout timer (prevent memory leak)               │
│  6. Mark task.status = "completed"                          │
│  7. Notify parent via noReply batching pattern              │
└─────────────────────────────────────────────────────────────┘
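Steps 4 to 6 of the flow above, sketched as a single handler. Only `MIN_IDLE_TIME_MS` is taken from the PR; the `Task` shape and `handleSessionIdle` are assumptions, and step 7 (notifying the parent) is omitted. `now` is a parameter so the guard can be exercised without actually waiting.

```typescript
// Sketch only: names other than MIN_IDLE_TIME_MS are hypothetical.
const MIN_IDLE_TIME_MS = 5_000;

interface Task {
  status: "running" | "completed";
  startedAt: number; // epoch milliseconds
  timeout?: ReturnType<typeof setTimeout>;
}

// Returns true if this session.idle event was accepted as completion.
function handleSessionIdle(task: Task, now: number = Date.now()): boolean {
  if (task.status !== "running") return false;
  // Step 4: ignore idle events that arrive too soon after the task started.
  if (now - task.startedAt < MIN_IDLE_TIME_MS) return false;
  // Step 5: clear the global timeout so the timer does not linger.
  if (task.timeout !== undefined) {
    clearTimeout(task.timeout);
    task.timeout = undefined;
  }
  // Step 6: mark the task complete. Step 7 (notify parent) is out of scope.
  task.status = "completed";
  return true;
}

const task: Task = { status: "running", startedAt: 0 };
console.log(handleSessionIdle(task, 3_000)); // false: only 3 s elapsed
console.log(handleSessionIdle(task, 6_000)); // true
console.log(task.status); // completed
```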

Why Guards Were Removed

The previous guards (validateSessionHasOutput, checkSessionTodos) caused tasks to get stuck:

| Guard | Problem |
|---|---|
| validateSessionHasOutput | If the model returns an empty response (e.g., due to a config issue), the task waits forever |
| checkSessionTodos | If the agent creates todos but never completes them, the task waits forever |

The new approach: fail fast and surface errors instead of hiding them. An empty result is a visible signal of a problem the user can debug.

Safety Nets

| Edge case | Protection |
|---|---|
| Early completion | MIN_IDLE_TIME_MS (5 s minimum) |
| Infinite loop | max_steps: 25 (explore) / 30 (librarian) |
| Total hang | MAX_RUN_TIME_MS (15-minute timeout) |
| Bad model config | Empty result surfaces the issue |
| Unintended spawning | Tool blocking in librarian |
| Memory leak | Timeout timer cleared on completion |

Testing

  • bun run typecheck passes
  • bun run build succeeds
  • ✅ All 45 background-agent tests pass
  • ✅ Manual testing: explore (background), librarian (background), explore (sync), librarian (sync) all complete successfully

References


Summary by cubic

Improves background agent stability and completion detection to prevent stuck tasks and runaway sessions. Adds safety guards, a 15‑minute timeout, simpler idle-based completion, and JSONC config support.

  • Bug Fixes

    • Completion: mark complete on session.idle once at least 5 s have elapsed since the task (re)started; add a 15‑minute global timeout (including on resume) and clear/unref the timer on completion; remove the output/todo guards that caused stalls; prevent double-release of concurrency keys; reset startedAt on resume to avoid premature completion.
    • Agent safety: set max_steps (explore 25, librarian 30) and block spawning tools in librarian (task, sisyphus_task, call_omo_agent).
    • Correctly pass categoryModel for category-based sisyphus_task in sync mode.
  • New Features

    • Config paths now prefer .jsonc over .json to enable comments in config files.

Written for commit 4833ebb. Summary will update on new commits.

- Add max_steps limit to explore (25) and librarian (30) agents
- Block sisyphus_task/call_omo_agent tools in librarian to prevent spawning
- Add global 15-minute timeout for background tasks (MAX_RUN_TIME_MS)
- Simplify session.idle handler - remove validateSessionHasOutput/checkSessionTodos guards
- Add JSONC config file support (.jsonc checked before .json)
- Fix categoryModel passing in sisyphus_task sync mode

Reference: PR code-yeongyu#655, kdcokenny/opencode-background-agents

@greptile-apps (bot) left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@cubic-dev-ai (bot) left a comment

3 issues found across 6 files

Confidence score: 2/5

  • Double-releasing the concurrency key in src/features/background-agent/manager.ts risks oversubscribing background tasks and breaking the intended queue limits, making the change high risk.
  • Resumed tasks failing to reset startedAt in src/features/background-agent/manager.ts means long-running work can be wrongly marked complete on idle, potentially cutting off user sessions.
  • Global task timeouts in src/features/background-agent/manager.ts are left pending after completion, keeping long-lived timers around and threatening overall process stability.
  • Pay close attention to src/features/background-agent/manager.ts - concurrency handling, resume logic, and timeout cleanup all need fixes.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:189">
P2: Global task timeout is created but never cleared/unref’d on most completion paths, leaving 15‑minute timers alive after tasks finish and potentially keeping the event loop running.</violation>

<violation number="2" location="src/features/background-agent/manager.ts:197">
P1: Concurrency key is released twice on timeout (timeout handler and cleanup), but `release` is not idempotent—double-release can oversubscribe queued tasks</violation>

<violation number="3" location="src/features/background-agent/manager.ts:423">
P2: Resumed tasks can be prematurely completed: startedAt isn’t reset on resume, so the simplified session.idle handler will immediately complete any long-lived resumed task on the first idle event without verifying output/todos.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

- Reset startedAt when resuming tasks to prevent immediate completion
  (MIN_IDLE_TIME_MS check was passing immediately for resumed tasks)
- Previous commit already fixed timeout.unref() and double-release prevention
- Set concurrencyKey = undefined after every release() call to prevent
  double-release when multiple code paths try to release the same key
- Add 15-minute timeout timer for resumed tasks (was missing)
- Fixes: promptAsync error, session.deleted, pruneStaleTasksAndNotifications
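The double-release fix (clearing `concurrencyKey` right after `release()`) and the `startedAt` reset on resume can be illustrated with a toy limiter. `ConcurrencyLimiter`, `acquireAndStart`, and `releaseOnce` are hypothetical; they assume, as cubic's review describes, that `release()` is not idempotent and frees a slot on every call.

```typescript
// Hypothetical toy limiter; the PR does not show the real concurrency manager.
class ConcurrencyLimiter {
  private readonly active = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  tryAcquire(key: string): boolean {
    const n = this.active.get(key) ?? 0;
    if (n >= this.limit) return false;
    this.active.set(key, n + 1);
    return true;
  }

  // Not idempotent: every call frees one slot, whichever task holds it.
  release(key: string): void {
    const n = this.active.get(key) ?? 0;
    this.active.set(key, Math.max(0, n - 1));
  }

  inUse(key: string): number {
    return this.active.get(key) ?? 0;
  }
}

interface Task {
  startedAt: number;
  concurrencyKey?: string;
}

// Used for both first start and resume: startedAt is reset every time,
// so a MIN_IDLE_TIME_MS check measures from the latest (re)start.
function acquireAndStart(limiter: ConcurrencyLimiter, task: Task, key: string, now: number): boolean {
  if (!limiter.tryAcquire(key)) return false;
  task.concurrencyKey = key;
  task.startedAt = now;
  return true;
}

// Release at most once: clearing the key turns later calls into no-ops.
function releaseOnce(limiter: ConcurrencyLimiter, task: Task): void {
  if (task.concurrencyKey === undefined) return;
  limiter.release(task.concurrencyKey);
  task.concurrencyKey = undefined;
}

const limiter = new ConcurrencyLimiter(1);
const a: Task = { startedAt: 0 };
const b: Task = { startedAt: 0 };
const c: Task = { startedAt: 0 };
acquireAndStart(limiter, a, "explore", 1_000);
releaseOnce(limiter, a); // frees the only slot
acquireAndStart(limiter, b, "explore", 2_000);
releaseOnce(limiter, a); // no-op: a.concurrencyKey is already undefined
console.log(acquireAndStart(limiter, c, "explore", 3_000)); // false: b still holds the slot
console.log(limiter.inUse("explore")); // 1
```

Without the `task.concurrencyKey = undefined` line, the second `releaseOnce(limiter, a)` would free the slot `b` still holds and let `c` start, which is the oversubscription cubic flagged.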
@cubic-dev-ai (bot) left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:361">
P2: Resume error path does not clear the newly created timeout or release the concurrency key, holding resources until retention cleanup and keeping the timer scheduled</violation>
</file>


Address P2 feedback: resume() error handler now properly:
- Clears the timeout timer created for the resumed task
- Releases concurrency key to unblock queued tasks
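A sketch of the resume path with the error-handler cleanup described in this commit. `sendPrompt` stands in for the prompt call that resume() makes and `release` for the concurrency manager's release; both, along with the `ResumableTask` shape and the `cleanup` helper, are assumptions. One idempotent `cleanup` is shared by the timeout handler and the `catch` block, so neither path can release the key twice.

```typescript
// Sketch only: ResumableTask, cleanup, and resume's parameters are hypothetical.
const MAX_RUN_TIME_MS = 15 * 60 * 1000;

interface ResumableTask {
  status: "completed" | "running" | "error";
  startedAt: number;
  concurrencyKey?: string; // assumed to be acquired by the caller before resume()
  timeout?: ReturnType<typeof setTimeout>;
  error?: string;
}

// Node and Bun timers expose unref(); browser timers are plain numbers.
function unrefTimer(handle: unknown): void {
  if (typeof handle === "object" && handle !== null && "unref" in handle) {
    (handle as { unref: () => void }).unref();
  }
}

// Idempotent: safe to call from several code paths.
function cleanup(task: ResumableTask, release: (key: string) => void): void {
  if (task.timeout !== undefined) {
    clearTimeout(task.timeout);
    task.timeout = undefined;
  }
  if (task.concurrencyKey !== undefined) {
    release(task.concurrencyKey);
    task.concurrencyKey = undefined;
  }
}

async function resume(
  task: ResumableTask,
  sendPrompt: () => Promise<void>,
  release: (key: string) => void,
): Promise<void> {
  task.status = "running";
  task.startedAt = Date.now(); // restart the minimum-elapsed-time window
  task.timeout = setTimeout(() => {
    cleanup(task, release);
    task.status = "error";
    task.error = "timed out";
  }, MAX_RUN_TIME_MS);
  unrefTimer(task.timeout);

  try {
    await sendPrompt();
  } catch (err) {
    // Error path: undo the timer and the concurrency slot set up above.
    cleanup(task, release);
    task.status = "error";
    task.error = err instanceof Error ? err.message : String(err);
  }
}

// Usage: a failing prompt leaves no timer and no held slot behind.
const released: string[] = [];
const t: ResumableTask = { status: "completed", startedAt: 0, concurrencyKey: "librarian" };
resume(t, () => Promise.reject(new Error("promptAsync failed")), (key) => {
  released.push(key);
}).then(() => {
  console.log(t.status, released.length); // error 1
});
```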
@motonari728

Love this PR ❤️ Hoping for an early merge.

@code-yeongyu
Owner

@Gladdonilli I modified it this way once before, but I'm a bit worried about double-releasing, as the cubic bot flagged. Do you have any opinions on this?
