LLM-driven autonomous agent runtime for browser automation and software engineering. Give it a task in plain English, and it uses an LLM to control a real Chrome browser, write code, run tests, and solve problems end-to-end.
remix-agent-runtime is the orchestration layer that connects remix-browser (headless Chrome via MCP) with any LLM provider to create an autonomous agent. It works with Anthropic, OpenRouter, AWS Bedrock, or any provider compatible with the Anthropic Messages API format. The runtime can operate in browser mode, terminal-only mode (--no-browser), or both simultaneously. Credentials are secured by remix-credentials.
remix-agent-runtime
┌───────────────────────────────────────────────────────────┐
│ │
│ "Log into GitHub and star the remix-browser repo" │
│ │ │
│ ▼ │
│ AGENTS.md ──► ┌────────────────┐ ◄── Credentials │
│ instructions │ Agent Loop │ (remix-credentials) │
│ └───────┬────────┘ │
│ ▲ │ │
│ results │ │ tool calls │
│ │ ▼ │
│ ┌──────────────┴──────────────────────────────────┐ │
│ │ Decorator Chain │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────┐ │ │
│ │ │ CoordinationExecutor (7 coord tools) │ │ │
│ │ └──────────────────┬──────────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ PermissionAwareExecutor (4 modes) │ │ │
│ │ └──────────────────┬──────────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ HookAwareExecutor (pre/post tool hooks) │ │ │
│ │ └──────────────────┬──────────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ LocalToolsExecutor (7 sandboxed tools) │ │ │
│ │ └──────────────────┬──────────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ SkillAwareExecutor (3 skill tools) │ │ │
│ │ └──────────────────┬──────────────────────┘ │ │
│ │ │ │ │
│ │ ┌──────────────────┴──────────────────────┐ │ │
│ │ │ CompositeToolExecutor (MCP backends) │ │ │
│ │ └────┬────────────────────────────────────┘ │ │
│ └───────┼────────────────────────────────────────┘ │
│ │ │
│ ┌─────────┘ ┌──────────────────────┐ │
│ │ │ Sandboxed filesystem │ │
│ ▼ │ (Seatbelt/Landlock) │ │
│ remix-browser └──────────────────────┘ │
│ (MCP Server) │
│ │ │
└───────┼──────────────────────────────────────────────────┘
│ CDP
▼
┌──────────────┐
│ Chrome │
└──────────────┘
- You provide a task in natural language
- The agent sends the task + available tools to the LLM
- The LLM decides which tools to call (navigate, click, type, read_file, bash, etc.)
- Tool calls pass through the decorator chain: hooks fire, local tools and skills are intercepted, everything else routes to the browser MCP backend
- Results go back to the LLM, which decides the next action
- Loop continues until the task is complete or a stopping condition is hit
- Structured JSON output with every step recorded
| Project | Role |
|---|---|
| remix-browser | Rust-native MCP server for Chrome automation -- 18+ tools for navigation, clicking, typing, screenshots, network monitoring, and more |
| remix-credentials | Secure credential management with AES-256-GCM encryption, Argon2id key derivation, and zeroizable memory |
| remix-agent-runtime (this project) | The agent loop that ties it all together -- connects an LLM to browser tools and runs autonomously |
- Google Chrome or Chromium
- An API key from a supported LLM provider (Anthropic, OpenRouter, AWS Bedrock, etc.)
One command installs both remix-agent and remix-browser -- no Rust toolchain needed:
curl -fsSL https://raw.githubusercontent.com/hkd987/remix-agent-runtime/main/scripts/install.sh | shIf you already have remix-browser installed, the script detects it and only installs the agent.
From source
Requires Rust 1.88+:
# Install remix-browser
curl -fsSL https://raw.githubusercontent.com/hkd987/remix-browser/main/scripts/install.sh | sh
# Build remix-agent from source
git clone https://github.com/hkd987/remix-agent-runtime.git
cd remix-agent-runtime && cargo build --release
cp target/release/remix-agent /usr/local/bin/Pre-built binaries are available for macOS (Apple Silicon & Intel), Linux x86_64, and Windows x86_64. See Releases for all downloads.
export REMIX_LLM_API_KEY=sk-ant-your-key-here
remix-agent run "Navigate to example.com and tell me what's on the page"remix-agent run [OPTIONS] [TASK]Core:
| Flag | Short | Env Var | Description |
|---|---|---|---|
--config <PATH> |
-c |
-- | Path to YAML configuration file |
--api-key <KEY> |
-- | REMIX_LLM_API_KEY |
LLM provider API key |
--base-url <URL> |
-- | REMIX_LLM_BASE_URL |
LLM provider base URL (default: Anthropic) |
--model <NAME> |
-- | REMIX_LLM_MODEL |
Model ID (default: claude-sonnet-4-20250514) |
--max-tokens <N> |
-- | -- | Max tokens per response (default: 8192) |
--thinking-budget-tokens <N> |
-- | -- | Thinking/reasoning budget tokens for extended thinking |
--timeout <SECS> |
-- | -- | Max duration in seconds |
--max-iterations <N> |
-- | -- | Max agent loop iterations (default: 50) |
--system-prompt <TEXT> |
-- | -- | Custom system prompt |
--effort <LEVEL> |
-- | -- | Effort level: low, medium, high, max |
--verbose |
-v |
-- | Debug logging to stderr |
--output <PATH> |
-o |
-- | Write JSON results to file |
Browser:
| Flag | Env Var | Description |
|---|---|---|
--headed |
-- | Show the browser window |
--no-browser |
-- | Terminal-only mode (no browser connection) |
--browser-path <PATH> |
REMIX_BROWSER_PATH |
Path to remix-browser binary |
Agent behavior:
| Flag | Description |
|---|---|
--nudge-on-text-only |
Nudge the LLM when it returns text without tool calls |
--nudge-max-count <N> |
Max text-only nudges before terminating (default: 3) |
--goal-check-on-complete |
Verify goal completion before terminating |
--action-reminder-interval <N> |
Inject progress reminders every N iterations |
--tool-result-max-bytes <N> |
Max bytes per tool result (default: 32768) |
--context-window <N> |
Override context window size for compaction |
--disable-compaction |
Disable automatic context compaction |
Loop detection:
| Flag | Description |
|---|---|
--loop-detection |
Enable loop detection with default settings |
--loop-detection-max-repeats <N> |
Max identical tool calls before warning (default: 3) |
--loop-detection-window <N> |
Lookback window size (default: 10) |
--loop-detection-max-failures <N> |
Max failing commands without a file write before semantic loop warning (default: 4) |
Reasoning stages:
| Flag | Description |
|---|---|
--reasoning-stages |
Enable adaptive thinking budgets across planning/execution/verification phases |
--planning-budget-tokens <N> |
Thinking tokens for planning phase (default: 10000) |
--execution-budget-tokens <N> |
Thinking tokens for execution phase (default: 5000) |
--verification-budget-tokens <N> |
Thinking tokens for verification phase (default: 10000) |
--iteration-budget-warning-threshold <F> |
Warn agent at this fraction of max iterations (e.g., 0.7) |
Discovery & plugins:
| Flag | Env Var | Description |
|---|---|---|
--agents-md-dir <PATH> |
REMIX_AGENTS_MD_DIR |
Override AGENTS.md search directory |
--no-agents-md |
-- | Disable AGENTS.md discovery |
--no-local-tools |
-- | Disable local filesystem tools |
--sandbox-dir <PATH> |
REMIX_SANDBOX_DIR |
Sandbox root for local tools |
--skills-dir <PATH> |
REMIX_SKILLS_DIR |
Additional skills directory |
--no-skills |
-- | Disable skill discovery |
--no-plugins |
-- | Disable all plugin discovery |
--plugins-dir <PATH> |
REMIX_PLUGINS_DIR |
Additional plugin directory |
--no-claude-plugins |
-- | Disable Claude Code plugin cache |
Dev tools:
| Flag | Description |
|---|---|
--no-dev-tools |
Disable all dev tools (LSP, test harness, repo map) |
--no-lsp |
Disable LSP integration |
--no-test-harness |
Disable test harness tools |
--no-repo-map |
Disable repo map tool |
--lsp-server <LANG=CMD> |
Override LSP server for a language (e.g., rust=rust-analyzer) |
Sessions:
| Flag | Env Var | Description |
|---|---|---|
--session-id <ID> |
-- | Resume an existing session |
--fork-session <ID> |
-- | Fork from an existing session |
--continue |
-- | Resume the most recent session |
--session-dir <PATH> |
REMIX_SESSION_DIR |
Override session storage directory |
Permissions:
| Flag | Description |
|---|---|
--permission-mode <MODE> |
default, accept_edits, bypass_permissions, plan |
--allow-tool <PATTERN> |
Regex pattern for auto-allowed tools (repeatable) |
--deny-tool <PATTERN> |
Regex pattern for denied tools (repeatable) |
Coordination:
| Flag | Env Var | Description |
|---|---|---|
--no-coordination |
-- | Disable multi-agent coordination |
--max-workers <N> |
-- | Maximum concurrent worker agents (default: 5) |
--coordination-dir <PATH> |
REMIX_COORDINATION_DIR |
Override coordination storage directory |
SSE streaming (requires sse feature):
| Flag | Env Var | Description |
|---|---|---|
--sse-port <PORT> |
REMIX_SSE_PORT |
Port for real-time SSE event server |
# Browser automation
remix-agent run "Take a screenshot of hacker news"
# Watch the browser work (headed mode)
remix-agent run --headed "Fill out the contact form on example.com"
# Terminal-only coding agent (no browser)
remix-agent run --no-browser "Fix the failing tests in src/utils.ts"
# Coding with loop detection and reasoning stages
remix-agent run --no-browser \
--loop-detection \
--reasoning-stages \
--max-iterations 100 \
"Refactor the database module to use connection pooling"
# Use a specific model
remix-agent run --model claude-opus-4-20250805 "Complex multi-step task here"
# Save structured output
remix-agent run --output results.json "Find the price of item X on site Y"
# Full config file
remix-agent run --config task.yaml --verbose
# Resume where you left off
remix-agent run --continue "keep going"
# With a local plugin
remix-agent run --plugins-dir ./my-plugin "Run my custom workflow"The runtime works with any provider that exposes an Anthropic Messages API-compatible endpoint. Just change the --base-url and --model:
# Anthropic (default)
remix-agent run --api-key sk-ant-xxx "Your task"
# OpenRouter
remix-agent run \
--base-url https://openrouter.ai/api \
--api-key sk-or-xxx \
--model anthropic/claude-sonnet-4 \
"Your task"
# AWS Bedrock (via proxy)
remix-agent run \
--base-url https://your-bedrock-proxy.com \
--api-key your-key \
--model anthropic.claude-sonnet-4-20250514-v1:0 \
"Your task"
# Any compatible provider
remix-agent run \
--base-url https://your-provider.com \
--model your-model-id \
--api-key your-key \
"Your task"Custom headers can be added via the YAML config for providers that need them:
llm:
base_url: "https://your-provider.com"
api_key: "your-key"
model: "your-model"
custom_headers:
X-Provider-Key: "value"
HTTP-Referer: "https://your-app.com"CLI flags override environment variables, which override the YAML config, which overrides defaults.
task: "Log into the dashboard and export the monthly report"
llm:
api_key: "${ANTHROPIC_API_KEY}"
model: "claude-sonnet-4-20250514"
max_tokens: 8192
thinking_budget_tokens: 10000 # Extended thinking budget
enable_prompt_caching: true # Cache system prompts (default: true)
custom_headers: # Provider-specific headers
HTTP-Referer: "https://your-app.com"
agent:
max_iterations: 50
timeout_secs: 300
system_prompt: |
You are an expert automation agent.
Complete the task efficiently and report what you find.
tool_result_max_bytes: 32768 # Truncate large tool outputs
nudge_on_text_only: false # Nudge LLM when it returns only text
nudge_max_count: 3 # Max nudges before terminating
goal_check_on_complete: false # Verify goal before terminating
action_reminder_interval: 15 # Inject progress reminder every N iterations
iteration_budget_warning_threshold: 0.7 # Warn at 70% of max iterations
loop_detection:
max_repeats: 3 # Identical tool calls before warning
window_size: 10 # Only check last N steps
max_failures_without_write: 4 # Semantic loop: failing without editing
reasoning_stages:
planning_budget_tokens: 10000 # Deep thinking for planning (0-30% of iterations)
execution_budget_tokens: 5000 # Fast execution (30-80%)
verification_budget_tokens: 10000 # Careful verification (80-100%)
planning_threshold: 0.3
verification_threshold: 0.8
browser:
enabled: true # Set false for terminal-only mode
headless: true
viewport_width: 1280
viewport_height: 720
credentials:
- name: "dashboard_login"
credential_type: username_password
username: "${DASHBOARD_USER}"
password: "${DASHBOARD_PASS}"
url_pattern: "*.internal.company.com"
agents_md:
enabled: true
search_dir: "/path/to/project"
max_size_bytes: 32768
local_tools:
enabled: true
sandbox_dir: "/path/to/sandbox"
bash_timeout_secs: 120
read_max_bytes: 1048576 # 1MB
write_max_bytes: 10485760 # 10MB
web_fetch_timeout_secs: 30
web_fetch_max_bytes: 102400 # 100KB
skills:
dirs:
- "/path/to/skills"
enabled: true
script_timeout_secs: 60
plugins:
enabled: true
claude_code_cache: true
hook_timeout_secs: 30
sources:
- path: "/path/to/local-plugin"
- github: "owner/repo"
git_ref: "v1.0"
components:
skills: true
mcp_servers: true
hooks: true
agents: true
session:
enabled: true
storage_dir: "~/.remix/sessions"
max_sessions: 100
compaction:
enabled: true
trigger_threshold: 0.95
context_window_tokens: 200000
preserve_recent_n: 4
permissions:
mode: default # default | accept_edits | bypass_permissions | plan
allowed_tools:
- "navigate|click|screenshot"
denied_tools:
- "bash"
coordination:
enabled: true
max_workers: 5
max_worker_iterations: 10
worker_timeout_secs: 120
storage_dir: "~/.remix/coordination"
dev_tools:
enabled: true
lsp:
enabled: true
request_timeout_secs: 30
server_overrides: # Override LSP server per language
rust: "rust-analyzer"
test_harness:
enabled: true
timeout_secs: 300
repo_map:
enabled: true
max_files: 5000
max_depth: 10
on_complete:
url: "https://hooks.slack.com/your-webhook"
format: "json"
on_error:
url: "https://hooks.slack.com/your-error-webhook"
format: "json"Environment variables can be interpolated in YAML using ${VAR_NAME} syntax.
Credentials are securely managed via remix-credentials -- values use zeroizable memory and are redacted from logs.
credentials:
# Username/password login
- name: "site_login"
credential_type: username_password
username: "admin"
password: "secret"
url_pattern: "*.example.com"
# API key
- name: "api_auth"
credential_type: api_key
fields:
api_key: "sk-xxxxx"
# Custom fields
- name: "oauth_creds"
credential_type: custom
fields:
client_id: "id123"
client_secret: "secret456"
tenant: "acme"Supported credential types: username_password, api_key, token, cookie, custom.
The agent supports the AGENTS.md standard for project-level instructions. When enabled, the agent walks from the search directory (or current working directory) up to the filesystem root, collecting all AGENTS.md files it finds.
- Files are ordered root-to-leaf (general instructions first, project-specific last)
- Concatenated content is capped at 32KB by default (
max_size_bytes) - Injected into the system prompt wrapped in
<project_instructions>tags - Override the search directory with
--agents-md-dirorREMIX_AGENTS_MD_DIR - Disable with
--no-agents-md
When enabled, the agent has access to seven sandboxed filesystem and shell tools:
| Tool | Description |
|---|---|
read_file |
Read file contents with optional offset/limit |
write_file |
Write content to a file (creates parent dirs) |
edit_file |
Find-and-replace exact string in a file |
bash |
Execute a shell command in the sandbox |
grep |
Regex search across files with context |
glob |
Find files matching a glob pattern |
web_fetch |
Fetch a URL and return content as markdown |
All file operations are restricted to the sandbox directory. Use --sandbox-dir or REMIX_SANDBOX_DIR to set the root. Disable with --no-local-tools.
Local tools are sandboxed at the OS level:
- macOS: Seatbelt profiles restrict file access and network to the sandbox directory
- Linux: Landlock LSM restricts filesystem access (with fallback for older kernels)
- Path validation: All file tool paths are resolved and checked against the sandbox root
- Timeouts: Bash commands are killed after the configured timeout (default: 120s)
Skills follow the AgentSkills.io standard. They provide reusable instructions and scripts the agent can load on demand.
Discovery searches these directories in order:
./skills/(project-local)~/.remix/skills/(user-global)--skills-dirCLI flag orREMIX_SKILLS_DIRenv var- YAML
skills.dirsentries - Skills contributed by plugins
Three virtual tools are added when skills are discovered:
load_skill-- Load a skill's instructions into contextrun_skill_script-- Execute a script from a skill'sscripts/directoryread_skill_resource-- Read a file from a skill's directory
Disable with --no-skills.
The plugin system extends the agent with additional skills, MCP servers, hooks, and agents from external sources. Plugins are discovered from three sources:
- Claude Code cache (
~/.claude/plugins/installed_plugins.json) -- automatically discovers plugins installed by Claude Code - Local directories -- point to a plugin directory on disk via config or
--plugins-dir - GitHub repositories -- clone and cache a plugin repo via config
A plugin is a directory containing any combination of:
my-plugin/
├── skills/ # Skill definitions (merged into SkillSet)
│ └── my-skill/
│ └── SKILL.md
├── hooks/ # Pre/post tool-use hooks
│ └── hooks.json
├── agents/ # Agent definitions (injected into system prompt)
│ └── researcher.md
└── .mcp.json # MCP server configuration
Each component type can be individually enabled or disabled via plugins.components in the YAML config.
Hooks fire shell commands before and/or after tool calls. They receive JSON context via stdin containing the tool name, input arguments, and (for post-hooks) the tool output. Hook failures are logged and never block the agent loop.
{
"hooks": {
"PreToolUse": [
{
"matcher": "navigate|click",
"hooks": [{ "type": "command", "command": "echo pre-hook ran" }]
}
],
"PostToolUse": [
{
"matcher": "screenshot",
"hooks": [{ "type": "command", "command": "./process-screenshot.sh" }]
}
]
}
}Matchers use regex patterns (pipe-separated alternatives, anchored to full tool name).
Agent definitions are markdown files with YAML frontmatter. Discovered agents are injected into the system prompt so the LLM knows they are available:
---
name: researcher
description: Searches the web for information
model: claude-sonnet-4-20250514
tools:
- web_search
- read_file
---
# Researcher Agent
You are a research specialist...| Flag | Env Var | Description |
|---|---|---|
--no-plugins |
-- | Disable all plugin discovery |
--plugins-dir <PATH> |
REMIX_PLUGINS_DIR |
Additional plugin directory |
--no-claude-plugins |
-- | Disable Claude Code plugin cache discovery |
The agent can spawn and coordinate multiple child agents to work on tasks in parallel. A lead agent breaks work into subtasks, assigns them to workers, and collects results -- all through seven virtual tools:
| Tool | Description |
|---|---|
task_create |
Create a new task with subject, description, and metadata |
task_list |
List all tasks with their status and ownership |
task_get |
Get full details for a specific task |
task_update |
Update task status, subject, description, or dependencies |
team_create |
Create a named team of agents |
send_message |
Send a message to another agent's inbox |
spawn_agent |
Spawn a new worker agent with a name, task, and optional tool filter |
Workflow: The lead agent creates a team, creates tasks, spawns workers to claim and execute them, communicates via send_message, and workers mark tasks complete when done. Workers check their inbox between loop iterations and receive messages as injected context.
All coordination state (tasks, teams, inboxes) is persisted to disk with atomic writes for crash safety.
coordination:
enabled: true
max_workers: 5
max_worker_iterations: 10
worker_timeout_secs: 120
storage_dir: ~/.remix/coordinationDisable with --no-coordination. Override worker limits with --max-workers and storage location with --coordination-dir.
Sessions persist the full conversation history so you can resume or fork previous runs.
Each session is stored at ~/.remix/sessions/{session_id}/ containing:
metadata.json-- session ID, status, timestamps, task descriptionmessages.jsonl-- append-only log of all LLM messagessteps.json-- structured record of every tool call and result
| Action | CLI |
|---|---|
| Resume a session | remix-agent run --session-id <ID> "continue the task" |
| Fork from a session | remix-agent run --fork-session <ID> "try a different approach" |
| Custom storage dir | remix-agent run --session-dir /path/to/sessions |
session:
enabled: true
storage_dir: ~/.remix/sessions
max_sessions: 100Permissions control which tools the agent can call without user confirmation.
| Mode | Description |
|---|---|
default |
Ask the user before each tool call |
accept_edits |
Auto-allow write tools (write_file, edit_file, bash), ask for others |
bypass_permissions |
Allow all tools without asking |
plan |
Read-only mode -- only allows read_file, grep, glob, load_skill, read_skill_resource |
Policy evaluation order: bypass_permissions > plan mode > denied_tools (regex) > allowed_tools (regex) > ask user.
# Run in plan mode (read-only exploration)
remix-agent run --permission-mode plan "Analyze the codebase structure"
# Auto-allow specific tools
remix-agent run --allow-tool "navigate|click|screenshot" "Take screenshots of each page"
# Deny dangerous tools
remix-agent run --deny-tool "bash|write_file" "Read and summarize the logs"permissions:
mode: default
allowed_tools:
- "navigate|click|screenshot"
denied_tools:
- "bash"When the conversation approaches the model's context window limit, the agent automatically compacts older messages into a summary. This allows long-running tasks to continue without hitting token limits.
- Trigger: When
total_input_tokens >= trigger_threshold * context_window_tokens - Process: Older messages are summarized by the LLM and replaced with a compact
<summary>block - Preservation: The most recent N messages are always kept intact
compaction:
enabled: true
trigger_threshold: 0.95
context_window_tokens: 200000
preserve_recent_n: 4The agent includes two layers of loop detection to prevent wasted iterations:
Exact-match detection catches the agent calling the same tool with identical arguments repeatedly. It hashes (tool_name, canonical_json(input)) and counts repeats within a sliding window.
Semantic loop detection catches a subtler pattern: the agent running failing commands without ever modifying its code. This detects "test-without-edit" loops where the agent varies command syntax (e.g., python test.py vs python -m pytest test.py) to avoid exact-match detection while making no progress.
agent:
loop_detection:
max_repeats: 3 # Exact-match threshold
window_size: 10 # Lookback window
max_failures_without_write: 4 # Semantic loop thresholdWhen triggered, a warning is injected into the conversation telling the agent to stop re-testing and edit its code instead.
Reasoning stages dynamically adjust the LLM's thinking budget based on where the agent is in its iteration lifecycle:
| Phase | Iterations | Default budget | Purpose |
|---|---|---|---|
| Planning | 0-30% | 10,000 tokens | Deep analysis, read code, form strategy |
| Execution | 30-80% | 5,000 tokens | Fast tool calls, write code |
| Verification | 80-100% | 10,000 tokens | Careful review, run tests, fix edge cases |
agent:
reasoning_stages:
planning_budget_tokens: 10000
execution_budget_tokens: 5000
verification_budget_tokens: 10000
planning_threshold: 0.3
verification_threshold: 0.8Enable with --reasoning-stages on the CLI. Combine with --iteration-budget-warning-threshold 0.7 to inject a one-time warning when the agent has used 70% of its iterations.
The agent can integrate with language servers and test frameworks for code intelligence.
| Tool | Description |
|---|---|
| LSP | Type checking, go-to-definition, find-references via language servers (rust-analyzer, typescript-language-server, pyright, etc.) |
| Test harness | Run tests with framework auto-detection (cargo test, pytest, jest, go test, etc.) and structured result collection |
| Repo map | Generate a codebase structure overview using tree-sitter parsing |
dev_tools:
enabled: true
lsp:
enabled: true
request_timeout_secs: 30
server_overrides:
rust: "rust-analyzer"
test_harness:
enabled: true
timeout_secs: 300
repo_map:
enabled: true
max_files: 5000
max_depth: 10Disable individually with --no-lsp, --no-test-harness, --no-repo-map, or all at once with --no-dev-tools. The repo map requires the dev-tools Cargo feature for tree-sitter support.
Get notified when tasks complete or fail:
on_complete:
url: "https://your-server.com/task-done"
format: "json"
on_error:
url: "https://your-server.com/task-failed"
format: "json"With the sse Cargo feature enabled, the agent can stream real-time events over Server-Sent Events for UI integration:
remix-agent run --sse-port 3000 "Your task here"Endpoints:
GET /events-- SSE stream of agent events (tool calls, results, completions)GET /health-- Liveness check
The agent produces structured JSON output with a full record of every step:
{
"status": "success",
"result": "Found the login button and signed in successfully",
"total_iterations": 3,
"total_duration_ms": 8420,
"total_input_tokens": 45230,
"total_output_tokens": 3120,
"total_cost_usd": 0.0523,
"steps": [
{
"iteration": 1,
"tool": "navigate",
"input": { "url": "https://example.com" },
"output": { "title": "Example" },
"duration_ms": 3200
},
{
"iteration": 2,
"tool": "click",
"input": { "selector": "#login-btn" },
"output": { "success": true },
"duration_ms": 1890
}
]
}The agent has access to all tools exposed by remix-browser:
| Category | Tools |
|---|---|
| Navigation | navigate, go_back, go_forward, reload, get_page_info |
| DOM | find_elements, get_text, get_html, wait_for |
| Interaction | click, type_text, hover, select_option, press_key, scroll |
| Screenshots | screenshot (viewport, full page, or element) |
| JavaScript | execute_js, read_console |
| Network | network_enable, get_network_log |
| Tabs | new_tab, close_tab, list_tabs |
Elements can be targeted with CSS selectors, text content, or XPath expressions.
remix-agent-runtime includes a Harbor adapter for running standardized coding benchmarks. The adapter is in harbor-adapter/ and registers as a custom Harbor agent.
# Run against Terminal-Bench 2.0 (89 coding tasks)
harbor run \
-d "[email protected]" \
--agent-import-path "remix_agent_harbor:RemixAgent" \
-m anthropic/claude-sonnet-4-6 \
--ae "REMIX_LLM_API_KEY=$REMIX_LLM_API_KEY" \
-o benchmark-results
# Run a subset of tasks
harbor run \
-d "[email protected]" \
-t "specific-task-name" \
-l 10 \
--agent-import-path "remix_agent_harbor:RemixAgent" \
-m anthropic/claude-sonnet-4-6The runtime ships with three optional feature flags:
| Feature | Dependencies | Description |
|---|---|---|
postgres |
sqlx | PostgreSQL session storage backend |
sse |
axum | Real-time SSE event streaming server |
dev-tools |
tree-sitter, tree-sitter-{rust,typescript,python,javascript} | Code intelligence via tree-sitter parsing |
# Build with all features
cargo build --release --features postgres,sse,dev-tools
# Build with just SSE
cargo build --release --features sse# Build
cargo build --release
# Run all tests (sequential to avoid env var conflicts)
cargo test -- --test-threads=1
# Lint
cargo clippy -- -D warnings
# Format check
cargo fmt --checkThe runtime uses a decorator chain pattern where each layer intercepts tool calls it owns and delegates everything else to the next layer:
CoordinationExecutor ← multi-agent coordination (7 tools)
└─ PermissionAwareExecutor ← permission checking (4 modes)
└─ HookAwareExecutor ← fires pre/post hooks around every tool call
└─ LocalToolsExecutor ← intercepts read_file, write_file, edit_file, bash, grep, glob, web_fetch
└─ SkillAwareExecutor ← intercepts load_skill, run_skill_script, read_skill_resource
└─ CompositeToolExecutor ← routes to MCP backends (remix-browser, plugins)
All components implement the ToolExecutor trait, making every layer independently testable with mocks. The LlmProvider trait abstracts the LLM HTTP client for the same reason.
src/
├── main.rs # CLI entry point, decorator chain wiring
├── cli.rs # Argument parsing (clap)
├── lib.rs # Public module re-exports
├── error.rs # Error types and exit codes
├── agent/
│ ├── loop_impl.rs # Core agent loop (AgentRunner)
│ ├── state.rs # Message history + step recording
│ ├── compaction.rs # Context compaction logic
│ ├── compaction_prompt.rs # Compaction system prompt
│ ├── loop_detection.rs # Exact-match + semantic loop detection
│ ├── reasoning_stages.rs # Adaptive thinking budget phases
│ ├── reminders.rs # Action reminder injection
│ ├── lsp_tools.rs # LSP integration (dev tools)
│ ├── test_harness.rs # Test framework detection + execution
│ └── repo_map.rs # Codebase structure via tree-sitter
├── agents_md/
│ ├── mod.rs # Public API re-exports
│ └── discovery.rs # AGENTS.md walk + injection
├── browser/
│ ├── mcp.rs # MCP client + ToolExecutor trait definition
│ ├── manager.rs # Browser process lifecycle
│ └── convert.rs # MCP → Anthropic schema conversion
├── config/
│ ├── mod.rs # Config merging (CLI > env > YAML > defaults)
│ ├── schema.rs # AppConfig, LlmConfig, PluginsConfig, etc.
│ ├── credentials.rs # Credential adapter (RawCredential → CredentialSet)
│ └── env.rs # ${VAR} interpolation
├── coordination/
│ ├── mod.rs # Public API re-exports
│ ├── context.rs # CoordinationContext (shared state)
│ ├── executor.rs # CoordinationExecutor decorator
│ ├── shared_executor.rs # SharedToolExecutor for worker agents
│ ├── task_types.rs # Task, TaskStatus, TaskId
│ ├── task_store.rs # TaskStore (RwLock + file persistence)
│ ├── team_types.rs # Team, TeamId, WorkerInfo
│ ├── team_store.rs # TeamStore (RwLock + file persistence)
│ ├── inbox_types.rs # InboxMessage, InboxId
│ └── inbox_store.rs # InboxStore (RwLock + file persistence)
├── llm/
│ ├── client.rs # Anthropic HTTP client with retry
│ └── types.rs # Message, ContentBlock, ToolDefinition
├── local_tools/
│ ├── mod.rs # Public API re-exports
│ ├── executor.rs # LocalToolsExecutor decorator
│ ├── sandbox/
│ │ ├── mod.rs # BashSandbox trait + factory
│ │ ├── path_validator.rs # Sandbox path enforcement
│ │ ├── seatbelt.rs # macOS sandbox-exec wrapper
│ │ └── landlock.rs # Linux Landlock LSM wrapper
│ └── tools/
│ ├── mod.rs # Tool module re-exports
│ ├── read_file.rs # read_file tool
│ ├── write_file.rs # write_file tool
│ ├── edit_file.rs # edit_file tool
│ ├── bash.rs # bash tool
│ ├── grep.rs # grep tool
│ ├── glob_tool.rs # glob tool
│ ├── web_fetch.rs # web_fetch tool
│ └── output_filter.rs # Shared truncation + ANSI stripping
├── output/
│ ├── result.rs # AgentResult, StepRecord
│ ├── webhook.rs # Webhook dispatcher
│ ├── events.rs # Event bus (broadcast channel)
│ └── sse_server.rs # Axum SSE server (optional)
├── permissions/
│ ├── mod.rs # Public re-exports
│ ├── types.rs # PermissionMode, PermissionPolicy
│ └── executor.rs # PermissionAwareExecutor decorator
├── plugins/
│ ├── mod.rs # Public re-exports
│ ├── types.rs # PluginSet, ResolvedPlugin, PluginComponents
│ ├── discovery.rs # discover_all_plugins, resolve_local_dir
│ ├── github.rs # Git clone/update for GitHub plugins
│ ├── composite_executor.rs # CompositeToolExecutor (multi-backend routing)
│ ├── hook_executor.rs # HookAwareExecutor decorator
│ └── components/
│ ├── skills.rs # merge_plugin_skills into SkillSet
│ ├── hooks.rs # HookRegistry, hooks.json parsing
│ ├── agents.rs # Agent .md parsing + system prompt injection
│ └── mcp.rs # Plugin MCP server configuration
├── session/
│ ├── mod.rs # Public re-exports
│ ├── types.rs # SessionId, SessionMetadata, SessionSnapshot
│ └── store.rs # SessionStore (create, load, fork, append)
├── skills/
│ ├── mod.rs # Public API re-exports
│ ├── discovery.rs # Skill discovery + SKILL.md parsing
│ ├── executor.rs # SkillAwareExecutor decorator
│ └── types.rs # SkillSet, SkillEntry, SkillMetadata
└── subagent/
├── mod.rs # Public re-exports
├── types.rs # SubagentDefinition, SpawnRequest
├── executor.rs # SubagentExecutor decorator
└── filtered_executor.rs # FilteredToolExecutor (regex tool filtering)
MIT