Our new flagship app, introduced in episode 170.
- Clone the repository:

  ```bash
  git clone https://github.com/OpenAgentsInc/commander.git
  cd commander
  ```
- Install pnpm (if you haven't already): pnpm is the package manager used for this project. If you don't have pnpm installed, you can install it using npm (which comes with Node.js):

  ```bash
  npm install -g pnpm
  ```

  For other installation methods (like Homebrew, or using a specific version), please refer to the official pnpm installation guide.
- Install dependencies:

  ```bash
  pnpm i
  ```
- Run the development server:

  ```bash
  pnpm start
  ```

  This will automatically start both the app and the Claude Bridge Service. To run just the app without the bridge:

  ```bash
  pnpm start:app-only
  ```

  To chat, you must have Ollama running. The current chat model used is `gemma3:1b`, which you can preload in Ollama by running `ollama pull gemma3:1b`. A quick way to verify your Ollama setup is sketched just after this list.
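If you want to confirm the Ollama side before launching the app, a small check against Ollama's local HTTP API is enough. The sketch below is illustrative and assumes Ollama's default endpoint on port 11434; it is not part of Commander itself.

```typescript
// Illustrative check (not part of Commander): confirm Ollama is reachable
// and that the gemma3:1b model has been pulled.
// Assumes Ollama's default local endpoint, http://localhost:11434.
async function checkOllama(model = "gemma3:1b"): Promise<void> {
  const res = await fetch("http://localhost:11434/api/tags");
  if (!res.ok) throw new Error("Ollama is not responding on http://localhost:11434");
  const { models } = (await res.json()) as { models: { name: string }[] };
  const found = models.some((m) => m.name.startsWith(model));
  console.log(found ? `${model} is available` : `${model} not found; run: ollama pull ${model}`);
}

checkOllama().catch(console.error);
```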
To use Claude Code as an AI provider, you need to:

- Install the Claude CLI: `npm install -g @anthropic-ai/cli`
- Authenticate: `claude auth`

The bridge service is automatically started when you run `pnpm start`. Additional commands:

- `pnpm bridge` - Start bridge service only
- `pnpm bridge:stop` - Stop bridge service
- `pnpm start:app-only` - Start app without bridge service
If you plan to use or develop the SWE-Bench evaluation harness:

- Ensure Docker is installed and running: Docker Desktop (for Mac/Windows) or Docker Engine (for Linux) must be installed, and the Docker daemon must be running.
- Prepare the SWE-Bench Base Docker Image: Commander's SWE-Bench harness dynamically builds a custom Docker image for each task instance. This process requires a base image that provides the core Python environment and tools. By default, Commander expects this base image to be named `swebench/swe-eval:latest`.

  To build this base image locally:

  - Clone the official SWE-Bench repository:

    ```bash
    git clone https://github.com/princeton-nlp/SWE-bench.git
    cd SWE-bench
    ```

  - Build their base Docker image (often referred to as `sweb.base`):

    ```bash
    docker build -f dockerfiles/Dockerfile.base -t sweb.base .
    ```

  - Tag this image so Commander can find it by the default name:

    ```bash
    docker tag sweb.base swebench/swe-eval:latest
    ```

  Alternatively, if you use a different name for your locally built base image (e.g., `my-sweb-base:custom`), you must update Commander's configuration by setting `SWE_BENCH_BASE_IMAGE_NAME` in the configuration service or the relevant environment variable to match your custom image name (see the sketch after this list). Having this base image prepared locally will speed up the dynamic per-task image builds performed by the Commander harness.
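For illustration only, the override described above could be wired up roughly as follows. The `SWE_BENCH_BASE_IMAGE_NAME` setting and the `swebench/swe-eval:latest` default come from this guide; the fallback logic itself is an assumption, not Commander's actual configuration service.

```typescript
// Hedged sketch: resolve the SWE-Bench base image name with an env-var override.
// Only the setting name and default tag are taken from this guide; the rest is illustrative.
const DEFAULT_SWE_BENCH_BASE_IMAGE = "swebench/swe-eval:latest";

export function resolveSweBenchBaseImage(): string {
  // e.g. SWE_BENCH_BASE_IMAGE_NAME=my-sweb-base:custom
  return process.env.SWE_BENCH_BASE_IMAGE_NAME ?? DEFAULT_SWE_BENCH_BASE_IMAGE;
}
```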
The project includes tools for running SWE-Bench task evaluations using official data from Hugging Face.
The easiest way to run evaluations is through the graphical interface. Press Ctrl+7 to open the Task Browser and get started.
📖 Full UI Guide: Running Evaluations with the UI
- Python 3 and pip - Required for downloading task data
- Python dependencies - Install with:

  ```bash
  pip install datasets huggingface_hub
  ```

  Note: Some Hugging Face datasets may require authentication. If you encounter issues, you may need to log in using the Hugging Face CLI: `huggingface-cli login`.

  As a dependency-light alternative for downloading tasks, you can use the `scripts/fetch_swebench_tasks.sh` shell script (requires `curl` and `jq`). However, the Python script is recommended for full compatibility with all dataset features.

- Docker - Must be installed and running
- SWE-Bench base image - Pull with:

  ```bash
  docker pull swebench/swe-eval:latest
  ```

  (A quick prerequisite check is sketched after this list.)
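If you want a one-shot sanity check of these prerequisites, something like the following works. It is a hedged convenience sketch using standard CLI probes, not part of Commander's tooling.

```typescript
// Hedged sketch (not part of Commander): probe the evaluation prerequisites.
import { execSync } from "node:child_process";

function have(cmd: string): boolean {
  try {
    execSync(cmd, { stdio: "ignore" });
    return true;
  } catch {
    return false;
  }
}

console.log("Python 3:", have("python3 --version") ? "ok" : "missing");
console.log("Docker daemon:", have("docker info") ? "ok" : "not running");
console.log(
  "SWE-Bench base image:",
  have("docker image inspect swebench/swe-eval:latest")
    ? "present"
    : "missing; run: docker pull swebench/swe-eval:latest"
);
```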
The SWE-bench dataset is already included in `assets/swe_bench_data/` with 2,298 tasks. To update or download fresh data:

```bash
# Use the shell script to download tasks
./scripts/fetch_swebench_tasks.sh

# Or manually download specific datasets from Hugging Face
# Note: Requires Python with datasets and huggingface_hub installed
```
Use the evaluation runner to evaluate tasks:

```bash
# Run full evaluation with gold patches (reference implementation)
pnpm tsx scripts/run-swebench-evaluation.ts --patch_source gold

# Run evaluation with AI-generated patches (Claude)
pnpm tsx scripts/run-swebench-evaluation.ts --patch_source agent:claude_code --max_tasks 50

# Run with empty patches (baseline)
pnpm tsx scripts/run-swebench-evaluation.ts --patch_source empty --max_tasks 10

# Run specific tasks
pnpm tsx scripts/run-swebench-evaluation.ts --instance_ids "django__django-11099,sympy__sympy-12419"

# Monitor progress in real-time
pnpm tsx scripts/monitor-swebench-progress.ts

# Specify custom output directory
pnpm tsx scripts/run-swebench-evaluation.ts --output_dir ./my-results --max_tasks 5
```
Important for AI evaluation: When using `--patch_source agent:claude_code`:

- Claude CLI must be installed: `npm install -g @anthropic-ai/cli`
- Authenticate first: `claude auth`
- Claude Code integration is automatic when using `pnpm start`
Results are saved to `./docs/swebench-results/` with:

- Individual patch files for each task
- `progress.json` for real-time tracking (a polling sketch follows this list)
- `summary.json` with final statistics and success rates
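If you prefer to watch a run programmatically rather than with `monitor-swebench-progress.ts`, a few lines of Node are enough. The path below comes from this section; the exact schema of `progress.json` is not documented here, so the sketch simply re-prints whatever the runner writes whenever the file changes.

```typescript
// Hedged sketch: poll progress.json during an evaluation and print it when it changes.
// The file path is from this README; its schema is not documented here, so we just
// parse and print the JSON as-is.
import { readFile } from "node:fs/promises";

const PROGRESS_PATH = "./docs/swebench-results/progress.json";

async function watchProgress(intervalMs = 5000): Promise<void> {
  let last = "";
  for (;;) {
    try {
      const raw = await readFile(PROGRESS_PATH, "utf8");
      if (raw !== last) {
        last = raw;
        console.log(new Date().toISOString(), JSON.parse(raw));
      }
    } catch {
      console.log("progress.json not found yet; is an evaluation running?");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

watchProgress().catch(console.error);
```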
For more details, see docs/swebench/running-swebench-tasks.md.
Commander includes comprehensive file-based logging for better observability:

- Log Location: `<userDataPath>/logs/commander-run.log` (a small script to locate and tail this file is sketched after this section)
  - macOS: `~/Library/Application Support/commander/logs/commander-run.log`
  - Linux: `~/.config/commander/logs/commander-run.log`
  - Windows: `%APPDATA%\commander\logs\commander-run.log`
- Log Contents: Structured entries with timestamp, level, category, action, and context
- Default Level: INFO (configurable to DEBUG for more verbose output)
- SWE-Bench Operations: Docker builds, container execution, and test results are all logged
See docs/swebench/running-swebench-tasks.md for detailed logging information.
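To peek at the log without digging through the app's data directory by hand, a small script can resolve the per-platform path listed above and print the tail of the file. This is a convenience sketch, not part of Commander, and it assumes the default paths shown above.

```typescript
// Convenience sketch (not part of Commander): resolve the per-platform log path
// listed above and print the last few lines of commander-run.log.
import { readFile } from "node:fs/promises";
import { homedir } from "node:os";
import { join } from "node:path";

function commanderLogPath(): string {
  switch (process.platform) {
    case "darwin":
      return join(homedir(), "Library/Application Support/commander/logs/commander-run.log");
    case "win32":
      return join(process.env.APPDATA ?? "", "commander/logs/commander-run.log");
    default:
      return join(homedir(), ".config/commander/logs/commander-run.log");
  }
}

async function tailLog(lines = 20): Promise<void> {
  const text = await readFile(commanderLogPath(), "utf8");
  console.log(text.trimEnd().split("\n").slice(-lines).join("\n"));
}

tailLog().catch(console.error);
```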
- Application Framework: Electron (~v35)
- Build Tool & Dev Server: Vite (~v6)
- UI Framework: React 19
- UI Components & Styling: Shadcn UI & Tailwind CSS v4
- Core Language: TypeScript (~v5.8)
- Routing: TanStack Router
- Data Fetching & State Management: React Query (TanStack Query)
- Internationalization (i18n): i18next
- Testing:
- Unit/Integration: Vitest
- End-to-End (E2E): Playwright
- Packaging & Distribution: Electron Forge
- Code Quality: ESLint (~v9) & Prettier
- Package Manager: pnpm
For a more comprehensive overview of the project architecture, specific configurations, and development guidelines, please refer to our Developer Onboarding Guide.