0.7.0
What's Changed
- Update screenshot by @xingyaoww in #2286
- Bump openai from 1.30.5 to 1.31.0 by @dependabot in #2283
- [feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes by @frankxu2004 in #2170
- fix: test_sandbox tests didn't close dockers by @tobitege in #2274
- [Hotfix] Fix ML-Bench continue run_inference.py by @super-dainiu in #2284
- Bump boto3 from 1.34.118 to 1.34.119 by @dependabot in #2280
- Update AgentHub README.md by @isavita in #2290
- doc: add Python keyring to Troubleshooting documentation by @tobitege in #2289
- Bump openai from 1.31.0 to 1.31.2 by @dependabot in #2301
- Bump litellm from 1.40.2 to 1.40.4 by @dependabot in #2300
- Bump ruff from 0.4.7 to 0.4.8 by @dependabot in #2297
- Bump boto3 from 1.34.119 to 1.34.120 by @dependabot in #2299
- Makefile setup-config to store the persist_sandbox boolean value by @mohammadkazem-sadoughi in #2304
- Bump tailwindcss from 3.4.3 to 3.4.4 in /frontend by @dependabot in #2298
- fix: ExplorerActions overlapping with file name. by @birajsilwal in #2287
- CodeActAgent to delegate to BrowsingAgent by @li-boxuan in #2103
- [bugfix] browse actions shouldn't change url and screenshot, only observations by @frankxu2004 in #2311
- Fix failed test_browse_internet CodeActAgent integration prompts by @yufansong in #2318
- tests: more Agentskills tests; updated .gitignore by @tobitege in #2307
- Fix python environment in solve-issue dogfood action by @yufansong in #2313
- Bump openai from 1.31.2 to 1.32.0 by @dependabot in #2317
- Bump boto3 from 1.34.120 to 1.34.121 by @dependabot in #2316
- Bump vite from 5.2.12 to 5.2.13 in /frontend by @dependabot in #2315
- fix: hide special paths; sort models by @tobitege in #2325
- fix: remove bottom chatbox fade (frontend) by @tobitege in #2323
- Add back jupyter PWD env var for agentskills by @xingyaoww in #2327
- feat: support ToolQA benchmark by @yueqis in #2263
- feat: revert hidden special paths change in file action by @yufansong in #2328
- Support gpqa benchmark evaluation by @1jsingh in #2080
- fix(frontend): prevent API key from resetting after modal change by @tobitege in #2329
- fix: codeact bug [If running a command that never returns, it gets stuck #1895] by @assertion in #2034
- Feat: Support Gorilla APIBench by @yueqis in #2081
- Remove deprecated file by @yufansong in #2332
- fix: Backticks get always escaped by runtime; add Ipython test by @tobitege in #2321
- fix: warning about zope-interface (pyproject) by @tobitege in #2335
- Revamp AgentRejectAction and allow ManagerAgent to handle rejection by @li-boxuan in #1735
- Downgraded Python version to 3.12.3 by @SmartManoj in #2331
- remove deprecated github-token config by @enyst in #2334
- Restore previous browsing agent behavior when evaluating on WebArena and miniwob++ only by @frankxu2004 in #2341
- doc: Added citation subsection in README by @poudel-bibek in #2339
- Refactored prompt.py to reduce token usage by @temotskipa in #1996
- Parameterize Python version by @ohhmm in #2348
- fix typos by @RainRat in #2352
- fix: remove backtick escaping from run_ipython by @tobitege in #2347
- Issues Category Update: Removed Question Type by @SmartManoj in #2345
- conftest: Exit without revealing secrets by @li-boxuan in #2351
- BioCoder integration by @tangxiangru in #2076
- Refactor response to action in agent step by @yufansong in #2350
- fix: remove some MonologueAgent mentions by @tobitege in #2364
- chore(deps): bump litellm from 1.40.4 to 1.40.7 by @dependabot in #2370
- chore(deps): bump boto3 from 1.34.121 to 1.34.122 by @dependabot in #2372
- chore(deps-dev): bump openai from 1.32.0 to 1.33.0 by @dependabot in #2373
- chore(deps-dev): bump llama-index-embeddings-azure-openai from 0.1.9 to 0.1.10 by @dependabot in #2374
- chore(deps-dev): bump llama-index-vector-stores-chroma from 0.1.8 to 0.1.9 by @dependabot in #2375
- Fix llm key leaks bug by @yufansong in #2376
- chore(deps): bump @vitejs/plugin-react from 4.3.0 to 4.3.1 in /frontend by @dependabot in #2371
- feat: append_file incl. all tests [agentskills] by @tobitege in #2346
- Add SWEBench-docker eval by @xingyaoww in #2085
- fix: avoid repeat logging of unneeded messages by @tobitege in #2380
- Minor SWE-Bench inference config tweak by @xingyaoww in #2381
- fix(swe_bench_eval): Mkdir infer_logs instead of logs by @xingyaoww in #2382
- refactor browsing agent response parse by @yufansong in #2366
- chore(deps): bump boto3 from 1.34.122 to 1.34.123 by @dependabot in #2391
- chore(deps): bump litellm from 1.40.7 to 1.40.8 by @dependabot in #2392
- chore(deps-dev): bump prettier from 3.3.1 to 3.3.2 in /frontend by @dependabot in #2390
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 7.12.0 to 7.13.0 in /frontend by @dependabot in #2389
- chore(deps-dev): bump @testing-library/jest-dom from 6.4.5 to 6.4.6 in /frontend by @dependabot in #2388
- chore(deps-dev): bump @typescript-eslint/parser from 7.12.0 to 7.13.0 in /frontend by @dependabot in #2387
- Add integration test for CodeActSWEAgent by @yufansong in #2377
- fix the failed unit test. by @iFurySt in #2405
- chore(deps-dev): bump lint-staged from 15.2.5 to 15.2.6 in /frontend by @dependabot in #2407
- chore(deps): bump litellm from 1.40.8 to 1.40.9 by @dependabot in #2411
- chore(deps): bump boto3 from 1.34.123 to 1.34.124 by @dependabot in #2410
- Use LLM to analyze ML-Bench failure cases by @super-dainiu in #2399
- Refactor MonologueAgent, PlannerAgent add response parser by @yufansong in #2400
- Refactor CodeActSWEAgent, add response parser by @yufansong in #2368
- Dockerfile to make plugins sandbox-agnostic by @yufansong in #2409
- chore(deps-dev): bump lint-staged from 15.2.6 to 15.2.7 in /frontend by @dependabot in #2414
- chore(deps): bump vite from 5.2.13 to 5.3.0 in /frontend by @dependabot in #2416
- chore(deps-dev): bump openai from 1.33.0 to 1.34.0 by @dependabot in #2422
- chore(deps): bump boto3 from 1.34.124 to 1.34.125 by @dependabot in #2423
- chore(deps): bump datasets from 2.19.2 to 2.20.0 by @dependabot in #2424
- Replace all instances of OPENDEVIN_WORKSPACE with WORKSPACE_BASE by @neubig in #2418
- workspace_mount_path sentinel: an undefined string by @enyst in #2431
- regenerate.sh: Exit upon common known errors by @li-boxuan in #2385
- Adjust is-stuck check for the same steps to 3 until it's stopped by @enyst in #2437
- chore(deps): bump litellm from 1.40.9 to 1.40.12 by @dependabot in #2440
- chore(deps): bump boto3 from 1.34.125 to 1.34.126 by @dependabot in #2439
- chore(deps): bump vite from 5.3.0 to 5.3.1 in /frontend by @dependabot in #2441
- refactor browsing agent code by @yufansong in #2442
- chores: fix DelegatorAgent description by @yufansong in #2446
- Stopped persisted container on closing to prevent port issues. by @SmartManoj in #2447
- chore: remove useless browsing code in CodeActSWEAgent by @yufansong in #2438
- Added Pull Request Template by @SmartManoj in #2454
- Fixed typo in PR template name by @SmartManoj in #2461
- Evaluation time travel: allow evaluation on a specific version by @li-boxuan in #2356
- fix: improve toml parsing exception in config class by @tobitege in #2459
- docs: Update Development and CONTRIBUTING docs by @mamoodi in #2453
- chores: remove useless code by @yufansong in #2465
- Detailed jupyter error log by @SmartManoj in #2448
- fix: Agentskills enhancements by @tobitege in #2384
- Codecov after_n_builds=5 by @neubig in #2468
- fix: logger with more masking of sensitive data by @tobitege in #2470
- fix: test_ipython being skipped by @tobitege in #2477
- Reworded port forward msg by @SmartManoj in #2478
- Integration tests: check agent error and fix test_edits by @li-boxuan in #2473
- Bump docker version by @SmartManoj in #2479
- chore(deps-dev): bump flake8 from 7.0.0 to 7.1.0 by @dependabot in #2481
- chore(deps): bump litellm from 1.40.12 to 1.40.15 by @dependabot in #2484
- chore(deps): bump tenacity from 8.3.0 to 8.4.1 by @dependabot in #2483
- chore(deps): bump @nextui-org/react from 2.4.1 to 2.4.2 in /frontend by @dependabot in #2485
- chore(deps-dev): bump ruff from 0.4.8 to 0.4.9 by @dependabot in #2482
- feat: add environments in global bashrc file by @Shimada666 in #2486
- chore(deps): bump jose from 5.4.0 to 5.4.1 in /frontend by @dependabot in #2496
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 7.13.0 to 7.13.1 in /frontend by @dependabot in #2495
- chore(deps-dev): bump @types/node from 20.14.2 to 20.14.5 in /frontend by @dependabot in #2498
- chore(deps): bump google-generativeai from 0.6.0 to 0.7.0 by @dependabot in #2500
- Update SWE-Bench README.md by @xingyaoww in #2505
- chore(deps): bump litellm from 1.40.15 to 1.40.16 by @dependabot in #2501
- Split container image build & push by @li-boxuan in #2456
- chore(deps): bump boto3 from 1.34.126 to 1.34.128 by @dependabot in #2504
- Document, rename Agent* exceptions to LLM* by @enyst in #2508
- Use the completion decorator with cost logging by @enyst in #2509
- dep: poetry.lock updated (lots of changes) by @tobitege in #2492
- feat: default timezone in build by @tobitege in #2513
- Fix Docker tagging issue with upper case by @li-boxuan in #2512
- chore(deps-dev): bump @typescript-eslint/parser from 7.13.0 to 7.13.1 in /frontend by @dependabot in #2497
- Add dev config for frontend for WSL by @SmartManoj in #2506
- Show relevant error in UI by @SmartManoj in #2516
- chore(deps-dev): bump chromadb from 0.5.1 to 0.5.3 by @dependabot in #2514
- chore(deps): bump json-repair from 0.23.1 to 0.25.0 by @dependabot in #2521
- chore(deps): bump boto3 from 1.34.128 to 1.34.129 by @dependabot in #2520
- chore(deps): bump litellm from 1.40.16 to 1.40.17 by @dependabot in #2522
- chore(deps): bump framer-motion from 11.2.10 to 11.2.11 in /frontend by @dependabot in #2523
- Update CONTRIBUTING.md by @neubig in #2525
- chore(deps-dev): bump eslint-plugin-react from 7.34.2 to 7.34.3 in /frontend by @dependabot in #2524
- docs: Fix formatting in CONTRIBUTING by @mamoodi in #2526
- docs: Update link to evaluation benchmark on README.md by @xingyaoww in #2530
- docs: Add visualizer instruction for SWE-Bench by @xingyaoww in #2529
- Make plugins sandbox-agnostic by @Shimada666 in #2101
- Downgrade Mac version in CI/CD Pipeline by @SmartManoj in #2499
- remove gcc by @Shimada666 in #2536
- Architecture documentation by @rbren in #2116
- chore(deps): bump monaco-editor from 0.49.0 to 0.50.0 in /frontend by @dependabot in #2541
- chore(deps): bump boto3 from 1.34.129 to 1.34.130 by @dependabot in #2546
- chore(deps-dev): bump @types/node from 20.14.5 to 20.14.6 in /frontend by @dependabot in #2542
- Fix Mac OS CI test by @SmartManoj in #2544
- add workspace text by @Shimada666 in #2548
- Fix od_sandbox's docker image rename rule by @xingyaoww in #2550
- chore(deps-dev): bump openai from 1.34.0 to 1.35.1 by @dependabot in #2547
- Always pull sandbox image by @SmartManoj in #2538
- Evaluation time travel: build sandbox on the fly by @li-boxuan in #2491
- Enable test_agnostic_sandbox_jupyter_agentskills_fileop_pwd in CI by @li-boxuan in #2534
- sec: update npm module "ws" to 8.17.1 by @tobitege in #2554
- Use :main instead of :latest by @SmartManoj in #2539
- chore(deps-dev): bump eslint-plugin-jsx-a11y from 6.8.0 to 6.9.0 in /frontend by @dependabot in #2543
- Fix Mac OS CI - usernet unable to resolve IP for SSH forwarding by @SmartManoj in #2556
- Enforce linter in tests folder by @li-boxuan in #2557
- Stop always pulling the latest image. by @Shimada666 in #2558
- Revert "Always pull sandbox image" by @Shimada666 in #2560
- fix: Makefile shall pull sandbox:main, not :latest by @tobitege in #2561
- Update doc to clarify OpenDevin mission and link directly to content by @neubig in #2568
- chore(deps-dev): bump typescript from 5.4.5 to 5.5.2 in /frontend by @dependabot in #2570
- chore(deps-dev): bump @types/node from 20.14.6 to 20.14.7 in /frontend by @dependabot in #2571
- chore(deps-dev): bump streamlit from 1.35.0 to 1.36.0 by @dependabot in #2572
- chore(deps-dev): bump openai from 1.35.1 to 1.35.3 by @dependabot in #2574
- chore(deps): bump json-repair from 0.25.0 to 0.25.1 by @dependabot in #2575
- No longer chown -R the miniforge3 folder by @Shimada666 in #2566
- Fix Mac CI Test by @SmartManoj in #2569
- Interactive Terminal by @SmartManoj in #2493
- chore(deps): bump litellm from 1.40.17 to 1.40.20 by @dependabot in #2576
- chore(deps-dev): bump ruff from 0.4.9 to 0.4.10 by @dependabot in #2573
- Add links to a feedback sharing site by @neubig in #2580
- Update feedback modal content by @amanape in #2582
- Mention Ubuntu supported versions by @SmartManoj in #2584
- Enable "vz" vm-type for MacOS CI by @SmartManoj in #2586
- Update documentation regarding feedback data usage by @neubig in #2585
- Revert "Enable "vz" vm-type for MacOS CI" by @SmartManoj in #2588
- Remove PERSIST_SANDBOX=true option by @mamoodi in #2591
- Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings by @neubig in #2597
- Remove Colima and lima directory after uninstalling for Mac OS CI by @SmartManoj in #2598
- fix(frontend): Prevent actions from disappearing before sending data by @amanape in #2599
- fix(frontend): Disable eslint rule that throws useEffect warning by @amanape in #2600
- fix(frontend): Replace console errors with toast errors by @amanape in #2601
- Add i18n support for official website by @Umpire2018 in #2463
- Default makefile for persist_sandbox to be false by @mamoodi in #2605
- [Evaluation] Update SWE-bench output with eval results by @xingyaoww in #2606
- Track metrics throughout delegation & Polish UX for out of budget error by @li-boxuan in #2595
- Tweak prompts of ManagerAgent and CommitWriterAgent by @li-boxuan in #2609
- feat: update version to 0.6.2. added Action to update pyproject on Release by @tobitege in #2552
- chore(deps): bump boto3 from 1.34.130 to 1.34.131 by @dependabot in #2616
- chore(deps): bump litellm from 1.40.20 to 1.40.25 by @dependabot in #2615
- chore(deps-dev): bump @types/node from 20.14.7 to 20.14.8 in /frontend by @dependabot in #2617
- Bug fix: add error observation to history by @li-boxuan in #2610
- feat(frontend): Add "Copy" Button to Chat Messages by @PierrunoYT in #2619
- feat: Agent buttons hover decor by @tobitege in #2623
- chore(deps): bump framer-motion from 11.2.11 to 11.2.12 in /frontend by @dependabot in #2626
- chore(deps-dev): bump @typescript-eslint/parser from 7.13.1 to 7.14.1 in /frontend by @dependabot in #2627
- chore(deps-dev): bump mypy from 1.10.0 to 1.10.1 by @dependabot in #2634
- chore(deps): bump litellm from 1.40.25 to 1.40.26 by @dependabot in #2633
- chore(deps-dev): bump @typescript-eslint/eslint-plugin from 7.13.1 to 7.14.1 in /frontend by @dependabot in #2629
- chore(deps): bump tenacity from 8.4.1 to 8.4.2 by @dependabot in #2631
- chore(deps-dev): bump reportlab from 4.2.0 to 4.2.2 by @dependabot in #2635
- chore(deps): bump boto3 from 1.34.131 to 1.34.132 by @dependabot in #2632
- feat(frontend): Add Typing Indicator for Agent Response using Tailwind CSS by @PierrunoYT in #2630
- frontend: apply more translations to frontend components by @tobitege in #2639
- feat: allow SANDBOX_CONTAINER_IMAGEs built from opendevin/sandbox:main by @xverges in #2622
- Frontend support for delegation and rejection by @li-boxuan in #2608
- dev: added make-i18n to "build" (package.json) by @tobitege in #2641
- chore(deps): bump react-router-dom from 6.23.1 to 6.24.0 in /frontend by @dependabot in #2628
- chore(deps-dev): bump @types/node from 20.14.8 to 20.14.9 in /frontend by @dependabot in #2644
- Added Documentation for How To Run Custom Sandbox Image by @sheunaluko in #2637
- chore(deps): bump jose from 5.4.1 to 5.5.0 in /frontend by @dependabot in #2645
- chore(deps): bump boto3 from 1.34.132 to 1.34.133 by @dependabot in #2647
- chore(deps): bump litellm from 1.40.26 to 1.40.27 by @dependabot in #2648
- chore(deps-dev): bump openai from 1.35.3 to 1.35.4 by @dependabot in #2649
- docs: Update custom_sandbox_guide.md by @xingyaoww in #2650
- Update docusaurus.config.ts for a changed Doc URL by @xingyaoww in #2653
- Added Sound Notification by @SmartManoj in #2203
- Add test for auto_lint after file edit by @li-boxuan in #2655
- Revert "Show relevant error in UI" by @enyst in #2657
- Provide [Package already installed] info to LLM by @SmartManoj in #2642
- chore(deps): bump boto3 from 1.34.133 to 1.34.134 by @dependabot in #2661
- chore(deps-dev): bump openai from 1.35.4 to 1.35.6 by @dependabot in #2660
- chore(deps): bump google-generativeai from 0.7.0 to 0.7.1 by @dependabot in #2662
- chore(deps): bump jose from 5.5.0 to 5.6.1 in /frontend by @dependabot in #2663
- chore(deps): bump litellm from 1.40.27 to 1.40.28 by @dependabot in #2664
- Fix doc error in evals by @Jiayi-Pan in #2654
- refactor: Simplify message formatting by @SmartManoj in #2670
- CodeActAgent: Fix delegate history by @li-boxuan in #2672
- chore(deps): bump boto3 from 1.34.134 to 1.34.135 by @dependabot in #2678
- chore(deps): bump jose from 5.6.1 to 5.6.2 in /frontend by @dependabot in #2682
- chore(deps): bump vite from 5.3.1 to 5.3.2 in /frontend by @dependabot in #2681
- chore(deps-dev): bump openai from 1.35.6 to 1.35.7 by @dependabot in #2676
- chore(deps): bump litellm from 1.40.28 to 1.40.29 by @dependabot in #2677
- chore(deps): bump json-repair from 0.25.1 to 0.25.2 by @dependabot in #2679
- chore(deps-dev): bump llama-index-vector-stores-chroma from 0.1.9 to 0.1.10 by @dependabot in #2680
- feat: file explorer: better sorting; .gitignore support; file upload config by @tobitege in #2621
- [Evaluation] Improve patch apply in SWE-Bench by @xingyaoww in #2684
- ghcr: Fix local built image name in tests by @li-boxuan in #2686
New Contributors
- @mohammadkazem-sadoughi made their first contribution in #2304
- @birajsilwal made their first contribution in #2287
- @yueqis made their first contribution in #2263
- @1jsingh made their first contribution in #2080
- @ohhmm made their first contribution in #2348
- @tangxiangru made their first contribution in #2076
- @xverges made their first contribution in #2622
- @sheunaluko made their first contribution in #2637
Full Changelog: 0.6.2...0.7.0