
Conversation

itsmeknt

When running the Aider benchmark, it is sometimes useful to analyze model performance by programming language. Some users may want to choose a model that does well specifically in Go, even if its overall benchmark score is low.

I added some self-contained code to benchmark.py so that running benchmark.py --stats together with --verbose prints the benchmark stats broken down by language at the bottom of the report. Without --verbose, the behavior is unchanged.
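The aggregation needs no extra metadata because the polyglot benchmark nests every exercise under a top-level language directory (go/exercises/practice/..., etc.) and each finished test leaves a .aider.results.json behind. Here is a minimal sketch of the rollup; the helper names and the exact result-file fields are assumptions for illustration, not necessarily the code in this PR:

```python
# Minimal sketch of a per-language stats rollup. Helper names and the
# .aider.results.json fields read here are illustrative assumptions.
import json
from collections import defaultdict
from pathlib import Path

LANGUAGES = {"python", "go", "rust", "cpp", "javascript", "java"}

def language_of(path: Path) -> str:
    """Return the top-level language directory a result file lives under."""
    for part in path.parts:
        if part in LANGUAGES:
            return part
    return "unknown"

def stats_by_language(run_dir: Path) -> dict:
    stats = defaultdict(lambda: defaultdict(float))
    for results_file in Path(run_dir).rglob(".aider.results.json"):
        res = json.loads(results_file.read_text())
        s = stats[language_of(results_file)]
        s["completed_tests"] += 1
        s["duration"] += res.get("duration", 0)
        s["prompt_tokens"] += res.get("prompt_tokens", 0)
        s["completion_tokens"] += res.get("completion_tokens", 0)
        # tests_outcomes holds one boolean per attempt; pass_num_i counts
        # tests that passed within attempts 0..i (cumulative).
        outcomes = res.get("tests_outcomes", [])
        for attempt in range(len(outcomes)):
            if any(outcomes[: attempt + 1]):
                s[f"pass_num_{attempt}"] += 1
    # Derive pass rates from the cumulative pass counts.
    for s in stats.values():
        n = s["completed_tests"]
        for key in [k for k in s if k.startswith("pass_num_")]:
            s[key.replace("num", "rate")] = round(100 * s[key] / n, 2)
    return stats
```

Keying off the directory name keeps the breakdown robust to new benchmark languages: extending LANGUAGES is the only change needed.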

Here is an example:

./benchmark/benchmark.py --stats --verbose reports_from_benchmarks/gpt-oss-20b/medium/whole/2025-09-12-09-53-14--bench-full-whole-openai-openai-gpt-oss-20b-medium/

──────────────────────────────────────────── reports_from_benchmarks/gpt-oss-20b/medium/whole/2025-09-12-09-53-14--bench-full-whole-openai-openai-gpt-oss-20b-medium ────────────────────────────────────────────
- dirname: 2025-09-12-09-53-14--bench-full-whole-openai-openai-gpt-oss-20b-medium
  test_cases: 225
  model: openai/openai/gpt-oss-20b
  edit_format: whole
  commit_hash: 32faf82-dirty
  reasoning_effort: medium
  pass_rate_1: 9.8
  pass_rate_2: 36.0
  pass_num_1: 22
  pass_num_2: 81
  percent_cases_well_formed: 100.0
  error_outputs: 27
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 154
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  prompt_tokens: 2162608
  completion_tokens: 1224921
  test_timeouts: 4
  total_tests: 225
  command: aider --model openai/openai/gpt-oss-20b
  date: 2025-09-12
  versions: 0.86.2.dev
  seconds_per_case: 801.2
  total_cost: 0.0000

costs: $0.0000/test-case, $0.00 total, $0.00 projected

======== Stats by language ========

| ---------------------------- | --------- | --------- | --------- | --------- | ---------- | --------- |
|                              |   python  |     go    |    rust   |    cpp    | javascript |    java   |
| ---------------------------- | --------- | --------- | --------- | --------- | ---------- | --------- |
| completed_tests              |        34 |        39 |        30 |        26 |         49 |        47 |
| duration                     | 24,957.62 | 21,706.71 | 17,028.67 | 51,506.41 |  29,789.68 | 35,275.56 |
| avg_duration_per_test        |    734.05 |    556.58 |    567.62 |  1,981.02 |     607.95 |    750.54 |
| cost                         |         - |         - |         - |         - |          - |         - |
| pass_rate_0                  |      5.88 |      5.13 |      6.67 |      7.69 |       4.08 |      4.26 |
| pass_rate_1                  |     35.29 |     30.77 |     40.00 |     46.15 |      24.49 |     25.53 |
| pass_num_0                   |         2 |         2 |         2 |         2 |          2 |         2 |
| pass_num_1                   |        12 |        12 |        12 |        12 |         12 |        12 |
| error_outputs                |         7 |         2 |         3 |         - |         14 |         1 |
| user_asks                    |         1 |         1 |         - |       139 |          - |        13 |
| test_timeouts                |         - |         - |         1 |         - |          2 |         1 |
| exhausted_context_windows    |         - |         - |         - |         - |          - |         - |
| num_malformed_responses      |         - |         - |         - |         - |          - |         - |
| num_with_malformed_responses |         - |         - |         - |         - |          - |         - |
| syntax_errors                |         - |         - |         - |         - |          - |         - |
| indentation_errors           |         - |         - |         - |         - |          - |         - |
| lazy_comments                |         - |         - |         - |         - |          - |         - |
| prompt_tokens                |   204,931 |   159,565 |   127,949 | 1,078,034 |    247,566 |   344,563 |
| completion_tokens            |   138,725 |   159,982 |   128,591 |   379,616 |    185,134 |   232,873 |
| ---------------------------- | --------- | --------- | --------- | --------- | ---------- | --------- |

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

ei-grad and others added 30 commits April 14, 2025 22:15
Co-authored-by: aider (vertex_ai/gemini-2.5-pro-exp-03-25) <[email protected]>
- Add tool_prompt to CoderPrompts class
- Modify fmt_system_prompt to include tool prompt when MCP tools are available
- This enables better handling of tool-based interactions when using MCP servers
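For context on the MCP commit above, a rough sketch of what the described fmt_system_prompt change could look like; the class shapes, attribute wiring, and {tool_prompt} placeholder handling are assumptions for illustration, not the actual aider source:

```python
# Simplified sketch of the change the commit message describes: CoderPrompts
# gains a tool_prompt, and fmt_system_prompt splices it in only when MCP
# tools are available. Shapes and names here are illustrative assumptions.
class CoderPrompts:
    main_system = "Act as an expert software developer.\n{tool_prompt}"
    tool_prompt = (
        "You have access to tools provided by connected MCP servers. "
        "Use them when they help complete the user's request."
    )

class Coder:
    def __init__(self, prompts, mcp_tools=None):
        self.gpt_prompts = prompts
        self.mcp_tools = mcp_tools or []

    def fmt_system_prompt(self, prompt):
        # With no MCP tools configured, the placeholder collapses to "",
        # preserving the pre-MCP behavior of the system prompt.
        tool_prompt = self.gpt_prompts.tool_prompt if self.mcp_tools else ""
        return prompt.format(tool_prompt=tool_prompt)
```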
@CLAassistant

CLAassistant commented Sep 18, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ itsmeknt
✅ cryptekbits
❌ dwash96
You have signed the CLA already but the status is still pending? Let us recheck it.

