3 changes: 2 additions & 1 deletion docs/kagi/ai/llm-benchmark.md
@@ -8,7 +8,7 @@ The Kagi "offline" Benchmark is an **unpolluted benchmark** to assess large lang

Unlike standard benchmarks, the tasks in this benchmark are unpublished, not found in training data, or "gamed" in fine-tuning. The task set changes over time (mostly getting more difficult) to better represent the current state of the art.

-Last task list revision: **October 8th, 2025**
+Last update: **November 6th, 2025**
Tasks: **110**
Input Tokens (all tasks): **14909**

@@ -38,6 +38,7 @@ Please see notes below the table if you see results you find surprising, or get
| qwen3-next-80b-a3b-thinking | 66.7 | 1.0 | 58.2 | 442001 | 14.9 | openrouter |
| grok-4-fast-thinking | 66.1 | 0.3 | 8.2 | 289270 | 311.1 | kagi |
| arcee-ai/maestro-reasoning | 64.9 | 2.7 | 16.7 | 200565 | 103.4 | openrouter |
+| kimi-k2-thinking | 64.4 | 0.8 | 47.4 | 338746 | 20.2 | kagi |
| qwen-plus-2025-07-28 | 63.3 | 1.1 | 9.0 | 143402 | 37.0 | openrouter |
| stepfun-ai/step3 | 62.3 | 1.6 | 174.2 | 417415 | 7.0 | openrouter |
| gpt-5-nano | 62.2 | 0.4 | 20.5 | 9587 | 3.9 | kagi |
2 changes: 1 addition & 1 deletion docs/kagi/ai/llms-privacy.md
@@ -17,7 +17,7 @@ When you use [Kagi Assistant](./assistant.md), we make API requests to third-par
| <ul><li>Grok 3 Mini</li><li>Grok 4</li><li>Grok 4 Fast</li><li>Grok 4 Fast with Thinking</li><li>Grok Code Fast 1</li></ul> | <ul><li>xAI</li></ul> | No | 30 days | <ul><li>[xAI Privacy Policy](https://x.ai/legal/privacy-policy)</li><li>[xAI FAQ](https://docs.x.ai/docs/faq#does-xai-train-on-customers-api-requests)</li></ul> |
| <ul><li>Kimi K2</li></ul> | <ul><li>Groq</li><li>Fireworks.ai</li></ul> | No | Not stored | <ul><li>[Groq Privacy Policy](https://groq.com/privacy-policy/)</li><li>[Fireworks.ai Privacy Policy](https://fireworks.ai/privacy-policy)</li><li>[Fireworks.ai FAQ](https://docs.fireworks.ai/guides/security_compliance/data_handling)</li></ul> |
| <ul><li>GPT OSS 20B</li><li>GPT OSS 120B</li></ul> | <ul><li>Groq</li><li>Fireworks.ai</li></ul> | No | Not stored | <ul><li>[Groq Privacy Policy](https://groq.com/privacy-policy/)</li><li>[Fireworks.ai Privacy Policy](https://fireworks.ai/privacy-policy)</li><li>[Fireworks.ai FAQ](https://docs.fireworks.ai/guides/security_compliance/data_handling)</li></ul> |
-| <ul><li>GLM-4.5</li></ul> | <ul><li>Fireworks.ai</li></ul> | No | Not stored | <ul><li>[Fireworks.ai Privacy Policy](https://fireworks.ai/privacy-policy)</li><li>[Fireworks.ai FAQ](https://docs.fireworks.ai/guides/security_compliance/data_handling)</li></ul> |
+| <ul><li>GLM-4.5</li><li>Kimi K2 Thinking</li></ul> | <ul><li>Fireworks.ai</li></ul> | No | Not stored | <ul><li>[Fireworks.ai Privacy Policy](https://fireworks.ai/privacy-policy)</li><li>[Fireworks.ai FAQ](https://docs.fireworks.ai/guides/security_compliance/data_handling)</li></ul> |

¹ The Assistant does not include a [unique user identifier](https://platform.openai.com/docs/guides/safety-best-practices#end-user-ids) for these requests.
