|
| 1 | +--- |
| 2 | +tags: |
| 3 | + - Inside |
| 4 | +--- |
| 5 | + |
| 6 | +# LiteLLM Inside |
| 7 | + |
| 8 | +- 价格信息 |
| 9 | + - https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json |
| 10 | + |
| 11 | +| Header | For | |
| 12 | +| ----------------------------------------------- | ------------------------------------------ | |
| 13 | +| **Request** | | |
| 14 | +| `x-litellm-timeout: <seconds>` | 请求超时设置 | |
| 15 | +| `x-litellm-stream-timeout: <seconds>` | 第一个 Chunk 超时 | |
| 16 | +| `x-litellm-enable-message-redaction: <boolean>` | 启用消息内容屏蔽 | |
| 17 | +| `x-litellm-tags: <tag1,tag2,...>` | 请求标签 | |
| 18 | +| `x-litellm-num-retries: <number>` | 请求重试次数 | |
| 19 | +| `x-litellm-spend-logs-metadata: <json>` | 请求开销日志元数据 | |
| 20 | +| **Request/Anthropic** | | |
| 21 | +| `anthropic-version: <str>` | API version | |
| 22 | +| `anthropic-beta: <str>` | beta version | |
| 23 | +| **Request/OpenAI** | | |
| 24 | +| `openai-organization: <str>` | organization id | |
| 25 | +| **Request/Bypass** | | |
| 26 | +| `x-*` | 需要配置 forward_client_headers_to_llm_api | |
| 27 | +| **Response/Rate Limit** | | |
| 28 | +| `x-ratelimit-remaining-requests: <int>` | 剩余可用请求数 | |
| 29 | +| `x-ratelimit-remaining-tokens: <int>` | 剩余可用token数 | |
| 30 | +| `x-ratelimit-limit-requests: <int>` | 最大请求数限制 | |
| 31 | +| `x-ratelimit-limit-tokens: <int>` | 最大token数限制 | |
| 32 | +| `x-ratelimit-reset-requests: <int>` | 请求限制重置时间 | |
| 33 | +| `x-ratelimit-reset-tokens: <int>` | token限制重置时间 | |
| 34 | +| **Response/Latency** | | |
| 35 | +| `x-litellm-response-duration-ms: <float>` | 从请求到响应的总耗时(毫秒) | |
| 36 | +| `x-litellm-overhead-duration-ms: <float>` | LiteLLM处理开销时间(毫秒) | |
| 37 | +| **Response/Retry&Fallback** | | |
| 38 | +| `x-litellm-attempted-retries: <int>` | 实际重试次数 | |
| 39 | +| `x-litellm-attempted-fallbacks: <int>` | 实际回退次数 | |
| 40 | +| `x-litellm-max-fallbacks: <int>` | 最大回退次数限制 | |
| 41 | +| **Response/Cost** | | |
| 42 | +| `x-litellm-response-cost: <float>` | API调用费用 | |
| 43 | +| `x-litellm-key-spend: <float>` | API密钥总消费 | |
| 44 | +| **Response/Bypass** | | |
| 45 | +| `llm_provider-*` | 透传LLM提供商的响应头 | |
| 46 | + |
| 47 | +```json title="spend-logs-metadata" |
| 48 | +{ "user_id": "12345", "project_id": "proj_abc", "request_type": "chat_completion" } |
| 49 | +``` |
| 50 | + |
| 51 | +## config.yaml |
| 52 | + |
| 53 | +```yaml |
| 54 | +include: |
| 55 | + - model_config.yaml |
| 56 | + |
| 57 | +model_list: [] |
| 58 | +litellm_settings: |
| 59 | + num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta) |
| 60 | + request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout |
| 61 | + fallbacks: [{"zephyr-beta": ["gpt-4o"]}] # fallback to gpt-4o if call fails num_retries |
| 62 | + context_window_fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo-16k"]}, {"gpt-4o": ["gpt-3.5-turbo-16k"]}] # fallback to gpt-3.5-turbo-16k if context window error |
| 63 | + allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute. |
| 64 | +router_settings: # router_settings are optional |
| 65 | + routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle" |
| 66 | + model_group_alias: {"gpt-4": "gpt-4o"} # all requests with `gpt-4` will be routed to models with `gpt-4o` |
| 67 | + num_retries: 2 |
| 68 | + timeout: 30 # 30 seconds |
| 69 | + redis_host: <your redis host> # set this when using multiple litellm proxy deployments, load balancing state stored in redis |
| 70 | + redis_password: <your redis password> |
| 71 | + redis_port: 1992 |
| 72 | +general_settings: {} |
| 73 | +environment_variables: {} |
| 74 | +``` |
| 75 | + |
| 76 | +```yaml |
| 77 | +model_list: |
| 78 | + - model_name: glm-4.5 |
| 79 | + litellm_params: |
| 80 | + model: openai/glm-4.5 |
| 81 | + litellm_credential_name: zhipu_credential |
| 82 | + |
| 83 | + - model_name: glm-4.5-air |
| 84 | + litellm_params: |
| 85 | + model: openai/glm-4.5-air |
| 86 | + litellm_credential_name: zhipu_credential |
| 87 | + |
| 88 | + - model_name: '*' |
| 89 | + litellm_params: |
| 90 | + model: openai/glm-4.5-air |
| 91 | + litellm_credential_name: zhipu_credential |
| 92 | + |
| 93 | +credential_list: |
| 94 | + - credential_name: zhipu_credential |
| 95 | + credential_values: |
| 96 | + api_base: os.environ/ZHIPU_API_BASE |
| 97 | + api_key: os.environ/ZHIPU_API_KEY |
| 98 | + credential_info: |
| 99 | + description: '智谱' |
| 100 | +``` |
| 101 | + |
| 102 | +**支持通配符** |
| 103 | + |
| 104 | +```yaml |
| 105 | +model_list: |
| 106 | + - model_name: xai/* |
| 107 | + litellm_params: |
| 108 | + model: xai/* |
| 109 | + api_key: os.environ/XAI_API_KEY |
| 110 | + |
| 111 | +litellm_settings: |
| 112 | + check_provider_endpoint: true |
| 113 | +``` |
| 114 | + |
| 115 | +```yaml |
| 116 | +# params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body |
| 117 | +litellm_params: |
| 118 | + model: openai/facebook/opt-125m |
| 119 | + api_base: http://0.0.0.0:4000/v1 |
| 120 | + api_key: none |
| 121 | + api_version: "2023-05-15" |
| 122 | + rpm: 60 # Optional[int]: When rpm/tpm set - litellm uses weighted pick for load balancing. rpm = Rate limit for this deployment: in requests per minute (rpm). |
| 123 | + tpm: 1000 # Optional[int]: tpm = Tokens Per Minute |
| 124 | + azure_ad_token: "" |
| 125 | + seed: 1234 |
| 126 | + max_tokens: 1024 |
| 127 | + temperature: 0.2 |
| 128 | + organization: "org-12345" |
| 129 | + aws_region_name: "us-west-2" |
| 130 | + extra_headers: {"AI-Resource Group": "ishaan-resource"} |
| 131 | +model_info: |
| 132 | + version: 2 |
| 133 | + access_groups: ['restricted-models'] |
| 134 | + supported_environments: ["development", "production", "staging"] |
| 135 | + custom_tokenizer: |
| 136 | + identifier: deepseek-ai/DeepSeek-V3-Base |
| 137 | + revision: main |
| 138 | + auth_token: os.environ/HUGGINGFACE_API_KEY |
| 139 | +``` |
| 140 | + |
| 141 | +- https://docs.litellm.ai/docs/proxy/configs |
| 142 | +- https://docs.litellm.ai/docs/proxy/config_settings |
| 143 | + |
| 144 | +## 参考 |
| 145 | + |
| 146 | +- https://docs.litellm.ai/docs/proxy/request_headers |
| 147 | +- Anthropic |
| 148 | + - Beta header |
| 149 | + - https://docs.claude.com/en/api/beta-headers |
| 150 | + - Features |
| 151 | + - https://docs.claude.com/en/docs/build-with-claude/overview |
0 commit comments