Skip to content

Commit 2bf5c43

Browse files
authored
feat: support reasoning streaming (#83)
1 parent b0ad4ce commit 2bf5c43

File tree

6 files changed

+246
-104
lines changed

6 files changed

+246
-104
lines changed

API_DOCUMENTATION.md

Lines changed: 40 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -166,7 +166,12 @@ curl -X POST http://localhost:3001/v1/agents/cairo-coder/chat/completions \
166166

167167
### Streaming Response
168168

169-
Set `"stream": true` to receive SSE chunks that match OpenAI's `chat.completion.chunk` format. Each SSE frame is emitted as `data: {JSON}\n\n`, ending with `data: [DONE]\n\n`.
169+
Set `"stream": true` to receive Server‑Sent Events (SSE). The stream contains:
170+
171+
- OpenAI‑compatible `chat.completion.chunk` frames for assistant text deltas
172+
- Cairo Coder custom event frames with a top‑level `type` and `data` field
173+
174+
Each frame is sent as `data: {JSON}\n\n`, and the stream ends with `data: [DONE]\n\n`.
170175

171176
**Request**
172177

@@ -193,23 +198,45 @@ data: {"id":"...","object":"chat.completion.chunk","created":1718123456,"model":
193198
data: [DONE]
194199
```
195200

196-
#### Sources Events (streaming-only)
201+
#### Custom Stream Events
197202

198-
In addition to the OpenAI-compatible chunks above, Cairo Coder emits a custom SSE frame early in the stream with the documentation sources used for the answer. This enables frontends to display sources while the model is generating the response.
203+
In addition to OpenAIcompatible chunks, Cairo Coder emits custom events to expose retrieval context, progress, and optional reasoning. These frames have the shape `{"type": string, "data": any}` and can appear interleaved with standard chunks.
199204

200-
- The frame shape is: `data: {"type": "sources", "data": [{"title": string, "url": string}, ...]}`
201-
- Clients should filter out objects with `type == "sources"` from the OpenAI chunks stream if they only expect OpenAI-compatible frames.
205+
- `type: "processing"` — High‑level progress updates.
202206

203-
Example snippet:
207+
- Example frames:
208+
- `data: {"type":"processing","data":"Processing query..."}`
209+
- `data: {"type":"processing","data":"Retrieving relevant documents..."}`
210+
- `data: {"type":"processing","data":"Generating response..."}` (or `"Formatting documentation..."` in MCP mode)
204211

205-
```json
206-
data: {"type":"sources","data":[{"metadata":{"title":"Introduction to Cairo","url":"https://book.cairo-lang.org/ch01-00-getting-started.html"}}]}
207-
```
212+
- `type: "sources"` — Sources used to answer the query (emitted after retrieval).
213+
214+
- Shape: `data: {"type":"sources","data":[{"metadata":{"title": string, "url": string}}, ...]}`
215+
- Example:
216+
- `data: {"type":"sources","data":[{"metadata":{"title":"Introduction to Cairo","url":"https://book.cairo-lang.org/ch01-00-getting-started.html"}}]}`
217+
- Notes:
218+
- Typically one `sources` frame per request, shortly after retrieval completes.
219+
- `url` maps to `metadata.sourceLink` when available.
220+
221+
- `type: "reasoning"` — Optional model reasoning stream (token‑level), when available.
222+
223+
- Example frames:
224+
- `data: {"type":"reasoning","data":"Thinking about storage layout..."}`
225+
- Notes:
226+
- Emitted incrementally and may appear multiple times.
227+
- Not all models or modes include reasoning; absence is expected.
228+
229+
- `type: "final_response"` — Full, final answer text.
230+
- Example:
231+
- `data: {"type":"final_response","data":"<complete answer text>"}`
232+
- Notes:
233+
- Mirrors the final accumulated assistant content sent via OpenAI‑compatible chunks.
208234

209-
Notes:
235+
Client guidance:
210236

211-
- Exactly one sources event is typically emitted per request, shortly after retrieval completes.
212-
- The `url` field maps to the ingester `sourceLink` when available; otherwise it may be a best-effort `url` present in metadata.
237+
- If you only want OpenAI‑compatible frames, ignore objects that include a top‑level `type` field.
238+
- To build richer UIs, render `processing` as status badges, `sources` as link previews, and `reasoning` in a collapsible area.
239+
- Streaming errors surface as OpenAI‑compatible chunks that contain `"delta": {"content": "\n\nError: ..."}` followed by a terminating chunk and `[DONE]`.
213240

214241
### Agent Selection
215242

@@ -280,7 +307,7 @@ Setting either `mcp` or `x-mcp-mode` headers triggers **Model Context Protocol m
280307

281308
- Non-streaming responses still use the standard `chat.completion` envelope, but `choices[0].message.content` contains curated documentation blocks instead of prose answers.
282309
- Streaming responses emit the same SSE wrapper; the payloads contain the formatted documentation as incremental `delta.content` strings.
283-
- A streaming request in MCP mode also includes the same `{"type": "sources"}` event described above.
310+
- Streaming also includes custom events: `processing` (e.g., "Formatting documentation...") and `sources` as described in Custom Stream Events. A `final_response` frame mirrors the full final text.
284311
- MCP mode does not consume generation tokens (`usage.completion_tokens` reflects only retrieval/query processing).
285312

286313
Example non-streaming request:

python/src/cairo_coder/core/rag_pipeline.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -206,8 +206,14 @@ async def aforward_streaming(
206206
):
207207
if isinstance(chunk, dspy.streaming.StreamResponse):
208208
# Incremental token
209-
chunk_accumulator += chunk.chunk
210-
yield StreamEvent(type=StreamEventType.RESPONSE, data=chunk.chunk)
209+
# Emit thinking events for reasoning field, response events for answer field
210+
if chunk.signature_field_name == "reasoning":
211+
yield StreamEvent(type=StreamEventType.REASONING, data=chunk.chunk)
212+
elif chunk.signature_field_name == "answer":
213+
chunk_accumulator += chunk.chunk
214+
yield StreamEvent(type=StreamEventType.RESPONSE, data=chunk.chunk)
215+
else:
216+
logger.warning(f"Unknown signature field name: {chunk.signature_field_name}")
211217
elif isinstance(chunk, dspy.Prediction):
212218
# Final complete answer
213219
final_text = getattr(chunk, "answer", None) or chunk_accumulator

python/src/cairo_coder/core/types.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -115,6 +115,7 @@ class StreamEventType(str, Enum):
115115
PROCESSING = "processing"
116116
RESPONSE = "response"
117117
FINAL_RESPONSE = "final_response"
118+
REASONING = "reasoning"
118119
END = "end"
119120
ERROR = "error"
120121

0 commit comments

Comments
 (0)