Skip to content

Make chat context summarization action-aware#5099

Open
toubatbrian wants to merge 13 commits intomainfrom
brian/summarize-with-tools
Open

Make chat context summarization action-aware#5099
toubatbrian wants to merge 13 commits intomainfrom
brian/summarize-with-tools

Conversation

@toubatbrian
Copy link
Contributor

@toubatbrian toubatbrian commented Mar 13, 2026

Summary

Reworks ChatContext._summarize() so the LLM "sees" tool call inputs/outputs alongside conversation messages, producing summaries that capture knowledge gained from actions rather than only what was said in dialogue.

Key changes:

  • Render FunctionCall and FunctionCallOutput items into the summarization input as XML (<function_call>, <function_call_output>) so the summarizer LLM can distill action results into the summary.
  • Introduce MessageRenderable base class for items that can be converted to a ChatMessage representation (currently FunctionCall and FunctionCallOutput).
  • Split head/tail on self.items directly (not a filtered message list), preserving correct causal ordering of messages, tool calls, and handoffs in the tail.
  • Improve the summarization system prompt with XML format documentation.
  • Update TaskGroup to stop excluding items before summarization (exclude_function_call=False, exclude_handoff=False, etc.) so that _summarize receives the full context to kick off action aware summarization.

Future Todos

  • Include tasks-awareness into summarizations
  • Maybe handoff awareness??

Comparison: old vs new summarization

Full terminal output
cd /Users/toubatbrian/Documents/agents && uv run python examples/compare_summarize.py 2>&1

================================================================================
  ORIGINAL CHAT CONTEXT
================================================================================

  [system]
    You are SupportGPT, a customer service agent for Acme Audio. Gather identifying info
    first, then troubleshoot. Only promise replacements if the device is under warranty.
    Use the provided tools for order lookup, warranty checks, and RMA creation. If a
    return is required, hand off to ReturnsAgent for shipping label logistics.

  [user]
    Hi, I need help with an order I placed last week. The earbuds I got keep disconnecting
    and the left side sounds crackly.

  [assistant]
    I can help with that! First, could you share your full name and the email you used at
    checkout, so I can locate your order?

  [user]
    Sure — I'm Maya Chen, and I used [email protected].

  [assistant]
    Thanks, Maya. Do you also have the order number and approximate purchase date?

  [user]
    Order #LK-4821936. I bought them on October 7.

  [function_call] lookup_order({"order_number": "LK-4821936", "customer_email": "[email protected]"})

  [function_call_output] lookup_order → {"order_number":"LK-4821936","customer_name":"Maya Chen","items":[{"sku":"AC-EBD-PRO","name":"Acme Buds Pro","qty":1}],"...

  [assistant]
    I found your order LK-4821936 for Acme Buds Pro, delivered October 8. To check
    warranty and next steps, which device are you pairing with and what OS version?

  [user]
    iPhone 14 Pro, iOS 18.0.1.

  [assistant]
    Thanks. Have you tried any troubleshooting — resetting the buds, forgetting/re-pairing
    Bluetooth, or testing another device?

  [user]
    I tried forgetting and re-pairing twice. I also tested on my iPad and the left ear
    still crackles.

  [assistant]
    Understood. Any visible damage or signs of moisture? And when did it start?

  [user]
    No damage or moisture. It started the day after I received them — October 9.

  [function_call] check_warranty({"serial":"ACB-PRO-7F29D4","purchase_date":"2025-10-07"})

  [function_call_output] check_warranty → {"eligible":true,"warranty_expires":"2026-10-07"}...

  [assistant]
    This appears to be a hardware defect and you're under warranty until 2026-10-07. I can
    set up a free replacement. Could you confirm your shipping address and a contact
    number?

  [user]
    Ship to 2150 Grove St, Apt 4B, Oakland, CA 94612. Phone is (510) 555-0136.

  [function_call] create_rma({"order_number":"LK-4821936","serial":"ACB-PRO-7F29D4","reason":"left bud crackling / disconnects","customer":{"name":"Maya Chen","email":"[email protected]","phone":"(510) 555-0136","address":"2150 Grove St, Apt 4B, Oakland, CA 94612"}})

  [function_call_output] create_rma → {"rma_id":"RMA-90721","replacement_eta_days":2}...

  [agent_handoff] SupportGPT → ReturnsAgent

  [function_call] generate_return_label({"rma_id":"RMA-90721","email":"[email protected]"})

  [function_call_output] generate_return_label → {"label_url":"https://example.invalid/label/RMA-90721","due_in_days":14}...

  [assistant]
    All set! I've created RMA #RMA-90721 linked to order LK-4821936. You'll receive the
    prepaid return label and instructions at [email protected]. Please ship the
    defective pair within 14 days; your replacement will ship within 48 hours.

  Total items: 24
================================================================================

================================================================================
  OLD SUMMARIZATION (plain text, no tool awareness)
================================================================================

  [system]
    You are SupportGPT, a customer service agent for Acme Audio. Gather identifying info
    first, then troubleshoot. Only promise replacements if the device is under warranty.
    Use the provided tools for order lookup, warranty checks, and RMA creation. If a
    return is required, hand off to ReturnsAgent for shipping label logistics.

  [assistant] (SUMMARY)
    [history summary] Maya Chen has issues with her recently ordered Acme Buds Pro,
    specifically with connectivity and sound crackling in the left earbud. She purchased
    them on October 7, 2023, with order #LK-4821936. The issues began on October 9, and
    troubleshooting steps did not resolve the problem. Her device is an iPhone 14 Pro
    running iOS 18.0.1. The assistant confirmed it's a hardware defect and the product is
    under warranty until October 7, 2026. The next step is for Maya to provide her
    shipping address and contact number to proceed with a free replacement.

  [function_call] create_rma({"order_number":"LK-4821936","serial":"ACB-PRO-7F29D4","reason":"left bud crackling / disconnects","customer":{"name":"Maya Chen","email":"[email protected]","phone":"(510) 555-0136","address":"2150 Grove St, Apt 4B, Oakland, CA 94612"}})

  [function_call_output] create_rma → {"rma_id":"RMA-90721","replacement_eta_days":2}...

  [agent_handoff] SupportGPT → ReturnsAgent

  [function_call] generate_return_label({"rma_id":"RMA-90721","email":"[email protected]"})

  [function_call_output] generate_return_label → {"label_url":"https://example.invalid/label/RMA-90721","due_in_days":14}...

  [user]
    Ship to 2150 Grove St, Apt 4B, Oakland, CA 94612. Phone is (510) 555-0136.

  [assistant]
    All set! I've created RMA #RMA-90721 linked to order LK-4821936. You'll receive the
    prepaid return label and instructions at [email protected]. Please ship the
    defective pair within 14 days; your replacement will ship within 48 hours.

  Total items: 9
================================================================================

================================================================================
  NEW SUMMARIZATION (XML, action-aware)
================================================================================

  [system]
    You are SupportGPT, a customer service agent for Acme Audio. Gather identifying info
    first, then troubleshoot. Only promise replacements if the device is under warranty.
    Use the provided tools for order lookup, warranty checks, and RMA creation. If a
    return is required, hand off to ReturnsAgent for shipping label logistics.

  [assistant] (SUMMARY)
    <chat_history_summary> Maya Chen has an issue with her Acme Buds Pro earbuds, which
    she purchased on October 7, 2025. The earbuds, particularly the left one, have
    connectivity issues and a crackly sound. She's pairing them with an iPhone 14 Pro
    running iOS 18.0.1 and has already attempted some troubleshooting steps without
    success. There is no visible damage or moisture on the earbuds, and the issue began on
    October 9. The purchase is covered under warranty until October 7, 2026, and Maya is
    eligible for a free replacement. The assistant has asked for confirmation of Maya's
    shipping address and contact number to proceed with the replacement.
    </chat_history_summary>

  [user]
    Ship to 2150 Grove St, Apt 4B, Oakland, CA 94612. Phone is (510) 555-0136.

  [function_call] create_rma({"order_number":"LK-4821936","serial":"ACB-PRO-7F29D4","reason":"left bud crackling / disconnects","customer":{"name":"Maya Chen","email":"[email protected]","phone":"(510) 555-0136","address":"2150 Grove St, Apt 4B, Oakland, CA 94612"}})

  [function_call_output] create_rma → {"rma_id":"RMA-90721","replacement_eta_days":2}...

  [agent_handoff] SupportGPT → ReturnsAgent

  [function_call] generate_return_label({"rma_id":"RMA-90721","email":"[email protected]"})

  [function_call_output] generate_return_label → {"label_url":"https://example.invalid/label/RMA-90721","due_in_days":14}...

  [assistant]
    All set! I've created RMA #RMA-90721 linked to order LK-4821936. You'll receive the
    prepaid return label and instructions at [email protected]. Please ship the
    defective pair within 14 days; your replacement will ship within 48 hours.

  Total items: 9
================================================================================

Given a realistic support conversation with 3 tool calls (lookup_order, check_warranty, create_rma), an agent handoff, and a generate_return_label call, here's how keep_last_turns=1 summarization compares (view above full output):

Aspect Old (plain text) New (XML, action-aware)
Purchase year "2023" (HALLUCINATED!) "2025" (correct — from lookup_order output)
Warranty date Mentioned, but sourced only from assistant dialogue Correct, naturally absorbed from check_warranty data
Serial number Missing entirely Not in summary, but preserved in tail's create_rma call
Tool mention "The assistant confirmed it's a hardware defect" No tools mentioned — just states facts as knowledge ("The purchase is covered under warranty")
Tail ordering User message appears after function calls it triggered Causal order preserved — user msg before its tool calls

Structural ordering (tail correctness)

Both produce 9 items after summarization, but the tail composition differs:
Old — tail ordering is broken:

# Item
1 system message
2 summary (assistant)
3 create_rma FunctionCall
4 create_rma FunctionCallOutput
5 agent_handoff (SupportGPT → ReturnsAgent)
6 generate_return_label FunctionCall
7 generate_return_label FunctionCallOutput
8 user: "Ship to 2150 Grove St..."
9 assistant: "All set!..."

The user message (item 8) appears after the function calls it triggered (items 3–7). This happens because the old code computed the tail on a filtered list of only ChatMessage items, so function calls were not accounted for when choosing the split point. The result is a context where causality is inverted — the RMA was created before the user even provided their address.

New — causal ordering preserved:

# Item
1 system message
2 summary (assistant)
3 user: "Ship to 2150 Grove St..."
4 create_rma FunctionCall
5 create_rma FunctionCallOutput
6 agent_handoff (SupportGPT → ReturnsAgent)
7 generate_return_label FunctionCall
8 generate_return_label FunctionCallOutput
9 assistant: "All set!..."

The tail is computed on self.items directly, so the user message that caused the RMA creation sits before the function calls it triggered. Any downstream LLM consuming this context sees events in the order they actually happened.

Test plan

  • test_summarize_head_tail_split_basic — tail preserves last N turns, head is summarized
  • test_summarize_head_tail_split_with_renderables — FunctionCall/FunctionCallOutput in tail region are preserved with correct ordering
  • test_summarize_keep_last_turns_zero — everything summarized, no tail
  • test_summarize_preserves_structural_items — system messages and AgentHandoff survive
  • test_summarize_skips_when_not_enough_messages — early return when budget covers all messages
  • All existing tests pass (test_chat_ctx.py: 18/18)

@toubatbrian toubatbrian marked this pull request as ready for review March 13, 2026 02:19
@chenghao-mou chenghao-mou requested a review from a team March 13, 2026 02:19
devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

toubatbrian and others added 6 commits March 12, 2026 19:24
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Member

@theomonnom theomonnom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, just a few nits.

Also do you think we necessarily need xml? This seems harder than just doing something very simple

Comment on lines +296 to +302
class MessageRenderable(BaseModel, ABC):
"""Base class for chat context items that can be converted into a `ChatMessage`."""

@abstractmethod
def to_message(self) -> ChatMessage:
pass

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this abstraction? It feels unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do you think we necessarily need xml? This seems harder than just doing something very simple

I don’t think XML is strictly necessary, but I’d still prefer it here. It makes the boundaries between sections much clearer for the model, and it also helps avoid cases where plain labels like User: / Assistant: could get confused with the actual message content.

For example, if a user message itself contains \n User: or \n Assistant:, a simple text format can break down pretty easily and user can easily do prompt injections on it. XML makes that structure more explicit and resilient.

It also seems to be a pretty standard prompting pattern, e.g. Claude Code does something similar. So to me it feels like a safer default, without really adding much downside.

@toubatbrian toubatbrian requested a review from theomonnom March 13, 2026 22:22
devin-ai-integration[bot]

This comment was marked as resolved.

toubatbrian and others added 3 commits March 13, 2026 15:31
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 7 additional findings in Devin Review.

Open in Devin Review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants