[BUGS] Handle double-escaped JSON in streaming tool calls #70
Adds logic to parse tool call arguments that are double-escaped JSON strings in both CohereProvider and GenericProvider. This ensures arguments are correctly deserialized before being passed to tool_call_chunk.
if tool_id:
    tool_call_ids.add(tool_id)

args = tool_call["function"].get("arguments")
This change alone will not fix the problem.
When LangGraph consumes streaming chunks and tries to build a tool call, it fails if the parsed arguments are not a JSON object, and it creates an invalid tool call well before control reaches our code.
https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/messages/ai.py#L508-L522
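To illustrate the failure described above, here is a minimal sketch using only the standard library. It assumes that `json.loads` is a fair stand-in for `parse_partial_json` when the payload is complete, and the payload itself is made up for the example.

```python
import json

# Hypothetical payload: the arguments object was JSON-encoded twice.
double_escaped = json.dumps(json.dumps({"key": "value"}))

# A single parse unwraps only the outer layer and yields a str.
first_pass = json.loads(double_escaped)

# A dict check after one parse therefore rejects the chunk,
# which is the point where an invalid tool call would be recorded.
is_dict_after_one_parse = isinstance(first_pass, dict)

# Only a second parse recovers the arguments object.
second_pass = json.loads(first_pass)
```

The sketch suggests why unwrapping inside the provider may be too late if the chunk has already been validated upstream.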
One option is to extend AIMessageChunk and override init_tool_calls so that we do the double parsing ourselves. Make sure this file uses the new class instead of AIMessageChunk.
class OCIAIMessageChunk(AIMessageChunk):
    @model_validator(mode="after")
    def init_tool_calls(self) -> Self:
        """Initialize tool calls from tool call chunks.

        Returns:
            The values with tool calls initialized.

        Raises:
            ValueError: If the tool call chunks are malformed.
        """
        if not self.tool_call_chunks:
            if self.tool_calls:
                self.tool_call_chunks = [
                    create_tool_call_chunk(
                        name=tc["name"],
                        args=json.dumps(tc["args"]),
                        id=tc["id"],
                        index=None,
                    )
                    for tc in self.tool_calls
                ]
            if self.invalid_tool_calls:
                tool_call_chunks = self.tool_call_chunks
                tool_call_chunks.extend(
                    [
                        create_tool_call_chunk(
                            name=tc["name"], args=tc["args"], id=tc["id"], index=None
                        )
                        for tc in self.invalid_tool_calls
                    ]
                )
                self.tool_call_chunks = tool_call_chunks
            return self
        tool_calls = []
        invalid_tool_calls = []

        def add_chunk_to_invalid_tool_calls(chunk: ToolCallChunk) -> None:
            invalid_tool_calls.append(
                create_invalid_tool_call(
                    name=chunk["name"],
                    args=chunk["args"],
                    id=chunk["id"],
                    error=None,
                )
            )

        for chunk in self.tool_call_chunks:
            try:
                parsed_args = parse_partial_json(chunk["args"]) if chunk["args"] else {}
                if isinstance(parsed_args, str):
                    parsed_args = parse_partial_json(parsed_args)
                if isinstance(parsed_args, dict):
                    tool_calls.append(
                        create_tool_call(
                            name=chunk["name"] or "",
                            args=parsed_args,
                            id=chunk["id"],
                        )
                    )
                else:
                    add_chunk_to_invalid_tool_calls(chunk)
            except Exception:
                add_chunk_to_invalid_tool_calls(chunk)
        self.tool_calls = tool_calls
        self.invalid_tool_calls = invalid_tool_calls
        return self
In my experience, the OCI GenAI endpoint returns either normal JSON or double-escaped JSON. The PR code handles both cases:
- Normal JSON ('{"key": "value"}'): First parse succeeds → dict → second parse raises TypeError → keep original
- Double-escaped JSON ('"{\"key\": \"value\"}"'): First parse → string → second parse → dict → convert back to unescaped JSON
The string passed to LangChain is therefore always valid JSON by the time LangChain parses it with:
args_ = parse_partial_json(chunk["args"]) if chunk["args"] else {}
Problem
Streaming tool calls were not handling double-escaped JSON arguments, while the non-streaming path (convert_oci_tool_call_to_langchain) already had this fix. This caused tool call arguments to be incorrectly parsed in streaming mode.
Solution
Applied the same double-escape handling logic to process_stream_tool_calls in both CohereProvider and GenericProvider:
Logic:
- Normal JSON ('{"key": "value"}'): First parse succeeds → dict → second parse raises TypeError → keep original
- Double-escaped JSON ('"{\"key\": \"value\"}"'): First parse → string → second parse → dict → convert back to unescaped JSON
- Invalid/empty JSON: First parse raises JSONDecodeError → keep original
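The three cases above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the code from the PR: `normalize_tool_call_args` is a hypothetical name, and it uses an `isinstance` check where the description relies on the second parse raising TypeError.

```python
import json


def normalize_tool_call_args(raw_args):
    """Return raw_args with one level of JSON double-escaping removed.

    Hypothetical helper for illustration; not part of the PR.
    """
    if not raw_args:
        # Empty or missing arguments: keep the original value.
        return raw_args
    try:
        parsed = json.loads(raw_args)
    except (json.JSONDecodeError, TypeError):
        # Invalid or partial JSON (common mid-stream): keep original.
        return raw_args
    if not isinstance(parsed, str):
        # Normal JSON already decodes to a non-string value: keep original.
        return raw_args
    try:
        inner = json.loads(parsed)
    except json.JSONDecodeError:
        # A plain JSON string that does not wrap more JSON: keep original.
        return raw_args
    if isinstance(inner, dict):
        # Double-escaped object: re-serialize as ordinary JSON.
        return json.dumps(inner)
    return raw_args
```

For example, passing `json.dumps(json.dumps({"key": "value"}))` returns a string that decodes to the original dict after a single `json.loads`, while a truncated fragment such as `'{"key": "va'` is returned unchanged.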