Conversation

0xMochan (Contributor) commented on Apr 1, 2025

multi_turn and PromptRequest

Supersedes #290 and #224.

This PR expands the `Prompt` trait by enabling configurable prompt methods. By tweaking `Prompt` and `Chat` to return `IntoFuture` instead of `Future`, `Agent` can implement a specialized version that returns `PromptRequest`, a fluent type-state builder that implements `IntoFuture`, allowing for configurable, type-safe prompting.

Usage

let agent = client
    .agent(anthropic::CLAUDE_3_5_SONNET)
    .preamble("...")
    .build();

// existing usage still works
agent.prompt("how tall is michael jordan").await?;
agent.chat("how tall is michael jordan", vec![]).await?;

// new usage lets you work with existing chat histories
let mut chat_history = vec![];

agent
    .prompt("how tall is michael jordan")
    .with_history(&mut chat_history)
    .await?;

agent
    .prompt("Calculate 5 - 2 = ?. Describe the result to me.")
    .with_history(&mut chat_history)
    .multi_turn(20)
    .await?;

The main new introduction is the `multi_turn` method, which configures the prompt to run a loop that continuously calls tools until the agent is satisfied. This also ensures the model always returns an agentic response at the end, instead of a raw tool response like in the earlier example.

[Diagram of the multi-turn tool-call loop; image referenced from #290.]

Using `.with_history` lets you pass in a mutable borrow of a vector of messages, allowing multi-turn to append to it as needed. This allows for more natural usage patterns as well as better ordering of messages, since multi-turn ensures the prompt, tool calls, and tool results are ordered correctly.
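As a rough, self-contained sketch of the idea (hypothetical stand-in types, not rig's actual internals): keep calling the model while it replies with tool calls, append each call and its result to the borrowed history, and stop once it replies with plain text or the turn budget runs out.

```rust
// Hypothetical stand-in for a model reply; rig's real message types differ.
enum ModelReply {
    ToolCall { name: String, args: String },
    Text(String),
}

// Fake completion call: answers with a tool call until a tool result is in history.
fn fake_completion(history: &[String]) -> ModelReply {
    if history.iter().any(|m| m.starts_with("tool:")) {
        ModelReply::Text("5 - 2 = 3; the result is three.".into())
    } else {
        ModelReply::ToolCall {
            name: "subtract".into(),
            args: r#"{"x": 5, "y": 2}"#.into(),
        }
    }
}

fn main() {
    let max_turns = 20; // analogous to .multi_turn(20)
    let mut chat_history =
        vec!["user: Calculate 5 - 2 = ?. Describe the result to me.".to_string()];

    for _ in 0..max_turns {
        match fake_completion(&chat_history) {
            ModelReply::ToolCall { name, args } => {
                // The tool call and its result are both appended to the borrowed
                // history, so the ordering stays prompt, then call, then result.
                chat_history.push(format!("assistant: call {name}({args})"));
                chat_history.push(format!("tool: {name} -> 3"));
            }
            ModelReply::Text(answer) => {
                // Agent is satisfied: the final message is an agentic text response.
                chat_history.push(format!("assistant: {answer}"));
                break;
            }
        }
    }

    println!("{chat_history:#?}");
}
```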

Caveats/Breaking

Because of the existing behavior of the `Prompt` trait, normal usage without `.multi_turn` will still suffer from a lack of parallel tool calls AND return direct tool responses. This is because things like extractors rely on that behavior. I presume this may change when we get full middleware tech.

Additionally, the `Chat` trait still takes an owned `chat_history`, which requires clones. This was difficult to change because adding borrow requirements to the trait badly broke every existing usage of it. Currently, we use the `Chat` trait as an example of creating agent bundles or super-agents via a struct of multiple tiny agents that can be used through the same interface. I presume these usage patterns would be replaced by middleware-layer tech as the primary go-to way of customizing multi-agent patterns.

This change is 100% client-side compatible. It does, unfortunately, slightly alter the type signatures of the `Prompt` and `Chat` traits, which means anyone with custom implementations of these traits will need adjustments (unless they were using `#[allow(refining_impl_trait)]`).

Another change removes `prompt` from the `CompletionRequest` struct, since actual providers don't differentiate the prompt from the latest message in `chat_history`. This allows us to order things properly in `chat_history` and also allowed me to remove the confusing `CompletionRequest::prompt_with_documents`. It has been swapped for `CompletionRequest::normalized_documents`, which makes document handling more streamlined (documents never get added to `chat_history`, to avoid duplication).
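As a rough illustration of the new shape (hypothetical stand-in types, not rig's actual structs): the prompt is now simply the last entry of `chat_history`, and documents are exposed through a separate accessor in the spirit of `normalized_documents`, so providers can attach them once per request without ever writing them into the history.

```rust
// Hypothetical stand-in types, not rig's actual structs.
#[derive(Debug)]
struct Msg {
    role: &'static str,
    content: String,
}

#[derive(Debug)]
struct Request {
    // No separate `prompt` field: the prompt is the last message in chat_history.
    chat_history: Vec<Msg>,
    // Documents live here and are never copied into chat_history.
    documents: Vec<String>,
}

impl Request {
    // Rough analogue of `normalized_documents`: fold all documents into a single
    // user message that a provider can attach to the outgoing call.
    fn normalized_documents(&self) -> Option<Msg> {
        if self.documents.is_empty() {
            return None;
        }
        Some(Msg {
            role: "user",
            content: self.documents.join("\n"),
        })
    }
}

fn main() {
    let req = Request {
        chat_history: vec![Msg {
            role: "user",
            content: "how tall is michael jordan".into(),
        }],
        documents: vec!["doc 1 text".into(), "doc 2 text".into()],
    };

    // A provider would typically send the normalized documents plus the history.
    println!("{:?}", req.normalized_documents());
    println!("{:?}", req.chat_history.last());
}
```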

Open Questions

The `PromptRequest` builder typestate is able to encapsulate a lot of configuration for prompting, but the way tool loops are handled is still less than satisfactory. The normal (non-multi-turn) path only exists to appease extractors (while middleware gets put together), so it feels odd for the default behavior to remain less than satisfactory.

Should `.multi_turn(1)` be the default, with extractors using a special bypass method (e.g. `short_circuit` or `raw`)?

piotrostr (Contributor) commented on Apr 3, 2025

Have a look at https://github.com/piotrostr/listen/blob/main/listen-kit/src/reasoning_loop/gemini.rs. I tried something similar to this PR, but the way the rig traits are structured makes it really difficult; a higher-level struct makes things easier: a generic struct with a public `stream` method and a match arm for picking the model. Otherwise it gets very hectic.

So instead of impl traits, just have a wider `ReasoningLoop` struct that accepts any model: https://github.com/piotrostr/listen/blob/main/listen-kit/src/reasoning_loop/mod.rs

cvauclair (Contributor)

@0xMochan what's the status of this PR?

0xMochan marked this pull request as ready for review on April 11, 2025 01:31.
0xMochan requested a review from cvauclair on April 11, 2025 01:31.
0xMochan (Contributor, Author)

> @0xMochan what's the status of this PR?

Ready for review. I think there's something wrong with my docstrings; not sure how to fix it.

joshua-mo-143 (Collaborator)

> Ready for review. I think there's something wrong with my docstrings; not sure how to fix it.

You might need to reference items using the `crate::foo::Bar` format if the struct links don't resolve. The failing link should make it evident how to resolve it.

0xMochan requested a review from joshua-mo-143 on April 16, 2025 18:41.
A Contributor commented on the following diff:

// We use `UserContent::document` for those who handle it directly!
let messages = self
    .documents
    .iter()

Thanks to this PR, can't wait for reasoning loops to be merged!

I tried this PR with AWS Bedrock and there is one subtle breaking change. The previous function `prompt_with_context` merged all documents into a single attachment, while the new version creates a separate document for each. That wouldn't be a problem if models didn't have a hard limit on the number of attachments; for AWS Bedrock in particular, it's 5 (aws doc).
Not sure about other providers, but I think there are similar restrictions...

cvauclair (Contributor):

Good catch! Presumably this can be handled in the Bedrock integration module, since this is usually where provider-specific limitations are handled.

0xMochan (Contributor, Author):

Yea, great job! I really disliked how `prompt_with_context` worked in general, so I wanted to find a better solution. I'll see if I can make a specific exception for Bedrock; or if you have a code suggestion, I'm all ears!

Contributor:

Since all docs are TXT, can we just fuse them like before but wrap the result inside `UserContent::document`?

...
        let messages = self
            .documents
            .iter()
            .map(|doc| doc.to_string())
            .collect::<Vec<_>>()
            .join(" | ");

        let message = UserContent::document(
            messages,
            Some(ContentFormat::String),
            Some(DocumentMediaType::TXT),
        );

        Some(Message::User {
            content: OneOrMany::one(message),
        })

I just tried that with Bedrock and it works as expected.

cvauclair (Contributor) left a review:

Excellent work! Couple of comments, but otherwise this is solid!

0xMochan and others added 2 commits April 17, 2025 15:36
Carlos contributed to the original spec of multi-tool calling.

Co-authored-by: carlos-verdes <[email protected]>
0xMochan requested a review from cvauclair on April 18, 2025 23:48.
cvauclair (Contributor) left a review:

Looking good!

0xMochan (Contributor, Author)

@cvauclair is it time to merge 👀

0xMochan merged commit 2d45ad5 into main on Apr 22, 2025 (5 checks passed).
0xMochan deleted the fix/multiple-tool-calling branch on April 22, 2025 20:38.
The github-actions bot mentioned this pull request on Apr 22, 2025.
byeblack commented on Apr 22, 2025

The Deepseek provider needs to be fixed to support this PR: the first argument below should be `call.id`.

completion::AssistantContent::tool_call(
    &call.function.name,
    &call.function.name,
    call.function.arguments.clone(),
)
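For illustration, the corrected call would presumably pass the provider-supplied call id as the first argument, along these lines (same surrounding context as the snippet above):

```rust
completion::AssistantContent::tool_call(
    &call.id,                        // the tool_call_id the provider expects back
    &call.function.name,
    call.function.arguments.clone(),
)
```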

The context for the current error is as follows. The response:

[
  {
    "index": 0,
    "id": "call_0_7d51f346-b324-4c7e-a328-d76b40a4cb4a",
    "type": "function",
    "function": {
      "name": "add",
      "arguments": "{\"x\": 15, \"y\": 25}"
    }
  },
  {
    "index": 1,
    "id": "call_1_b8844bf0-4431-4f13-a3c1-ca7644f17d11",
    "type": "function",
    "function": {
      "name": "subtract",
      "arguments": "{\"x\": 100, \"y\": 50}"
    }
  },
  {
    "index": 2,
    "id": "call_2_8e271cdb-4079-4639-bee5-875e4d8a4c2c",
    "type": "function",
    "function": {
      "name": "add",
      "arguments": "{\"x\": 10, \"y\": 10}"
    }
  }
]

The second request then uses the wrong IDs (tool names instead of call IDs):

[
  {
    "content": "40",
    "role": "tool",
    "tool_call_id": "add"
  },
  {
    "content": "50",
    "role": "tool",
    "tool_call_id": "subtract"
  },
  {
    "content": "20",
    "role": "tool",
    "tool_call_id": "add"
  }
]

You will get an error:

{"error":{"message":"Duplicate value for 'tool_call_id' of add in message[3]","type":"invalid_request_error","param":null,"code":"invalid_request_error"}}

The repaired request looks like this:

[
  {
    "content": "40",
    "role": "tool",
    "tool_call_id": "call_0_eabd8f36-c51b-4c54-8b9c-578c63347442"
  },
  {
    "content": "50",
    "role": "tool",
    "tool_call_id": "call_1_c4396b60-8971-48e3-9f10-507fc872a3bb"
  },
  {
    "content": "20",
    "role": "tool",
    "tool_call_id": "call_2_394f0078-2ed9-42b7-9121-dfd0431e6ac6"
  }
]

joshua-mo-143 (Collaborator)

> The Deepseek provider needs to be fixed to support this PR: the first argument should be `call.id` [...]

#414

byeblack commented on Apr 23, 2025

This is a series of problems. I'm not sure if I should open a separate issue to discuss them, so I'll write them here for now:

1. Multi-turn may fail when the user uses a non-English language (you will often get results like the following):

<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>send_message_text
```json
{"message":"xxxxxxx"}
```<|tool▁call▁end|><|tool▁calls▁end|>

2. Dynamic tools cannot switch between multi-turn contexts (when you have dozens of tools, they can't coordinate).
3. Even with multi-turn set up, simple tasks still cannot be completed (I'm considering writing a simple complex example 🤯). The same prompt takes only 2-3 requests to complete in Cherry Studio, but rig can't complete the task; I need to spend time studying it.

Tested with DeepSeek-V3-0324.

Note:
For issue 1, that's just a symptom of missing tools: with dynamic tools, the tool list is currently only sent on the first request. When using static tools, the tool list is always sent.

For issue 3, when I use static tools, it works fine. I found that Cherry Studio uses a prompt plus a custom parser to implement streaming tool calls (and it works well), but does not implement non-streaming tool calls. 😂

joshua-mo-143 (Collaborator) commented on Apr 23, 2025

> This is a series of problems. [...] Tested with DeepSeek-V3-0324

Will be bringing this up internally so we can sync and move quickly on a further course of action. Thank you again!

(In the meantime, if you're using the main branch in your project, you might need to avoid multi-turn for now and use manual turns until a fix can get merged in.)

0xMochan (Contributor, Author) commented on Apr 23, 2025

> This is a series of problems. [...] Tested with DeepSeek-V3-0324 [...]

@byeblack I'm trying to parse through and reproduce the issues described here. Can you make an explicit new issue with steps so I can reproduce?

byeblack commented on Apr 23, 2025

> @byeblack I'm trying to parse through and reproduce the issues described here. Can you make an explicit new issue with steps so I can reproduce?

Here is a simple example that you can easily reproduce: https://github.com/byeblack/rig-multi-turn-demo

Update: by logging, you will find that all follow-up requests have lost the tool list.
I found that Cherry Studio's follow-up mechanism is to follow up only if a tool was called, and otherwise output the content directly. If rig had a mechanism to detect tool calls, I think I would take this approach.

Off-topic:

1. Reasoning loops through extractors are also not practical for me, because I need to manually add tool context to let the LLM know how to better assign tasks.

2. If possible, I also hope there is some means to manually intervene in reasoning loops, for two reasons: one, we can evaluate whether the process is correct; the other, we can add more context. This way, we don't have to waste tokens and customer time. Multiple tools in parallel are good, but the wrong direction will only waste resources.

0xMochan (Contributor, Author)

> By logging, you will find that all follow-up requests have lost the tool list.
> I found that Cherry Studio's follow-up mechanism is to follow up only if a tool was called, and otherwise output the content directly. If rig had a mechanism to detect tool calls, I think I would take this approach.

Yes, this is a limitation of our dynamic tool set, and honestly a real limitation here. I'll see whether I can transfer dynamic tools to the downstream calls, but more likely a better approach is introducing a RAG tool rather than the explicit RAG layer we have. This is part of a larger rework of how agents will work via the middleware agent approach (#346).

> Reasoning loops through extractors are also not practical for me, because I need to manually add tool context to let the LLM know how to better assign tasks.
> If possible, I also hope there is some means to manually intervene in reasoning loops [...]

The reasoning_loop.rs serves as an example that should be useful to build off of. It's not designed to be used directly, hence why it's not in our repo directly and is just an example. I agree! I think a proper reasoning loop needs ways for the user/client to intervene, with code like error recovery and such. This is the beginning of reasoning loops and multi-turn in rig, and I think it'll only get better from here.

If you are interested in keeping a closer tab on this, please join our Discord; I'd love to chat more about how we can evolve this!
