Tool calling functionality #477


Open · wants to merge 7 commits into base: scosman/tools_scaffolding

Conversation

@scosman (Collaborator) commented Aug 3, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced support for iterative tool usage within a single conversation turn, allowing multiple tool calls per turn.
    • Improved handling and formatting of tool calls and responses in chat history and model output.
    • Added support for math tools in model interactions.
    • Introduced a new method to list available tools.
    • Added detailed trace information to run outputs for better debugging.
  • Bug Fixes

    • Added validation to prevent conflicts between tool usage and structured output formats.
  • Tests

    • Introduced comprehensive tests for tool integration, chat history, and tool call handling.
    • Added end-to-end tests verifying model and tool interaction for math tasks.

scosman added 3 commits July 31, 2025 23:55
…aving the checkpoint.

Prompt:
We are mid-refactor on adding tool calling to this class, which makes a series of LLM chat calls.

The chat formatter manages generating the user messages in order. We want to add tool calling.

It's partly done. We can handle a single turn of tool calls (inside the main turn) and it works. But we need to extend it to do several more things:

 - Allow it to make many tool calls in a row (don't force tool-choice to none after one), and keep iterating on tool calls until they are all complete, before moving to the next turn from the chat formatter.
 - Currently the chat formatter takes care of generating the whole chat history for the next turn. This will no longer work, as some of the chat history will be tool calls/responses it's not aware of. Refactor to make it only generate the next message, not the history. The litellm_adapter class should manage the message history, and at the end of the turn it should have the complete chat saved in a property (we'll use this later, but test it now).
 - Note that there's a special tool call `task-response`, which is used for ending a top-level turn (a chat-formatter managed turn). We take the JSON from that and save it as a message (working). This needs to be treated differently from the other tool calls, as it always ends the turn.

Guidance
 - handle all tool calling in a new function that the main loop calls. We don't want the main chat loop (while True) getting more complex. A minimal sketch of this split follows after this list.
 - add appropriate tests, and keep the code testable
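The sketch below is illustrative only, not the implementation in this PR. It shows the requested split: the main chat loop delegates to a single helper that keeps iterating on tool calls, treats the task_response tool as turn-ending, and returns the full message history so the adapter can keep it in a property. The names MAX_CALLS_PER_TURN, the loop helper, and task_response echo names that appear later in this PR discussion; the parameter shapes, callables, and message dicts are assumptions.

# Hedged sketch of the requested split: the main chat loop stays simple and a
# dedicated helper iterates on tool calls until the model stops requesting them.
# Callable parameters and message shapes below are assumptions for illustration.
from typing import Any, Awaitable, Callable

MAX_CALLS_PER_TURN = 10


async def handle_tool_calls_loop(
    call_model: Callable[[list[dict[str, Any]]], Awaitable[Any]],
    run_tool: Callable[[Any], Awaitable[str]],
    messages: list[dict[str, Any]],
) -> tuple[str, list[dict[str, Any]]]:
    """Iterate on tool calls within one turn; return final content and full history."""
    for _ in range(MAX_CALLS_PER_TURN):
        message = await call_model(messages)
        messages.append(
            {"role": "assistant", "content": message.content, "tool_calls": message.tool_calls}
        )
        if not message.tool_calls:
            # No tools requested: the content is the final answer for this turn.
            return message.content, messages
        for tool_call in message.tool_calls:
            if tool_call.function.name == "task_response":
                # Special case: the task_response tool's JSON arguments end the turn.
                return tool_call.function.arguments, messages
            result = await run_tool(tool_call)
            messages.append(
                {"role": "tool", "tool_call_id": tool_call.id, "content": result}
            )
    raise RuntimeError(f"Too many tool calls (limit {MAX_CALLS_PER_TURN}) in a single turn")

In the actual adapter the accumulated messages would end up on a property so the caller can persist the full trace; the real method and argument names live in litellm_adapter.py.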
coderabbitai bot (Contributor) commented Aug 3, 2025

Walkthrough

This update introduces iterative tool call handling in the LiteLlmAdapter, allowing multiple consecutive tool invocations within a single turn. It adds new and refactored methods to manage chat history, process tool calls, and integrate tools into model requests. Extensive new tests validate these behaviors, including paid-integration tests for math tools. Additionally, a stub method for available tools is added to the base adapter, and the RunOutput dataclass is extended with a trace field.

Changes

Cohort / File(s) / Change Summary

  • Base Adapter Tool Interface (libs/core/kiln_ai/adapters/model_adapters/base_adapter.py): Added import for KilnTool and introduced a stub available_tools method in BaseAdapter (a minimal sketch of this stub appears after this list).
  • LiteLlmAdapter Iterative Tool Call Handling (libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py): Refactored to support iterative tool calls per turn, maintaining chat history, adding tool call loop logic, tool integration in completion kwargs, and helper methods for tool management and execution.
  • Unit Tests for LiteLlmAdapter Tool Integration (libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py): Added fixtures and comprehensive tests for tool formatting, completion kwargs, chat history, tool call loops, and _run method behavior with and without tools.
  • Paid Integration Tests for Math Tools (libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py): Introduced a new test module with helpers and async tests to verify math tool integration with the adapter and OpenAI models.
  • RunOutput Extension (libs/core/kiln_ai/adapters/run_output.py): Added optional trace field to RunOutput dataclass to capture detailed execution trace information.
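For orientation, a rough, illustrative sketch of the base-adapter stub described above. Only the method name (available_tools) and the KilnTool return type come from the summary; the stand-in class names and the empty default body are assumptions.

# Illustrative sketch only; KilnToolStub stands in for kiln_ai.tools.base_tool.KilnTool.
from abc import ABC


class KilnToolStub:  # stand-in for the real KilnTool
    pass


class BaseAdapterSketch(ABC):
    def available_tools(self) -> list[KilnToolStub]:
        """Tools this adapter can offer to the model; the base returns none."""
        return []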

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LiteLlmAdapter
    participant Model
    participant Tool

    User->>LiteLlmAdapter: Send input message
    LiteLlmAdapter->>LiteLlmAdapter: Append to chat history
    loop Tool Call Loop (max N times)
        LiteLlmAdapter->>Model: Send chat history + available tools
        Model-->>LiteLlmAdapter: Return tool call(s) or final response
        alt Tool call(s) present
            LiteLlmAdapter->>Tool: Execute tool(s) with arguments
            Tool-->>LiteLlmAdapter: Return tool result(s)
            LiteLlmAdapter->>LiteLlmAdapter: Append tool results to chat history
        else Final response
            LiteLlmAdapter->>LiteLlmAdapter: Extract output and break loop
        end
    end
    LiteLlmAdapter-->>User: Return output and updated chat history

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

In the warren where logic and numbers entwine,
The adapter now loops, making tool calls in line.
With chat history growing and math tools in play,
Our tests leap and bound in a bright, clever way.
Oh, what fun for a rabbit to see—
Tools and models, in harmony! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93b0bd4 and 9ccbc6e.

📒 Files selected for processing (1)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (1)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)

47-59: Use a different provider in test config to avoid OpenRouter initialization

The test config uses model_provider_name="openrouter" which may trigger OpenRouter initialization and API key checks even when mocked.

Consider using a provider that doesn't require API keys for testing:

 @pytest.fixture
 def config():
     return LiteLlmConfig(
         base_url="https://api.test.com",
         run_config_properties=RunConfigProperties(
             model_name="test-model",
-            model_provider_name="openrouter",
+            model_provider_name="openai",
             prompt_id="simple_prompt_builder",
             structured_output_mode="json_schema",
         ),
         default_headers={"X-Test": "test"},
         additional_body_options={"api_key": "test_key"},
     )

This should prevent the OpenRouter initialization errors in the tests.

🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1)

80-86: Remove duplicate assertion

There's a duplicate assertion checking for "64" in the output.

Remove the duplicate assertion:

         assert "64" in run.output.output
         assert run.id is not None
         assert (
             run.input
             == "You should answer the following question: four plus six times 10"
         )
-        assert "64" in run.output.output
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)

32-33: Document the rationale for MAX_CALLS_PER_TURN limit

The constant MAX_CALLS_PER_TURN = 10 prevents infinite loops, but the specific value seems arbitrary.

Add a comment explaining why 10 was chosen:

-MAX_CALLS_PER_TURN = 10
+# Maximum number of tool calls allowed per turn to prevent infinite loops
+# and excessive token usage. Based on typical tool interaction patterns.
+MAX_CALLS_PER_TURN = 10
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 599ba54 and e8a0152.

📒 Files selected for processing (4)
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (7 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (2 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
**/test_*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/test_*.py: Always use pytest for tests in Python code
Test brevity is important. Use approaches for re-use and brevity including using fixtures for repeated code, and using pytest parameterize for similar tests

Files:

  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
🧬 Code Graph Analysis (2)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
libs/core/kiln_ai/tools/base_tool.py (1)
  • KilnTool (40-82)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (6)
libs/core/kiln_ai/datamodel/json_schema.py (1)
  • validate_schema_with_value_error (50-70)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (3)
  • model_provider (79-94)
  • build_chat_formatter (205-249)
  • available_tools (323-325)
libs/core/kiln_ai/tools/base_tool.py (1)
  • KilnTool (40-82)
libs/core/kiln_ai/utils/exhaustive_error.py (1)
  • raise_exhaustive_enum_error (5-6)
libs/core/kiln_ai/adapters/chat/chat_formatter.py (7)
  • messages (45-46)
  • next_turn (56-58)
  • next_turn (62-79)
  • next_turn (95-125)
  • next_turn (141-174)
  • next_turn (178-195)
  • intermediate_outputs (51-53)
libs/core/kiln_ai/adapters/run_output.py (1)
  • RunOutput (8-11)
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py

[error] 324-324: Developer content found: TODO comment present in code.

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py

[error] 531-531: Developer content found: TODO comment present in code.


[error] 643-643: Developer content found: TODO comment present in code.

🪛 GitHub Actions: Build and Test
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py

[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 767-767: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 821-821: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 908-908: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 988-988: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 1040-1040: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys

🪛 GitHub Actions: Coverage Report
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py

[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 767-767: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 821-821: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 908-908: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 988-988: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys


[error] 1040-1040: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-13)

Comment on lines 129 to 138
            # If no tool calls, return the content as final output
            if content:
                return content, messages

            # If we get here with no content and no tool calls, break
            break

        raise RuntimeError(
            f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
        )

⚠️ Potential issue

Handle edge case when model returns neither content nor tool calls

The loop breaks silently when there's no content and no tool calls, then raises a generic error about too many tool calls which is misleading.

Add proper error handling for this edge case:

             # If no tool calls, return the content as final output
             if content:
                 return content, messages
 
             # If we get here with no content and no tool calls, break
-            break
+            raise RuntimeError(
+                "Model returned neither content nor tool calls. This may indicate an issue with the model response."
+            )
 
-        raise RuntimeError(
-            f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
-        )
+        # Only reached if we exit the while loop due to tool_calls_count
+        raise RuntimeError(
+            f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
+        )
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 129
to 138, the code breaks the loop silently when there is neither content nor tool
calls, then raises a misleading RuntimeError about too many tool calls. Add
explicit error handling for this edge case by raising a clear and specific
exception or returning an appropriate error message indicating that the model
returned neither content nor tool calls, so the issue is accurately reported.

Comment on lines +531 to +536
            # TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).
            # Check for a conflict between tools and response format using tools
            if has_tools and "tools" in response_format_options:
                raise ValueError(
                    "Function calling/tools can't be used as the JSON response format if you're also using tools. Please select a different structured output mode."
                )

⚠️ Potential issue

Remove TODO comment and clarify the design decision

The pipeline flagged this TODO comment. The comment raises a valid design question about supporting both structured output and tools simultaneously.

Either implement the functionality or document why the current approach was chosen:

-            # TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).
             # Check for a conflict between tools and response format using tools
             if has_tools and "tools" in response_format_options:
                 raise ValueError(
                     "Function calling/tools can't be used as the JSON response format if you're also using tools. Please select a different structured output mode."
                 )

Consider tracking this design question in an issue if it needs further discussion.

🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 531-531: Developer content found: TODO comment present in code.

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 531
to 536, remove the TODO comment about reconsidering the model's choice between
final answer or tool call and the question about supporting both JSON schema and
tools. Instead, clarify the design decision by either implementing support for
both or explicitly documenting why the current restriction exists. If the design
question requires further discussion, create and reference an issue for tracking
it.

Comment on lines +643 to +645
                    # TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()
                    "content": str(result),
                }

🛠️ Refactor suggestion

⚠️ Potential issue

Clarify tool return type requirements

The TODO comment indicates uncertainty about whether tool results must be strings.

Define the return type contract clearly:

                     "tool_call_id": tool_call.id,
                     "name": tool_name,
-                    # TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()
-                    "content": str(result),
+                    "content": str(result),  # Tool results are always stringified for consistency
                 }

Consider updating the KilnTool.run() method signature to specify the return type should be str or Any that can be stringified, and document this requirement in the base class.
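A short, hedged sketch of the contract the reviewer is proposing. The class name below is a stand-in, not the actual KilnTool in kiln_ai/tools/base_tool.py, and whether run() is async is an assumption.

# Illustrative contract only: document that tool results come back as strings.
from abc import ABC, abstractmethod
from typing import Any


class KilnToolContract(ABC):
    @abstractmethod
    async def run(self, **kwargs: Any) -> str:
        """Execute the tool and return its result.

        Suggested contract: the result must already be a string so the adapter
        can place it directly into the tool message content without calling str().
        """
        ...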

🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 643-643: Developer content found: TODO comment present in code.

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 643
to 645, clarify the expected return type of the KilnTool.run() method by
updating its signature to specify it should return a string or a type that can
be converted to a string. Add documentation in the base class to explicitly
state this requirement, and remove the need for calling str() on the result in
this code by enforcing the contract at the method level.

Comment on lines +709 to +740
async def test_handle_tool_calls_loop_no_tool_calls(config, mock_task):
    """Test _handle_tool_calls_loop with no tool calls in response"""
    adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)
    initial_messages = [{"role": "user", "content": "Hello"}]

    # Mock response with no tool calls
    mock_response = Mock()
    mock_choice = Mock()
    mock_message = Mock()
    mock_message.content = "Hello response"
    mock_message.tool_calls = None
    mock_choice.message = mock_message
    mock_response.choices = [mock_choice]

    with patch.object(
        adapter,
        "acompletion_checking_response",
        return_value=(mock_response, mock_choice),
    ):
        content, messages = await adapter._handle_tool_calls_loop(
            Mock(), initial_messages, None
        )

    assert content == "Hello response"
    assert len(messages) == 2
    assert messages[0] == {"role": "user", "content": "Hello"}
    assert messages[1] == {
        "role": "assistant",
        "content": "Hello response",
        "tool_calls": None,
    }


💡 Verification agent

🧩 Analysis chain

Fix OpenRouter API key errors in tests

Multiple tests are failing with "ValueError: Attempted to use OpenRouter without an API key set". Even though acompletion_checking_response is mocked, the tests are still trying to initialize an OpenRouter connection.

The issue might be in the test setup or the way the adapter is initialized. Consider:

  1. Mock the litellm module earlier in the test setup
  2. Ensure the config doesn't trigger OpenRouter initialization
  3. Mock any litellm initialization that happens during adapter creation

🏁 Script executed:

#!/bin/bash
# Check how OpenRouter is being initialized in the codebase
rg -A 5 "OpenRouter|openrouter" --type py

Length of output: 82194


Mock Litellm Client in tests to avoid OpenRouter API-key errors

Apply one of the following fixes in libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py:

• Add an autouse fixture to stub out Litellm’s client before any adapter instantiates:

+ import pytest
+ import litellm
+
+ @pytest.fixture(autouse=True)
+ def patch_litellm_client(monkeypatch):
+     """Prevent real OpenRouter/OpenAI clients from initializing in unit tests."""
+     class DummyClient:
+         def __init__(*args, **kwargs): pass
+     monkeypatch.setattr(
+         "kiln_ai.adapters.model_adapters.litellm_adapter.litellm.Client",
+         DummyClient,
+     )

• Or inject a dummy API key into your existing config fixture so that OpenRouter never complains:

 @pytest.fixture
 def config():
     run_props = RunConfigProperties(
         model_name="test-model",
         model_provider_name=ModelProviderName.openrouter,
         prompt_id="simple_prompt_builder",
         structured_output_mode="json_schema",
     )
-    return LiteLlmConfig(run_config_properties=run_props)
+    return LiteLlmConfig(
+        run_config_properties=run_props,
+        default_headers={},
+        additional_body_options={"api_key": "test-openrouter-key"},
+    )

Either approach decouples your adapter’s initialization from real API keys and stops the "ValueError: Attempted to use OpenRouter without an API key set." failures in async tests.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 GitHub Actions: Build and Test

[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys

🪛 GitHub Actions: Coverage Report

[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py around
lines 709 to 740, the test fails due to missing OpenRouter API key when
LiteLlmAdapter initializes. Fix this by either adding an autouse pytest fixture
that mocks or stubs the Litellm client before any adapter instantiation, or
modify the existing config fixture to inject a dummy API key. This prevents the
adapter from attempting real API calls and avoids the ValueError about missing
API keys during async tests.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3)

132-141: Handle edge case when model returns neither content nor tool calls

The loop breaks silently when there's no content and no tool calls, then raises a generic error about too many tool calls which is misleading.


534-539: Remove TODO comment and clarify the design decision

The pipeline flagged this TODO comment. The comment raises a valid design question about supporting both structured output and tools simultaneously.


646-648: Clarify tool return type requirements

The TODO comment indicates uncertainty about whether tool results must be strings.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8a0152 and c4102c3.

📒 Files selected for processing (2)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (7 hunks)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🧠 Learnings (1)
📚 Learning: the `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due t...
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py

[error] 109-109: Developer TODO comment found: 'TODO: ensure structure'


[error] 534-534: Developer TODO comment found: 'TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).'


[error] 646-646: Developer TODO comment found: 'TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (windows-latest)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-latest)
🔇 Additional comments (5)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (5)

1-34: Constants and imports look good

The new imports and constants are appropriate for implementing tool calling functionality. The limits for MAX_CALLS_PER_TURN and MAX_TOOL_CALLS_PER_TURN provide good safeguards against runaway iterations.


50-67: Well-designed state management for tools and chat history

Good use of caching for available tools and proper encapsulation of chat history with a read-only property.
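As a rough illustration of the pattern being praised (not the adapter's actual code), caching plus a read-only property might look like the sketch below; all names and types are assumptions.

# Illustrative sketch: cache tools once, expose history without allowing mutation.
from typing import Any


class AdapterStateSketch:
    def __init__(self) -> None:
        self._cached_tools: list[Any] | None = None
        self._chat_history: list[dict[str, Any]] = []

    def available_tools(self) -> list[Any]:
        if self._cached_tools is None:
            self._cached_tools = self._load_tools()
        return self._cached_tools

    @property
    def chat_history(self) -> list[dict[str, Any]]:
        # Return a copy so callers cannot mutate the adapter's history.
        return list(self._chat_history)

    def _load_tools(self) -> list[Any]:
        return []  # placeholder; the real adapter builds KilnTool instances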


143-273: Well-structured implementation of tool calling in the run method

The integration of tool calling with proper chat history management is well done. The handling of different scenarios (tool calls vs regular completion) is clear, and the limitation regarding logprobs from tool calls is properly documented.


275-288: Good helper method for response validation

Excellent addition of a helper method that ensures type safety and provides clear error messages.


592-656: Excellent implementation of tool processing with proper validation

The tool processing logic is well-structured with:

  • Proper argument parsing and JSON schema validation
  • Clear error messages for debugging
  • Good separation between task_response and normal tools
  • Appropriate error handling for edge cases

The validation against tool schemas before execution is particularly good for catching errors early.
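A hedged sketch of those processing steps, for orientation only: parse the arguments as JSON, validate them against the tool's schema, run the tool, and wrap the result as a tool message. The helper names (validate_arguments, run) and the synchronous style are assumptions; the real adapter awaits its tools and handles task_response separately.

# Illustrative processing of a single tool call; not the adapter's actual code.
import json
from typing import Any


def process_tool_call(tool_call: Any, tools_by_name: dict[str, Any]) -> dict[str, Any]:
    tool = tools_by_name.get(tool_call.function.name)
    if tool is None:
        raise ValueError(f"Model requested unknown tool: {tool_call.function.name}")
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as e:
        raise ValueError(f"Tool arguments were not valid JSON: {e}") from e
    tool.validate_arguments(args)  # hypothetical schema check; fails fast with a clear error
    result = tool.run(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "name": tool_call.function.name,
        "content": str(result),
    }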

Comment on lines +109 to +111
                # TODO: ensure structure
                "tool_calls": tool_calls,
            }

⚠️ Potential issue

Remove TODO comment and ensure message structure

The TODO comment should be addressed. Either implement validation for the assistant message structure or document why it's not needed.

Apply this diff to ensure proper structure:

            # Add the assistant's response to messages
            assistant_message = {
                "role": "assistant",
                "content": content,
-                # TODO: ensure structure
-                "tool_calls": tool_calls,
+                "tool_calls": tool_calls if tool_calls else [],
            }
🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 109-109: Developer TODO comment found: 'TODO: ensure structure'

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 109
to 111, remove the TODO comment about ensuring the structure of the assistant
message. Instead, implement validation logic to confirm the message structure
meets expected requirements before assigning tool_calls, or add a clear comment
explaining why such validation is unnecessary. This will resolve the TODO and
clarify the code's intent.

scosman added 2 commits August 3, 2025 20:58
… More TODOs for unhandled cases.

Now running on all paid test cases.

A bunch of the new tests don't work yet.
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

♻️ Duplicate comments (4)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (4)

103-109: Remove TODO and ensure proper message structure

The TODO comment should be addressed. The assistant message structure should be validated or documented.

Apply this diff to ensure proper structure:

            # Add the assistant's response to messages
            assistant_message = {
                "role": "assistant",
                "content": content,
-                # TODO: ensure structure
-                "tool_calls": tool_calls,
+                "tool_calls": tool_calls if tool_calls else [],
            }

129-138: Handle edge case when model returns neither content nor tool calls

The loop breaks silently when there's no content and no tool calls, potentially leading to a misleading error message.


504-509: Remove TODO and document design decision

The TODO comment raises a valid design question about supporting both structured output and tools.


616-618: Clarify tool return type requirements

The TODO indicates uncertainty about tool result type requirements.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4102c3 and 93b0bd4.

📒 Files selected for processing (2)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (6 hunks)
  • libs/core/kiln_ai/adapters/run_output.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)

**/*.py: Always assume pydantic 2 (not pydantic 1)
The project supports Python 3.10 and above

Files:

  • libs/core/kiln_ai/adapters/run_output.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🧠 Learnings (1)
📚 Learning: the `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due t...
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.

Applied to files:

  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py

[error] 106-106: Developer TODO comment found: 'TODO: ensure structure'


[error] 180-180: Developer TODO comment found: 'TODO check format'


[error] 189-189: Developer TODO comment found: 'TODO'


[error] 236-236: Developer TODO comment found: 'TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call'


[error] 504-504: Developer TODO comment found: 'TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).'


[error] 616-616: Developer TODO comment found: 'TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()'

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04)
  • GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
  • GitHub Check: Build Desktop Apps (macos-13)
  • GitHub Check: Build Desktop Apps (macos-latest)
  • GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (4)
libs/core/kiln_ai/adapters/run_output.py (1)

1-12: LGTM!

The addition of the trace field to store conversation history is well-implemented. The type annotation correctly allows for a list of dictionaries with flexible value types, which is appropriate for capturing various message formats and tool call results.
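For context, a minimal sketch of what the described change might look like; only the field name (trace) and its list-of-dicts typing come from this review, and the rest of the dataclass is reduced to a placeholder.

# Illustrative sketch of the extended RunOutput; not the actual definition.
from dataclasses import dataclass
from typing import Any


@dataclass
class RunOutputSketch:
    output: str  # placeholder for the existing fields
    trace: list[dict[str, Any]] | None = None  # full conversation history, incl. tool calls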

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3)

1-34: LGTM!

The new imports and constants are well-chosen for implementing tool calling functionality. The iteration limits (10 turns, 30 tool calls) provide reasonable safeguards against runaway loops.


245-258: Well-implemented response validation

Good defensive programming with comprehensive validation of the response structure before accessing nested fields.
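A rough sketch of that defensive pattern, assuming a litellm-style response object; the real helper is acompletion_checking_response and may differ in signature and error types.

# Illustrative response check: fail loudly before touching nested fields.
from typing import Any


def check_response(response: Any) -> Any:
    """Return the first choice, raising a clear error if the structure is off."""
    if response is None or not getattr(response, "choices", None):
        raise RuntimeError("Model response contained no choices")
    choice = response.choices[0]
    if getattr(choice, "message", None) is None:
        raise RuntimeError("Model response choice contained no message")
    return choice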


562-626: Robust tool call processing implementation

The implementation includes excellent validation:

  • Tool availability checking
  • JSON parsing with error handling
  • Schema validation against tool definitions
  • Clear error messages for debugging

The separation of concerns between task_response and regular tools is well-handled.

Comment on lines +179 to +186
            # Add the assistant message to chat history
            # TODO check format
            messages.append(
                {
                    "role": "assistant",
                    "content": prior_output,
                }
            )

⚠️ Potential issue

Remove TODO and validate message format

The TODO comment indicates uncertainty about the message format. This should be validated or documented.

Apply this diff to add validation:

            # Add the assistant message to chat history
-            # TODO check format
            messages.append(
                {
                    "role": "assistant",
                    "content": prior_output,
                }
            )

Consider adding a helper method to validate message structure if needed.
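If such a helper were added, it might look roughly like the sketch below; this is an assumption about shape, not code from the PR.

# Illustrative validator for an OpenAI-style chat message dict.
from typing import Any


def validate_chat_message(message: dict[str, Any]) -> dict[str, Any]:
    if message.get("role") not in {"system", "user", "assistant", "tool"}:
        raise ValueError(f"Unexpected message role: {message.get('role')!r}")
    if "content" not in message and "tool_calls" not in message:
        raise ValueError("Message must have content or tool_calls")
    return message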

🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 180-180: Developer TODO comment found: 'TODO check format'

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 179
to 186, remove the TODO comment about checking the message format and add
validation to ensure the message dictionary has the correct structure with
required keys "role" and "content" as strings. Implement or call a helper method
to validate the message format before appending it to the messages list, and
document the expected message structure clearly.

Comment on lines +188 to 190
        # Get the final response from the chat formatter
        # TODO
        intermediate_outputs = chat_formatter.intermediate_outputs()

⚠️ Potential issue

Remove incomplete TODO comment

Line 189 contains an incomplete TODO comment that provides no context.

Apply this diff:

        # Get the final response from the chat formatter
-        # TODO
        intermediate_outputs = chat_formatter.intermediate_outputs()
🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 189-189: Developer TODO comment found: 'TODO'

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py at lines 188 to
190, remove the incomplete TODO comment on line 189 as it provides no useful
context or information.

Comment on lines +236 to +242
        # TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call
        usage = (
            self.usage_from_response(last_response)
            if last_response is not None
            else None
        )


⚠️ Potential issue

Address incorrect usage calculation

The TODO correctly identifies that usage is calculated only from the final response, which will underreport actual token usage across multiple tool calls.

This is a significant issue that could lead to incorrect cost tracking and rate limiting. Consider accumulating usage across all model calls or documenting this limitation clearly.

Consider implementing a usage accumulator that tracks tokens across all calls within the conversation, similar to how messages are accumulated.
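One hedged sketch of that accumulator idea, assuming the common OpenAI/litellm usage fields (prompt_tokens, completion_tokens, total_tokens); the adapter's actual response objects may differ.

# Illustrative accumulator: sum usage across every model call in the turn.
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class AccumulatedUsage:
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def add(self, usage: Any) -> None:
        self.prompt_tokens += getattr(usage, "prompt_tokens", 0) or 0
        self.completion_tokens += getattr(usage, "completion_tokens", 0) or 0
        self.total_tokens += getattr(usage, "total_tokens", 0) or 0


def accumulate_usage(responses: Iterable[Any]) -> AccumulatedUsage:
    total = AccumulatedUsage()
    for response in responses:
        usage = getattr(response, "usage", None)
        if usage is not None:
            total.add(usage)
    return total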

🧰 Tools
🪛 GitHub Actions: Debug Detector

[error] 236-236: Developer TODO comment found: 'TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call'

🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 236
to 242, the usage calculation only accounts for the final response, missing
token usage from earlier calls. To fix this, implement an accumulator that sums
usage from all individual model call responses throughout the conversation,
similar to how messages are aggregated. Replace the single usage_from_response
call with logic that iterates over all responses, accumulates their usage data,
and returns the total usage to ensure accurate cost tracking.
