Tool calling functionality #477
base: scosman/tools_scaffolding
Conversation
… and some thinking.
…all parameter schema
…aving the checkpoint. Prompt: We are mid-refactor on adding tool calling to this class, which makes a series of LLM Chat calls. The chat formatter manages generating the user messages in order. We want to add tool calling. It's partly done. We can handle a single turn of tool calls (inside the main turn) and it works. But we need to extend it to do several more things:
- Allow it to make many tool calls in a row (don't force it to tool-choice none after one), and keep iterating on tool calls until all tool calls are complete, before going to the next turn from the chat formatter.
- Currently the chat formatter takes care of generating the whole chat history for the next turn. This will no longer work, as some of the chat history will be tool calls/responses it's not aware of. Refactor to make it only generate the next message, not the history. The litellm_adapter class should manage the message history, and at the end of the turn it should have the complete chat saved in a property (we'll use this later, but test it now).
- Note that there's a special tool call `task-response`, which is used for ending a top-level turn (chat-formatter managed turn). We take the JSON from that and save it as a message (working). This needs to be treated differently from the other tool calls, as it always ends the turn.

Guidance:
- Make all tool calling handled in a new function the main loop calls. We don't want the main chat loop (while True) getting more complex.
- Add appropriate tests, and keep the code testable.
Walkthrough

This update introduces iterative tool call handling in the LiteLlmAdapter, allowing multiple consecutive tool invocations within a single turn. It adds new and refactored methods to manage chat history, process tool calls, and integrate tools into model requests. Extensive new tests validate these behaviors, including paid-integration tests for math tools. Additionally, a stub method for available tools is added to the base adapter, and the RunOutput dataclass is extended with a trace field.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant LiteLlmAdapter
    participant Model
    participant Tool
    User->>LiteLlmAdapter: Send input message
    LiteLlmAdapter->>LiteLlmAdapter: Append to chat history
    loop Tool Call Loop (max N times)
        LiteLlmAdapter->>Model: Send chat history + available tools
        Model-->>LiteLlmAdapter: Return tool call(s) or final response
        alt Tool call(s) present
            LiteLlmAdapter->>Tool: Execute tool(s) with arguments
            Tool-->>LiteLlmAdapter: Return tool result(s)
            LiteLlmAdapter->>LiteLlmAdapter: Append tool results to chat history
        else Final response
            LiteLlmAdapter->>LiteLlmAdapter: Extract output and break loop
        end
    end
    LiteLlmAdapter-->>User: Return output and updated chat history
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
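A rough Python sketch of the iterative loop described in the walkthrough and diagram above. The helper names (`acompletion_checking_response`, `run_tool`) and the exact method signature are assumptions for illustration, not the adapter's actual code:

```python
# Sketch of the per-turn tool-call loop; names are illustrative assumptions.
MAX_CALLS_PER_TURN = 10

async def _handle_tool_calls_loop(self, completion_kwargs, messages, tools):
    calls = 0
    while calls < MAX_CALLS_PER_TURN:
        calls += 1
        # Send the accumulated chat history plus the available tools
        response, choice = await self.acompletion_checking_response(
            **completion_kwargs, messages=messages, tools=tools
        )
        content = choice.message.content
        tool_calls = choice.message.tool_calls
        # Record the assistant turn so the complete trace is preserved
        messages.append(
            {"role": "assistant", "content": content, "tool_calls": tool_calls}
        )
        if tool_calls:
            # Execute each requested tool and append its result to the history
            for tool_call in tool_calls:
                result = await self.run_tool(tool_call)
                messages.append(
                    {
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": str(result),
                    }
                )
            continue  # let the model see the tool results and decide again
        if content:
            return content, messages  # final answer for this turn
        break  # neither content nor tool calls
    raise RuntimeError(f"Stopping after {calls} model calls in one turn.")
```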
Actionable comments posted: 5
🔭 Outside diff range comments (1)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (1)

47-59: Use a different provider in test config to avoid OpenRouter initialization

The test config uses `model_provider_name="openrouter"`, which may trigger OpenRouter initialization and API key checks even when mocked. Consider using a provider that doesn't require API keys for testing:

```diff
 @pytest.fixture
 def config():
     return LiteLlmConfig(
         base_url="https://api.test.com",
         run_config_properties=RunConfigProperties(
             model_name="test-model",
-            model_provider_name="openrouter",
+            model_provider_name="openai",
             prompt_id="simple_prompt_builder",
             structured_output_mode="json_schema",
         ),
         default_headers={"X-Test": "test"},
         additional_body_options={"api_key": "test_key"},
     )
```

This should prevent the OpenRouter initialization errors in the tests.
🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1)

80-86: Remove duplicate assertion

There's a duplicate assertion checking for "64" in the output. Remove the duplicate assertion:

```diff
 assert "64" in run.output.output
 assert run.id is not None
 assert (
     run.input == "You should answer the following question: four plus six times 10"
 )
-assert "64" in run.output.output
```
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (1)

32-33: Document the rationale for the MAX_CALLS_PER_TURN limit

The constant `MAX_CALLS_PER_TURN = 10` prevents infinite loops, but the specific value seems arbitrary. Add a comment explaining why 10 was chosen:

```diff
-MAX_CALLS_PER_TURN = 10
+# Maximum number of tool calls allowed per turn to prevent infinite loops
+# and excessive token usage. Based on typical tool interaction patterns.
+MAX_CALLS_PER_TURN = 10
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (2 hunks)
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (7 hunks)
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py (2 hunks)
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)
**/*.py:
- Always assume pydantic 2 (not pydantic 1)
- The project supports Python 3.10 and above
Files:
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
**/test_*.py
📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)
**/test_*.py:
- Always use pytest for tests in Python code
- Test brevity is important. Use approaches for re-use and brevity including using fixtures for repeated code, and using pytest parameterize for similar tests
Files:
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
🧬 Code Graph Analysis (2)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
- libs/core/kiln_ai/tools/base_tool.py (1): KilnTool (40-82)

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (6)
- libs/core/kiln_ai/datamodel/json_schema.py (1): validate_schema_with_value_error (50-70)
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (3): model_provider (79-94), build_chat_formatter (205-249), available_tools (323-325)
- libs/core/kiln_ai/tools/base_tool.py (1): KilnTool (40-82)
- libs/core/kiln_ai/utils/exhaustive_error.py (1): raise_exhaustive_enum_error (5-6)
- libs/core/kiln_ai/adapters/chat/chat_formatter.py (7): messages (45-46), next_turn (56-58), next_turn (62-79), next_turn (95-125), next_turn (141-174), next_turn (178-195), intermediate_outputs (51-53)
- libs/core/kiln_ai/adapters/run_output.py (1): RunOutput (8-11)
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
[error] 324-324: Developer content found: TODO comment present in code.
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
[error] 531-531: Developer content found: TODO comment present in code.
[error] 643-643: Developer content found: TODO comment present in code.
🪛 GitHub Actions: Build and Test
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 767-767: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 821-821: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 908-908: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 988-988: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 1040-1040: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
🪛 GitHub Actions: Coverage Report
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 767-767: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 821-821: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 908-908: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 988-988: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
[error] 1040-1040: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-13)
```python
# If no tool calls, return the content as final output
if content:
    return content, messages

# If we get here with no content and no tool calls, break
break

raise RuntimeError(
    f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
)
```
Handle edge case when model returns neither content nor tool calls
The loop breaks silently when there's no content and no tool calls, then raises a generic error about too many tool calls which is misleading.
Add proper error handling for this edge case:
```diff
 # If no tool calls, return the content as final output
 if content:
     return content, messages

 # If we get here with no content and no tool calls, break
-break
+raise RuntimeError(
+    "Model returned neither content nor tool calls. This may indicate an issue with the model response."
+)

-raise RuntimeError(
-    f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
-)
+# Only reached if we exit the while loop due to tool_calls_count
+raise RuntimeError(
+    f"Too many tool calls ({tool_calls_count}). Stopping iteration to avoid using too many tokens."
+)
```
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 129
to 138, the code breaks the loop silently when there is neither content nor tool
calls, then raises a misleading RuntimeError about too many tool calls. Add
explicit error handling for this edge case by raising a clear and specific
exception or returning an appropriate error message indicating that the model
returned neither content nor tool calls, so the issue is accurately reported.
```python
# TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).
# Check for a conflict between tools and response format using tools
if has_tools and "tools" in response_format_options:
    raise ValueError(
        "Function calling/tools can't be used as the JSON response format if you're also using tools. Please select a different structured output mode."
    )
```
Remove TODO comment and clarify the design decision
The pipeline flagged this TODO comment. The comment raises a valid design question about supporting both structured output and tools simultaneously.
Either implement the functionality or document why the current approach was chosen:
```diff
-# TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).
 # Check for a conflict between tools and response format using tools
 if has_tools and "tools" in response_format_options:
     raise ValueError(
         "Function calling/tools can't be used as the JSON response format if you're also using tools. Please select a different structured output mode."
     )
```
Consider tracking this design question in an issue if it needs further discussion.
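For illustration only, a hypothetical sketch of the "merged" approach the TODO alludes to: expose the structured final answer as a `task_response` tool alongside the user's tools, with `tool_choice` set to auto when other tools exist and forced when only the final answer remains. None of these names or shapes are confirmed by this PR:

```python
# Hypothetical merged design: the structured final answer becomes one more tool,
# so the model can choose a tool call or the final answer on any turn.
def build_tool_options(user_tools: list[dict], output_schema: dict) -> dict:
    task_response_tool = {
        "type": "function",
        "function": {
            "name": "task_response",
            "description": "Return the final structured answer for this turn.",
            "parameters": output_schema,
        },
    }
    return {
        "tools": user_tools + [task_response_tool],
        # Auto when other tools are present; forced when only the final answer
        # remains (the "only forced for final, auto for merged" note in the TODO).
        "tool_choice": "auto"
        if user_tools
        else {"type": "function", "function": {"name": "task_response"}},
    }
```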
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 531-531: Developer content found: TODO comment present in code.
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 531
to 536, remove the TODO comment about reconsidering the model's choice between
final answer or tool call and the question about supporting both JSON schema and
tools. Instead, clarify the design decision by either implementing support for
both or explicitly documenting why the current restriction exists. If the design
question requires further discussion, create and reference an issue for tracking
it.
```python
    # TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()
    "content": str(result),
}
```
🛠️ Refactor suggestion
Clarify tool return type requirements
The TODO comment indicates uncertainty about whether tool results must be strings.
Define the return type contract clearly:
"tool_call_id": tool_call.id,
"name": tool_name,
- # TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()
- "content": str(result),
+ "content": str(result), # Tool results are always stringified for consistency
}
Consider updating the `KilnTool.run()` method signature to specify that the return type should be `str` (or `Any` that can be stringified), and document this requirement in the base class.
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 643-643: Developer content found: TODO comment present in code.
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 643
to 645, clarify the expected return type of the KilnTool.run() method by
updating its signature to specify it should return a string or a type that can
be converted to a string. Add documentation in the base class to explicitly
state this requirement, and remove the need for calling str() on the result in
this code by enforcing the contract at the method level.
```python
async def test_handle_tool_calls_loop_no_tool_calls(config, mock_task):
    """Test _handle_tool_calls_loop with no tool calls in response"""
    adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)
    initial_messages = [{"role": "user", "content": "Hello"}]

    # Mock response with no tool calls
    mock_response = Mock()
    mock_choice = Mock()
    mock_message = Mock()
    mock_message.content = "Hello response"
    mock_message.tool_calls = None
    mock_choice.message = mock_message
    mock_response.choices = [mock_choice]

    with patch.object(
        adapter,
        "acompletion_checking_response",
        return_value=(mock_response, mock_choice),
    ):
        content, messages = await adapter._handle_tool_calls_loop(
            Mock(), initial_messages, None
        )

    assert content == "Hello response"
    assert len(messages) == 2
    assert messages[0] == {"role": "user", "content": "Hello"}
    assert messages[1] == {
        "role": "assistant",
        "content": "Hello response",
        "tool_calls": None,
    }
```
💡 Verification agent
🧩 Analysis chain
Fix OpenRouter API key errors in tests
Multiple tests are failing with "ValueError: Attempted to use OpenRouter without an API key set". Even though `acompletion_checking_response` is mocked, the tests are still trying to initialize an OpenRouter connection. The issue might be in the test setup or the way the adapter is initialized. Consider:
- Mock the litellm module earlier in the test setup
- Ensure the config doesn't trigger OpenRouter initialization
- Mock any litellm initialization that happens during adapter creation
🏁 Script executed:

```bash
#!/bin/bash
# Check how OpenRouter is being initialized in the codebase
rg -A 5 "OpenRouter|openrouter" --type py
```

Length of output: 82194
Mock the Litellm client in tests to avoid OpenRouter API-key errors

Apply one of the following fixes in `libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py`:
• Add an autouse fixture to stub out Litellm's client before any adapter instantiates:

```diff
+import pytest
+import litellm
+
+@pytest.fixture(autouse=True)
+def patch_litellm_client(monkeypatch):
+    """Prevent real OpenRouter/OpenAI clients from initializing in unit tests."""
+    class DummyClient:
+        def __init__(self, *args, **kwargs):
+            pass
+
+    monkeypatch.setattr(
+        "kiln_ai.adapters.model_adapters.litellm_adapter.litellm.Client",
+        DummyClient,
+    )
```
• Or inject a dummy API key into your existing config fixture so that OpenRouter never complains:

```diff
 @pytest.fixture
 def config():
     run_props = RunConfigProperties(
         model_name="test-model",
         model_provider_name=ModelProviderName.openrouter,
         prompt_id="simple_prompt_builder",
         structured_output_mode="json_schema",
     )
-    return LiteLlmConfig(run_config_properties=run_props)
+    return LiteLlmConfig(
+        run_config_properties=run_props,
+        default_headers={},
+        additional_body_options={"api_key": "test-openrouter-key"},
+    )
```
Either approach decouples your adapter's initialization from real API keys and stops the `ValueError: Attempted to use OpenRouter without an API key set` failures in async tests.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 GitHub Actions: Build and Test
[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
🪛 GitHub Actions: Coverage Report
[error] 728-728: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py around
lines 709 to 740, the test fails due to missing OpenRouter API key when
LiteLlmAdapter initializes. Fix this by either adding an autouse pytest fixture
that mocks or stubs the Litellm client before any adapter instantiation, or
modify the existing config fixture to inject a dummy API key. This prevents the
adapter from attempting real API calls and avoids the ValueError about missing
API keys during async tests.
Actionable comments posted: 1
♻️ Duplicate comments (3)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3)
132-141: Handle edge case when model returns neither content nor tool calls. The loop breaks silently when there's no content and no tool calls, then raises a generic error about too many tool calls, which is misleading.

534-539: Remove TODO comment and clarify the design decision. The pipeline flagged this TODO comment. The comment raises a valid design question about supporting both structured output and tools simultaneously.

646-648: Clarify tool return type requirements. The TODO comment indicates uncertainty about whether tool results must be strings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (7 hunks)
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter_paid.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)
**/*.py:
- Always assume pydantic 2 (not pydantic 1)
- The project supports Python 3.10 and above
Files:
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🧠 Learnings (1)
📚 Learning: the `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due t...
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.
Applied to files:
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
[error] 109-109: Developer TODO comment found: 'TODO: ensure structure'
[error] 534-534: Developer TODO comment found: 'TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).'
[error] 646-646: Developer TODO comment found: 'TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (windows-latest)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-latest)
🔇 Additional comments (5)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (5)
1-34: Constants and imports look good. The new imports and constants are appropriate for implementing tool calling functionality. The limits for MAX_CALLS_PER_TURN and MAX_TOOL_CALLS_PER_TURN provide good safeguards against runaway iterations.

50-67: Well-designed state management for tools and chat history. Good use of caching for available tools and proper encapsulation of chat history with a read-only property.

143-273: Well-structured implementation of tool calling in the run method. The integration of tool calling with proper chat history management is well done. The handling of different scenarios (tool calls vs regular completion) is clear, and the limitation regarding logprobs from tool calls is properly documented.

275-288: Good helper method for response validation. Excellent addition of a helper method that ensures type safety and provides clear error messages.

592-656: Excellent implementation of tool processing with proper validation. The tool processing logic is well-structured with:
- Proper argument parsing and JSON schema validation
- Clear error messages for debugging
- Good separation between task_response and normal tools
- Appropriate error handling for edge cases
The validation against tool schemas before execution is particularly good for catching errors early.
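For reference, a rough sketch of those validation steps. The helper `validate_schema_with_value_error` comes from the code-graph context above, but its exact signature, along with `toolcall_definition()` and the surrounding structure, are assumptions for illustration:

```python
import json

# Assumed import path and signature based on the review context; may differ.
from kiln_ai.datamodel.json_schema import validate_schema_with_value_error


async def process_tool_call(tool_call, tools_by_name):
    # Check the tool actually exists before doing anything else
    tool = tools_by_name.get(tool_call.function.name)
    if tool is None:
        raise RuntimeError(f"Model requested unknown tool: {tool_call.function.name}")
    # Parse the JSON arguments with a clear error on failure
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as e:
        raise RuntimeError(f"Tool arguments were not valid JSON: {e}") from e
    # Validate arguments against the tool's declared parameter schema before running it
    schema = tool.toolcall_definition()["function"]["parameters"]
    validate_schema_with_value_error(args, schema)
    return await tool.run(**args)
```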
```python
    # TODO: ensure structure
    "tool_calls": tool_calls,
}
```
Remove TODO comment and ensure message structure
The TODO comment should be addressed. Either implement validation for the assistant message structure or document why it's not needed.
Apply this diff to ensure proper structure:
```diff
 # Add the assistant's response to messages
 assistant_message = {
     "role": "assistant",
     "content": content,
-    # TODO: ensure structure
-    "tool_calls": tool_calls,
+    "tool_calls": tool_calls if tool_calls else [],
 }
```
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 109-109: Developer TODO comment found: 'TODO: ensure structure'
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 109
to 111, remove the TODO comment about ensuring the structure of the assistant
message. Instead, implement validation logic to confirm the message structure
meets expected requirements before assigning tool_calls, or add a clear comment
explaining why such validation is unnecessary. This will resolve the TODO and
clarify the code's intent.
… More TODOs for unhandled cases. Now running on all paid test cases. A bunch of the new tests don't work yet.
Actionable comments posted: 3
♻️ Duplicate comments (4)
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (4)
103-109: Remove TODO and ensure proper message structure. The TODO comment should be addressed. The assistant message structure should be validated or documented. Apply this diff to ensure proper structure:

```diff
 # Add the assistant's response to messages
 assistant_message = {
     "role": "assistant",
     "content": content,
-    # TODO: ensure structure
-    "tool_calls": tool_calls,
+    "tool_calls": tool_calls if tool_calls else [],
 }
```

129-138: Handle edge case when model returns neither content nor tool calls. The loop breaks silently when there's no content and no tool calls, potentially leading to a misleading error message.

504-509: Remove TODO and document design decision. The TODO comment raises a valid design question about supporting both structured output and tools.

616-618: Clarify tool return type requirements. The TODO indicates uncertainty about tool result type requirements.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (6 hunks)
- libs/core/kiln_ai/adapters/run_output.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit Inference Engine (.cursor/rules/project.mdc)
**/*.py:
- Always assume pydantic 2 (not pydantic 1)
- The project supports Python 3.10 and above
Files:
libs/core/kiln_ai/adapters/run_output.py
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🧠 Learnings (1)
📚 Learning: the `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due t...
Learnt from: leonardmq
PR: Kiln-AI/Kiln#418
File: libs/core/kiln_ai/adapters/ml_model_list.py:0-0
Timestamp: 2025-07-16T09:37:39.816Z
Learning: The `glm_z1_rumination_32b_0414` model was intentionally removed from the built_in_models list due to output formatting issues: output was duplicated in both `output` and `reasoning` fields, and contained random internal JSON in the output. This model should not be re-added without addressing these formatting problems.
Applied to files:
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
🪛 GitHub Actions: Debug Detector
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
[error] 106-106: Developer TODO comment found: 'TODO: ensure structure'
[error] 180-180: Developer TODO comment found: 'TODO check format'
[error] 189-189: Developer TODO comment found: 'TODO'
[error] 236-236: Developer TODO comment found: 'TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call'
[error] 504-504: Developer TODO comment found: 'TODO: maybe reconsider this. Model should be able to choose between a final answer or a tool call on any turn. But good models have json_schea, so do we need to support both? If we do, merge them, and consider auto vs forced when merging (only forced for final, auto for merged).'
[error] 616-616: Developer TODO comment found: 'TODO: check if this really needs to be a string. If it does, run should be forced to return a string and we should avoid the str()'
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Desktop Apps (ubuntu-22.04)
- GitHub Check: Build Desktop Apps (ubuntu-22.04-arm)
- GitHub Check: Build Desktop Apps (macos-13)
- GitHub Check: Build Desktop Apps (macos-latest)
- GitHub Check: Build Desktop Apps (windows-latest)
🔇 Additional comments (4)
libs/core/kiln_ai/adapters/run_output.py (1)
1-12: LGTM! The addition of the `trace` field to store conversation history is well-implemented. The type annotation correctly allows for a list of dictionaries with flexible value types, which is appropriate for capturing various message formats and tool call results.

libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (3)

1-34: LGTM! The new imports and constants are well-chosen for implementing tool calling functionality. The iteration limits (10 turns, 30 tool calls) provide reasonable safeguards against runaway loops.

245-258: Well-implemented response validation. Good defensive programming with comprehensive validation of the response structure before accessing nested fields.

562-626: Robust tool call processing implementation. The implementation includes excellent validation:
- Tool availability checking
- JSON parsing with error handling
- Schema validation against tool definitions
- Clear error messages for debugging
The separation of concerns between task_response and regular tools is well-handled.
```python
# Add the assistant message to chat history
# TODO check format
messages.append(
    {
        "role": "assistant",
        "content": prior_output,
    }
)
```
Remove TODO and validate message format
The TODO comment indicates uncertainty about the message format. This should be validated or documented.
Apply this diff to add validation:
```diff
 # Add the assistant message to chat history
-# TODO check format
 messages.append(
     {
         "role": "assistant",
         "content": prior_output,
     }
 )
```
Consider adding a helper method to validate message structure if needed.
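If a helper is added, it could look roughly like the sketch below; the required fields here are assumptions rather than the project's confirmed message contract:

```python
# Hypothetical helper for validating chat message structure before appending.
def validate_chat_message(message: dict) -> dict:
    if message.get("role") not in {"system", "user", "assistant", "tool"}:
        raise ValueError(f"Unexpected message role: {message.get('role')!r}")
    content = message.get("content")
    if content is not None and not isinstance(content, str):
        raise ValueError("Message content must be a string or None")
    if message["role"] == "tool" and "tool_call_id" not in message:
        raise ValueError("Tool messages must include a tool_call_id")
    return message
```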
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 180-180: Developer TODO comment found: 'TODO check format'
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 179
to 186, remove the TODO comment about checking the message format and add
validation to ensure the message dictionary has the correct structure with
required keys "role" and "content" as strings. Implement or call a helper method
to validate the message format before appending it to the messages list, and
document the expected message structure clearly.
```python
# Get the final response from the chat formatter
# TODO
intermediate_outputs = chat_formatter.intermediate_outputs()
```
Remove incomplete TODO comment
Line 189 contains an incomplete TODO comment that provides no context.
Apply this diff:
```diff
 # Get the final response from the chat formatter
-# TODO
 intermediate_outputs = chat_formatter.intermediate_outputs()
```
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 189-189: Developer TODO comment found: 'TODO'
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py at lines 188 to
190, remove the incomplete TODO comment on line 189 as it provides no useful
context or information.
```python
# TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call
usage = (
    self.usage_from_response(last_response)
    if last_response is not None
    else None
)
```
Address incorrect usage calculation
The TODO correctly identifies that usage is calculated only from the final response, which will underreport actual token usage across multiple tool calls.
This is a significant issue that could lead to incorrect cost tracking and rate limiting. Consider accumulating usage across all model calls or documenting this limitation clearly.
Consider implementing a usage accumulator that tracks tokens across all calls within the conversation, similar to how messages are accumulated.
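One possible shape for such an accumulator, assuming OpenAI-style usage objects on each litellm response; field names may need adjusting to the project's Usage type:

```python
# Sketch: sum token counts across every model call in the turn, rather than
# reading only the final response.
def accumulate_usage(responses) -> dict:
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = getattr(response, "usage", None)
        if usage is None:
            continue
        for field in totals:
            totals[field] += getattr(usage, field, 0) or 0
    return totals
```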
🧰 Tools
🪛 GitHub Actions: Debug Detector
[error] 236-236: Developer TODO comment found: 'TODO: usage is likely incorrect. We're only getting the usage from the final response, not each call'
🤖 Prompt for AI Agents
In libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py around lines 236
to 242, the usage calculation only accounts for the final response, missing
token usage from earlier calls. To fix this, implement an accumulator that sums
usage from all individual model call responses throughout the conversation,
similar to how messages are aggregated. Replace the single usage_from_response
call with logic that iterates over all responses, accumulates their usage data,
and returns the total usage to ensure accurate cost tracking.
Summary by CodeRabbit
New Features
Bug Fixes
Tests