
dev -> main (Jul 18th) #1532


Merged

merged 23 commits from dev into main on Jul 19, 2025

Conversation

@Ahmad-mtos (Contributor) commented Jul 18, 2025

PR Type

Bug fix, Enhancement, Documentation


Description

  • Fix cartesian product bug in get_task query

  • Enhance LLM client with custom API key support

  • Remove deprecated continue_ field from message models

  • Add automated model documentation sync workflow


Diagram Walkthrough

flowchart LR
  A["Task Query Bug"] --> B["Fixed Cartesian Product"]
  C["LLM Client"] --> D["Custom API Key Support"]
  E["Message Models"] --> F["Remove continue_ Field"]
  G["Model Config"] --> H["Auto-sync Documentation"]
  B --> I["Better Tool Filtering"]
  D --> J["User Secret Integration"]
  F --> K["Cleaner API Schema"]
  H --> L["Updated Model Tables"]


Ahmad-mtos and others added 22 commits (July 9, 2025 08:57):

  • fix(agents-api): refactor get_task query to fix cartesian product bug
    Fixed cartesian product bug in get_task query that caused duplicate tools.
    Co-authored-by: Ahmad Haidar <[email protected]>
  • fix(agents-api): properly handle custom-api-key and secrets api key
    - Added custom API key support feature for LLM client
    - Added model name conversion utility for LiteLLM integration
    - Re-enabled model validation in session chat rendering
    Co-authored-by: Ahmad Haidar <[email protected]>
  • …pt for updating model data from configuration
  • docs: better transparent supported models page
  • fix(agents-api): remove ``continue_`` field + filter out ``name`` field for non openai models
Comment on lines +14 to +51
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          pip install pyyaml

      - name: Run sync script
        run: |
          python src/scripts/sync_model_docs.py

      - name: Check for changes
        id: check_changes
        run: |
          if git diff --quiet documentation/integrations/supported-models.mdx; then
            echo "changed=false" >> $GITHUB_OUTPUT
          else
            echo "changed=true" >> $GITHUB_OUTPUT
          fi

      - name: Commit and push changes
        if: steps.check_changes.outputs.changed == 'true'
        run: |
          git config --global user.name 'github-actions[bot]'
          git config --global user.email 'github-actions[bot]@users.noreply.github.com'
          git add documentation/integrations/supported-models.mdx
          git commit -m "chore: sync model documentation from litellm-config.yaml [skip ci]"
          git push

Check warning

Code scanning / CodeQL

Workflow does not contain permissions (Medium)

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix


To fix the problem, you should add an explicit permissions block to the workflow file to restrict the GITHUB_TOKEN scope to the minimum needed. Since this workflow checks out code and pushes changes (committing to the repository), it requires contents: write permission. The contents: write scope is the least privilege needed for pushing to the repository, while other permissions (such as pull-requests: write) are not required here. The best place to add this is at the top level of the workflow file, so it applies to all jobs (unless overridden).

You should add the following block at the top level of the workflow file (the patch below places it at the very top, before the name: field):

permissions:
  contents: write

No other code or imports are needed.


Suggested changeset 1: .github/workflows/sync-model-docs.yml

Autofix patch

Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/.github/workflows/sync-model-docs.yml b/.github/workflows/sync-model-docs.yml
--- a/.github/workflows/sync-model-docs.yml
+++ b/.github/workflows/sync-model-docs.yml
@@ -1,3 +1,5 @@
+permissions:
+  contents: write
 name: Sync Model Documentation
 
 on:
EOF
claude bot commented Jul 18, 2025

Claude finished @Ahmad-mtos's task.


PR Review Analysis

I'm reviewing this dev -> main PR with 22 commits and 25 changed files. Here's my review plan:

Review Checklist:

  • Security Analysis - Address GitHub Advanced Security workflow permissions concern
  • Workflow Changes - Review new GitHub Actions workflow
  • Core Code Changes - Examine agents-api and integrations-service modifications
  • Documentation Updates - Review documentation changes for accuracy
  • Test Coverage - Analyze new tests and coverage
  • Configuration Changes - Review schema and config updates
  • Performance Considerations - Assess performance implications
  • Best Practices - Check adherence to coding standards

Security Analysis

🔴 Critical Security Issues

  1. GitHub Actions Workflow Permissions (Already flagged by security scanner)

    • File: .github/workflows/sync-model-docs.yml
    • Issue: Workflow lacks explicit permissions block
    • Risk: Defaults to broad permissions which could be exploited
    • Fix: Add explicit permissions block:
    permissions:
      contents: write
      pull-requests: write
  2. External URL Fetch Without Validation

    • File: src/scripts/sync_model_docs.py:359
    • Issue: Script fetches data from external GitHub URL without proper validation
    • Risk: Potential for supply chain attacks if GitHub repository is compromised
    • Fix: Add URL validation, certificate verification, and content validation
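As a concrete illustration of that fix, a minimal hardened fetch might look like the sketch below. The allow-listed host and the expected payload shape are assumptions for illustration, not taken from the script itself:

from urllib.parse import urlparse

import requests

ALLOWED_HOSTS = {"raw.githubusercontent.com"}

def fetch_pricing_data(url: str) -> dict:
    # Refuse anything outside the expected host before touching the network
    if urlparse(url).hostname not in ALLOWED_HOSTS:
        raise ValueError(f"Refusing to fetch from untrusted host: {url}")
    # verify=True (TLS certificate verification) is the requests default
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    data = response.json()
    # Basic content validation: the pricing data is expected to be a JSON object
    if not isinstance(data, dict):
        raise ValueError("Unexpected payload shape from pricing data URL")
    return data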

🟡 Medium Security Issues

  1. Error Handling Exposes Implementation Details
    • File: src/agents-api/agents_api/clients/litellm.py:128-129
    • Issue: Exception details printed to console may expose sensitive information
    • Fix: Use proper logging with sanitization instead of print statements
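A minimal sketch of that fix, assuming the printed exception comes from the completion call in clients/litellm.py (the wrapper name safe_acompletion is hypothetical):

import logging

logger = logging.getLogger(__name__)

async def safe_acompletion(**settings):
    try:
        return await acompletion(**settings)
    except Exception:
        # logger.exception records the traceback in the log stream instead of
        # echoing raw exception details (which may include key material) to stdout
        logger.exception("LLM completion failed (model=%s)", settings.get("model"))
        raise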

Code Quality & Best Practices

✅ Positive Aspects

  1. Comprehensive Test Coverage: Good test additions in test_litellm_utils.py and test_task_queries.py
  2. Proper Error Handling: Usage of contextlib.suppress for graceful secret handling
  3. Type Safety: Consistent use of beartype decorators and type hints
  4. Documentation: Well-documented functions with docstrings

🔴 Issues Found

  1. Hardcoded Path: src/agents-api/agents_api/common/utils/llm_providers.py:5

    • Issue: /app/litellm-config.yaml is hardcoded
    • Impact: Makes code less flexible and harder to test
    • Fix: Use an environment variable or configurable path (see the sketch after this list)
  2. Missing Error Handling: src/scripts/sync_model_docs.py:361-372

    • Issue: Network errors may not be handled gracefully
    • Impact: Script could fail silently or with unclear error messages
  3. SQL Query Complexity: src/agents-api/agents_api/queries/tasks/get_task.py:12-53

    • Issue: Complex SQL with multiple subqueries
    • Impact: Performance implications and maintenance difficulty
    • Consider: Breaking into smaller, more focused queries
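For issue 1 above, a minimal sketch of a configurable path; the LITELLM_CONFIG_PATH variable name is an assumption, not an existing setting:

import os
from pathlib import Path

# Falls back to the current hardcoded location when the variable is unset
CONFIG_PATH = Path(os.environ.get("LITELLM_CONFIG_PATH", "/app/litellm-config.yaml"))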

Performance Considerations

🟡 Potential Performance Issues

  1. Blocking Network Calls: src/scripts/sync_model_docs.py:363

    • Issue: Synchronous HTTP request in async context
    • Impact: Could block event loop
    • Fix: Use aiohttp for async requests (see the sketch after this list)
  2. Memory Usage: Model data processing loads entire config and pricing data into memory

    • Impact: Could be problematic with large datasets
    • Consider: Streaming or chunked processing for large files
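An illustrative async variant of the fetch for issue 1, using aiohttp as suggested (the function name and payload shape are assumptions):

import aiohttp

async def fetch_json(url: str) -> dict:
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as response:
            response.raise_for_status()
            # Avoids blocking the event loop the way a synchronous
            # requests.get() call would
            return await response.json()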

Test Coverage Assessment

✅ Good Test Coverage

  1. LiteLLM Utils: Comprehensive tests for model name resolution and API key handling
  2. Task Queries: Good coverage of database query functionality
  3. Mock Usage: Proper use of mocks for external dependencies

🔴 Missing Test Coverage

  1. Error Scenarios: Limited testing of failure cases (network errors, invalid configs)
  2. Integration Tests: Missing tests for end-to-end workflow functionality
  3. Security Tests: No tests for input validation or security scenarios

Recommendations

High Priority

  1. Fix workflow permissions - Add explicit permissions block to GitHub Actions workflow
  2. Secure external data fetching - Add validation and error handling for GitHub API calls
  3. Remove hardcoded paths - Make configuration paths configurable

Medium Priority

  1. Improve error handling - Replace print statements with proper logging
  2. Add integration tests - Test complete workflows end-to-end
  3. Performance optimization - Consider async patterns for I/O operations

Low Priority

  1. Code documentation - Add more inline comments for complex logic
  2. Refactor complex queries - Break down large SQL queries into smaller parts

Overall Assessment

This PR introduces significant functionality improvements including:

  • Enhanced LiteLLM integration with custom API key support
  • Automated documentation sync from configuration
  • Better model resolution and provider mapping
  • Comprehensive test coverage for new features

However, it also introduces security vulnerabilities that should be addressed before merging to main. The code quality is generally good with proper type safety and error handling patterns, but needs security hardening.

Recommendation: Address security issues before merging. Consider implementing security fixes as a follow-up PR if time is critical.


github-actions bot (Contributor) commented Jul 18, 2025

Claude finished @Ahmad-mtos's task.


Changelog Bot Analysis ✅

Analyzed the dev → main PR and successfully updated changelog with user-visible changes.

Todo List:

  • Read diff.patch to understand all changes
  • Analyze changes to identify user-visible features/fixes
  • Create/update changelog file in changelog/main/
  • Update Templates/index-template/home-summary.html if needed
  • Stage, commit, and push changes

Changes Made:

  • ✅ Created changelog/stable (main)/2025-07-18.md with proper YAML front-matter
  • ✅ Updated changelog/Templates/index-template/home-summary.html with July 18, 2025 release highlights
  • ✅ Committed and pushed changes to the dev branch

User-Visible Features Added:

  • Enhanced LLM client to support custom API keys from user secrets
  • Added model name conversion utility for LiteLLM integration
  • Added automated workflow to sync model documentation from configuration
  • Enhanced supported models page with transparent feature comparison tables
  • Added comprehensive model capability tracking (context window, max output, tool calling, vision, audio, caching, cost tiers)

Bug Fixes:

  • Fixed cartesian product bug in get_task query that caused duplicate tools
  • Re-enabled model validation in session chat rendering

Commit Details:

  • Hash: 6a449e2d
  • Message: "changelog: add July 18, 2025 release notes"
  • Files: 2 changed, 27 insertions, 9 deletions

github-actions bot (Contributor) commented Jul 18, 2025

Claude finished @Ahmad-mtos's task.


Analysis Complete

Todo List:

  • Read diff.patch file
  • Analyze changes for user-visible functionality
  • Determine if changes qualify as UPDATE or SKIP
  • Provide final response

Analysis Summary

Based on my analysis of the diff.patch file, I identified several key changes that qualify as user-visible functionality additions and behavior changes:

  1. New JSON Response Format Support: Addition of response_format parameter with support for json_object and json_schema types
  2. Streaming Support: Addition of stream: true parameter for enabling streaming responses (a combined request sketch follows this list)
  3. API Field Removal: The continue_ field has been removed from multiple API models (Chat.py, Tasks.py, entries models)
  4. Enhanced Documentation: New documentation sections explaining the new features
  5. Model Support Updates: Updated supported models documentation with feature comparison tables
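A hypothetical request body combining the first two items; every field name other than response_format and stream is an assumption for illustration:

payload = {
    "messages": [{"role": "user", "content": "Summarize this as JSON."}],
    "response_format": {"type": "json_object"},  # or {"type": "json_schema", ...}
    "stream": True,  # enables streaming responses
}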

Since the changes include both new user-visible functionality and changes to existing behavior, the response is:

UPDATE



PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Logic Complexity

The custom API key handling logic in both acompletion and aembedding functions has complex conditional flows that determine model name transformation and base URL selection. The logic should be thoroughly tested to ensure all combinations of custom_api_key, secret retrieval, and model name handling work correctly.

# Check if user has a custom API key in secrets
api_key_env_var_name = get_api_key_env_var_name(model)
secret = None

if api_user is not None and api_key_env_var_name:
    developer_id: UUID = UUID(api_user)

    with contextlib.suppress(SecretNotFoundError):
        secret = await get_secret_by_name(
            developer_id=developer_id,
            name=api_key_env_var_name,
            decrypt=True,
        )

# Determine if we have a custom API key (either provided or from secrets)
has_custom_key = custom_api_key or (secret and secret.value)

if has_custom_key:
    # User has their own API key - convert model name and use direct provider
    if not custom_api_key and secret:
        custom_api_key = secret.value
    model = get_litellm_model_name(model)
else:
    # No custom key - use our proxy with openai/ prefix
    model = f"openai/{model}"
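To make the combinations concrete, here is a minimal, self-contained sketch of the routing decision with a parametrized test. resolve_model is a hypothetical extraction of the logic above, not an existing helper, and it omits the get_litellm_model_name mapping for brevity:

import pytest

def resolve_model(model: str, custom_api_key: str | None, secret_value: str | None) -> str:
    # Mirrors the branch above: any user-supplied key (explicit or from
    # secrets) routes directly to the provider; otherwise use the proxy prefix
    has_custom_key = custom_api_key or secret_value
    return model if has_custom_key else f"openai/{model}"

@pytest.mark.parametrize(
    ("custom_api_key", "secret_value", "expected"),
    [
        ("sk-user", None, "gpt-4o"),    # explicit key wins
        (None, "sk-stored", "gpt-4o"),  # key recovered from user secrets
        (None, None, "openai/gpt-4o"),  # no key: proxy routing
    ],
)
def test_model_routing(custom_api_key, secret_value, expected):
    assert resolve_model("gpt-4o", custom_api_key, secret_value) == expected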
Query Performance

The subquery approach to fix the cartesian product issue uses a timestamp-based filter (updated_at >= t.updated_at) which may have performance implications on large datasets. The query should be tested with realistic data volumes to ensure acceptable performance.

COALESCE(
    (
        SELECT jsonb_agg(to_jsonb(tl))
        FROM tools tl
        WHERE tl.developer_id = t.developer_id
        AND tl.task_id = t.task_id
        AND tl.updated_at >= t.updated_at
    ),
    '[]'::jsonb
) as tools
Message Filtering

The message filtering logic removes the 'name' field from messages when using non-OpenAI models, but this filtering is applied broadly without considering if the target model actually supports or requires the name field. This could potentially break functionality for models that do support named messages.

if get_api_key_env_var_name(agent_model) != "OPENAI_API_KEY":
    prompt = [{k: v for k, v in message.items() if k != "name"} for message in prompt]
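A worked example of that filter; for a non-OpenAI model every message loses its name key, regardless of whether the target provider could have used it:

prompt = [
    {"role": "user", "name": "alice", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]
filtered = [{k: v for k, v in m.items() if k != "name"} for m in prompt]
# filtered == [{"role": "user", "content": "Hi"},
#              {"role": "assistant", "content": "Hello"}]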


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Use epsilon for floating-point zero comparison

Using exact floating-point equality comparison (== 0) can be unreliable due to
floating-point precision issues. Consider using a small epsilon for the zero
comparison to handle potential floating-point rounding errors.

src/scripts/sync_model_docs.py [23-34]

 def classify_cost_tier(input_cost_per_token: float) -> str:
     """Classify model cost tier based on input token cost."""
-    if input_cost_per_token == 0:
+    if input_cost_per_token < 1e-10:  # Effectively zero with epsilon
         return "Free"
     elif input_cost_per_token < 0.0000005:  # $0.50 per 1M tokens
         return "Budget"
     elif input_cost_per_token < 0.000002:   # $2.00 per 1M tokens
         return "Standard"
     elif input_cost_per_token < 0.00001:    # $10.00 per 1M tokens
         return "Premium"
     else:
         return "Enterprise"
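For reference, sample inputs and the tiers the patched function would return (values chosen to sit inside each band):

classify_cost_tier(0.0)     # "Free" (below the 1e-10 epsilon)
classify_cost_tier(4e-7)    # "Budget" ($0.40 per 1M tokens)
classify_cost_tier(1.5e-6)  # "Standard"
classify_cost_tier(5e-6)    # "Premium"
classify_cost_tier(2e-5)    # "Enterprise"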
Suggestion importance [1-10]: 5 (Impact: Low)

Why: The suggestion correctly identifies a potential floating-point precision issue with == 0 and proposes a robust alternative, improving the reliability of the cost classification logic.

@ellipsis-dev ellipsis-dev bot (Contributor) left a comment


Important

Looks good to me! 👍

Reviewed everything up to 29c18f3 in 2 minutes and 35 seconds.
  • Reviewed 1839 lines of code in 25 files
  • Skipped 0 files when reviewing.
  • Skipped posting 12 draft comments. View those below.
1. .github/workflows/sync-model-docs.yml:50
  • Draft comment:
    Minor: The file lacks a trailing newline. Consider adding one for POSIX compliance.
  • Reason this comment was not posted:
    Confidence changes required: 30% <= threshold 50%. None
2. documentation/advanced/chat.mdx:55
  • Draft comment:
    Ensure removal of the 'continue_' field is intentional and propagated throughout the API docs.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
3. src/agents-api/agents_api/activities/task_steps/prompt_step.py:49
  • Draft comment:
    Stripping the 'name' key for non-OpenAI models seems deliberate. Confirm that removing 'name' does not break required identification for other providers.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%. This comment appears to be asking the author to verify/confirm something, which violates our rules. It's asking "confirm that removing name doesn't break..." which is exactly the kind of speculative "are you sure?" comment we want to avoid. The code change appears deliberate and targeted at non-OpenAI models specifically. Maybe this is actually pointing out a real compatibility issue with other LLM providers that could cause bugs? If there was a real compatibility issue, the comment should state it directly with evidence. Asking for verification without specific evidence of a problem is not helpful. Delete this comment as it's asking for unnecessary verification without providing evidence of an actual problem.
4. src/agents-api/agents_api/clients/litellm.py:59
  • Draft comment:
    The usage of contextlib.suppress requires an import. Ensure 'import contextlib' is added if not already present.
  • Reason this comment was not posted:
    Comment was on unchanged code.
5. src/agents-api/agents_api/common/utils/llm_providers.py:34
  • Draft comment:
    get_litellm_model_name calls get_config() but there is no import for get_config. Ensure it's imported or defined.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
6. src/agents-api/agents_api/queries/tasks/get_task.py:28
  • Draft comment:
    The SQL subquery filters tools with 'tl.updated_at >= t.updated_at'. Double-check that this condition correctly filters outdated tool entries.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. src/agents-api/agents_api/routers/sessions/auto_tools/render.py:73
  • Draft comment:
    Re-enabling validate_model() may affect legacy sessions. Confirm that model validation errors are handled gracefully.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to confirm that model validation errors are handled gracefully, which falls under asking the author to ensure behavior is intended or tested. This violates the rules.
8. src/agents-api/tests/test_task_queries.py:400
  • Draft comment:
    Using asyncio.sleep(0.1) to separate timestamps can be flaky. Consider mocking time to control updated_at values in tests.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. src/scripts/sync_model_docs.py:230
  • Draft comment:
    update_documentation() relies on simple string searches for '## Available Models'. This approach is fragile if the doc structure changes; consider more robust parsing.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%. While the comment raises a valid point about potential fragility, the actual implementation shows careful consideration of edge cases with multiple fallback mechanisms. The code is not as fragile as the comment suggests. More robust parsing (like using a markdown parser) would add complexity without clear benefits given the simple document structure being handled. The comment identifies a real architectural concern about maintainability. A change in document structure could break the parsing. The current implementation already handles various edge cases and has fallbacks. The added complexity of a full markdown parser may not be worth it for this specific use case. The comment should be deleted as it raises concerns about fragility that are already adequately addressed in the implementation through multiple fallback mechanisms.
10. src/scripts/sync_model_docs.py:300
  • Draft comment:
    Project root detection logic in main() may be fragile in non-standard setups. Verify it works in all expected environments.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. The comment is asking the PR author to verify that the project root detection logic works in all expected environments. This falls under the category of asking the author to ensure the behavior is intended or to double-check things, which is against the rules.
11. documentation/advanced/chat.mdx:258
  • Draft comment:
    The code fence for the Python example includes an extra label ('Python'). Consider using just 'python' unless the extra label is intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. This is a very minor stylistic suggestion. The casing doesn't affect functionality. The document uses 'Python' consistently throughout, suggesting it's an intentional style choice. This kind of nitpicky comment about documentation formatting doesn't add value. Maybe there's a style guide that mandates lowercase language names in code fences that I'm not aware of? Even if there was such a style guide, this kind of minor formatting detail in documentation isn't worth a PR comment. The current casing is consistent and readable. Delete this comment as it's an unimportant stylistic nitpick about documentation formatting that doesn't affect functionality.
12. documentation/advanced/chat.mdx:275
  • Draft comment:
    The code fence for the JavaScript example includes an extra label ('Node.js'). Consider using just 'javascript' unless the extra label is intentional.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%. The Node.js label appears to be intentional as part of a pattern in the documentation. The Python examples also have additional labels. The labels help distinguish between different runtime environments. Removing it would actually make the documentation less clear. This is also a UI/documentation style choice rather than a code logic issue. The comment might have a point if there was inconsistent labeling across the codebase, but we can only see this file. The labeling appears consistent within this file and serves a clear purpose in distinguishing runtime environments. The comment should be deleted as it suggests removing an intentional and useful documentation label, and it's also about documentation style rather than code logic.

Workflow ID: wflow_zTDrbzD57l35ugxz


@Ahmad-mtos merged commit d7262c6 into main on Jul 19, 2025. 11 checks passed.