Skip to content

Conversation

blefo
Copy link
Member

@blefo blefo commented Oct 10, 2025

Overview

Implements the OpenAI-compatible /v1/responses API, a more flexible alternative to Chat Completions with structured input, tool calling, web search, and streaming support.

Key Features

  • New /v1/responses endpoint

    • OpenAI-compatible; supports streaming/non-streaming
    • Integrated auth, rate limits, token tracking, and response signing
  • Advanced Tool Handling

    • Auto-detects and executes tool calls
    • Multi-turn workflows with context preservation
    • Supports Python execution and concurrent tools
  • Web Search Integration

    • Optional Brave Search enrichment via web_search param
    • Adds real-time context with source attribution
  • Multimodal Support

    • Image input validation for compatible models

Architecture

  • Split private.py into modular endpoints:

    • /v1/chat/completions → chat.py
    • /v1/responses → responses.py
  • New responses_tool_router.py for tool workflows

  • Modular API models in nilai-common

Technical Highlights

  • Validates model capabilities (tools, multimodal, web search)
  • Supports NilDB prompt retrieval and signed responses
  • SSE streaming with token usage and attribution

blefo added 30 commits October 7, 2025 10:35
…ructure

- Updated OpenAI dependency to version 1.99.2 in both `pyproject.toml` files for `nilai-api` and `nilai-common`.
- Enhanced response model in `responses_model.py` by adding new fields and improving type definitions.
- Refactored response handling in `responses.py` to include usage tracking for input and output tokens.
- Adjusted import statements in `__init__.py` to streamline model access.
- Updated return types in `route_and_execute_tool_call` and `process_tool_calls` to use `FunctionCallOutput`.
- Improved error handling and logging in tool execution.
- Adjusted input handling in `handle_responses_tool_workflow` to support lists of `ResponseInputParam`.
- Added new imports for `FunctionCallOutput` and related types in `nilai_common` models.
- Changed `ResponseFunctionToolCall` to `ResponseFunctionToolCallParam` in multiple functions for better type consistency.
- Enhanced `handle_responses_tool_workflow` to utilize new input item types and improved handling of tool call results.
- Updated imports in `__init__.py` and other files to reflect new model structures.
…e tests architecture

- Introduced new test files for HTTP and OpenAI client interactions with the nilAI API.
- Implemented tests for various scenarios including health checks, model retrieval, chat completions, and response generation.
- Enhanced test coverage for rate limiting and code execution features.
- Removed outdated test file for code execution, consolidating tests into more relevant suites.
- Changed EC2 instance type from g4dn.xlarge to g6.xlarge in the CI workflow.
- Updated the docker-compose command to use the new GPT-20B configuration file.
- Added a new docker-compose file for the GPT-20B GPU service, including environment settings and health checks.
- Updated the CI model reference in the test configuration to use the new GPT-20B model.
- Added a dummy API key for BRAVEE2B in the CI environment setup.
- Updated the EC2 image ID to a new version in the CI configuration.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant