wave10 #15

smarunich · 2025-07-22T17:09:54Z

This pull request introduces significant updates to enhance OpenAI API compatibility, improve documentation, and refine the platform's architecture and usability. Key changes include support for OpenAI-compatible models, updates to rate-limiting mechanisms, and improved documentation for deployment and testing workflows.

OpenAI API Compatibility Enhancements:

Added support for OpenAI-compatible endpoints (/v1/chat/completions, /v1/completions, /v1/embeddings) with automatic protocol translation to KServe format. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
Introduced model-aware routing using the x-ai-eg-model header for efficient model selection. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L196-R203))
Expanded supported frameworks to include vLLM and TGI for OpenAI-compatible models. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L158-R200), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R135))
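To illustrate the client side of this flow, the sketch below builds a request for the OpenAI-compatible `/v1/chat/completions` endpoint and sets the `x-ai-eg-model` header from the request's `model` field, so the gateway can route on it. The helper name `buildChatRequest`, the gateway base URL, and the model name are hypothetical. The sketch also assumes the client sets the header explicitly; the gateway may derive it from the OpenAI-format body on its own. The KServe-format payload that the gateway translates to is not shown.

```javascript
// Hypothetical helper: builds fetch()-style options for an OpenAI-compatible
// chat completion call routed by model name through the AI gateway.
// The base URL and model name used below are illustrative assumptions.
function buildChatRequest(baseUrl, model, messages, apiKey) {
  const body = { model, messages };
  const headers = {
    'Content-Type': 'application/json',
    // Model-aware routing: the gateway selects a backend from this header.
    'x-ai-eg-model': model,
  };
  if (apiKey) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return {
    url: new URL('/v1/chat/completions', baseUrl).toString(),
    init: { method: 'POST', headers, body: JSON.stringify(body) },
  };
}

// Usage: the returned pair can be passed straight to fetch(url, init).
const req = buildChatRequest('http://localhost:8080', 'my-llm', [
  { role: 'user', content: 'Hello!' },
]);
console.log(req.url); // http://localhost:8080/v1/chat/completions
```

Keeping the header and the body's `model` field in sync in one place avoids requests whose routing header disagrees with the model named in the payload.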

Documentation Improvements:

Updated README.md and CLAUDE.md with detailed guides on OpenAI-compatible model deployment, testing, and rate-limiting configurations. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R75-R80), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L790-R839))
Added new sections on model testing and validation, including interactive inference testing directly from the UI. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R278-R284), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R216-R220))
Refined documentation structure with links to key guides such as architecture, usage, and model publishing. ([README.mdL810-L830](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L810-L830))

Architecture and Configuration Updates:

Enhanced Envoy AI Gateway with EnvoyExtensionPolicy for external AI-specific processing and OpenAI API compatibility. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
Updated TLS certificate configuration to include wildcard domains for improved flexibility. ([configs/certs/tls-certificates.yamlR21](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-10aa0bb2cbc65e4dcdf79b5596a0ebace978984c87ddd196ce3bee23d4e033b3R21))
Improved observability stack documentation with clearer service access instructions. ([CLAUDE.mdL123-R150](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L123-R150))
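The wildcard TLS entry relies on the usual certificate-matching rule (RFC 6125): a leftmost `*` label matches exactly one DNS label. So `*.example.com` covers `api.example.com`, but it does not cover `example.com` or `a.b.example.com`. The sketch below encodes only that rule. The hostnames are placeholders, not the project's domains, and it omits other client checks such as partial-label wildcards or public-suffix restrictions.

```javascript
// Illustrative sketch: does a certificate name (possibly a wildcard such as
// "*.example.com") cover a given hostname? Only a full leftmost "*" label is
// treated as a wildcard, and it matches exactly one label.
function certNameMatches(certName, hostname) {
  const pattern = certName.toLowerCase().replace(/\.$/, '').split('.');
  const host = hostname.toLowerCase().replace(/\.$/, '').split('.');
  if (pattern.length !== host.length) {
    return false; // a wildcard never spans zero or multiple labels
  }
  return pattern.every(
    (label, i) => (i === 0 && label === '*' && host[0] !== '') || label === host[i]
  );
}
```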

Usability Enhancements:

Added token-based rate limiting for LLM models alongside request-based limits. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
Introduced new scripts and UI components for interactive model testing and quick development workflows. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R216-R220), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R278-R284))
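To make the difference between the two limits concrete, here is a minimal fixed-window sketch that charges each client against both a request budget and a token budget. The class name, window length, and limits are hypothetical. In the project, limits are configured in the gateway rather than written as application code, and the gateway's actual accounting may differ. The sketch charges tokens after the response because token usage is only known once the model has answered.

```javascript
// Minimal fixed-window limiter tracking two budgets per client key:
// number of requests and number of LLM tokens consumed.
// Hypothetical sketch; limits and window size are illustrative.
class DualRateLimiter {
  constructor({ maxRequests, maxTokens, windowMs }) {
    this.maxRequests = maxRequests;
    this.maxTokens = maxTokens;
    this.windowMs = windowMs;
    this.windows = new Map(); // key -> { start, requests, tokens }
  }

  // Returns the current window for a key, starting a fresh one if expired.
  _window(key, now) {
    let w = this.windows.get(key);
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, requests: 0, tokens: 0 };
      this.windows.set(key, w);
    }
    return w;
  }

  // Called before forwarding a request: rejects if either budget is spent.
  tryAcquire(key, now = Date.now()) {
    const w = this._window(key, now);
    if (w.requests >= this.maxRequests || w.tokens >= this.maxTokens) {
      return false;
    }
    w.requests += 1;
    return true;
  }

  // Called after the response, once the token count is known.
  recordTokens(key, tokens, now = Date.now()) {
    this._window(key, now).tokens += tokens;
  }
}
```

A single long completion can exhaust the token budget while the request budget is barely touched. That scenario is exactly what request-only limits miss for LLM traffic.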

These updates collectively make the platform more robust, user-friendly, and compatible with modern AI/ML deployment needs.

wave10

…re getting overwritten at run time, causing issues with tests. Also added `--resolve`-style DNS override support and gateway discovery for inference testing:

- Add DNS resolution support to the test execution service
- Update the TestExecutionRequest type to include ConnectionSettings
- Allow DeveloperConsole to pass DNS overrides to the predict API
- Add a createHTTPClient method with custom DNS resolution to test_execution.go
- Handle the Host header specially in HTTP requests
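A `--resolve`-style override is modelled on curl's `host:port:address` option. The sketch below parses and validates one such entry, assuming the UI accepts the same format. The function name `parseResolveEntry` is hypothetical; the project's own logic lives in test_execution.go and may differ. A common way to apply the parsed override is to connect to `address:port` while keeping the original hostname in the `Host` header (and, for HTTPS, in SNI). That is why the Host header needs special handling.

```javascript
const net = require('net');

// Hypothetical helper modelled on curl's "--resolve host:port:address".
// Returns { host, port, address } or throws if the entry is malformed.
function parseResolveEntry(entry) {
  const first = entry.indexOf(':');
  const second = entry.indexOf(':', first + 1);
  if (first <= 0 || second < 0) {
    throw new Error(`expected host:port:address, got "${entry}"`);
  }
  const host = entry.slice(0, first);
  const port = Number(entry.slice(first + 1, second));
  // IPv6 addresses may be wrapped in brackets, e.g. [::1].
  const address = entry.slice(second + 1).replace(/^\[(.*)\]$/, '$1');

  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in "${entry}"`);
  }
  // net.isIP returns 4 or 6 for a valid address and 0 otherwise.
  if (net.isIP(address) === 0) {
    throw new Error(`invalid IP address "${address}"`);
  }
  return { host, port, address };
}
```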

Extended backend and frontend to discover and display additional Istio resources (DestinationRules, ServiceEntries, AuthorizationPolicies, PeerAuthentications) and KServe resources (InferenceServices, ServingRuntimes, ClusterServingRuntimes). Updated types and API responses, enhanced AdminResources UI with resource graph, filtering, and details panel for comprehensive platform overview.

Renamed 'Application Logs' to 'Model Service Logs' and updated its description for clarity. Changed 'System Console' to 'Platform Console' and revised its description to better reflect platform operations.

Introduces a new ModelsUsage component for monitoring model performance, usage analytics, and cost tracking. Integrates the dashboard into AdminDashboard with a new tab and icon, providing summary cards, model performance table, usage trends, and alerts.
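The sketch below shows the kind of aggregation that could sit behind the summary cards and the per-model performance table. It assumes a hypothetical per-request record shape `{ model, latencyMs, tokens, error }` and a flat price per 1,000 tokens for cost estimates. Neither comes from the PR, and the real component may compute its figures differently.

```javascript
// Hypothetical record shape: { model, latencyMs, tokens, error }.
// Aggregates per-model figures of the kind a usage dashboard might show.
function summarizeUsage(records, pricePer1kTokens = 0) {
  const byModel = new Map();
  for (const r of records) {
    const s = byModel.get(r.model) || {
      model: r.model, requests: 0, errors: 0, tokens: 0, totalLatencyMs: 0,
    };
    s.requests += 1;
    s.errors += r.error ? 1 : 0;
    s.tokens += r.tokens || 0;
    s.totalLatencyMs += r.latencyMs;
    byModel.set(r.model, s);
  }
  // One row per model, in first-seen order.
  return [...byModel.values()].map((s) => ({
    model: s.model,
    requests: s.requests,
    errorRate: s.errors / s.requests,
    avgLatencyMs: s.totalLatencyMs / s.requests,
    tokens: s.tokens,
    estimatedCost: (s.tokens / 1000) * pricePer1kTokens,
  }));
}
```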

Adjusted card and panel sizing in AdminResources and ResourceGraph components for better fit and appearance. Updated index.css to remove max-width constraints, added wide screen optimizations, and improved grid layouts for large displays.

smarunich · 2025-07-23T01:55:15Z

This pull request introduces significant updates to the documentation and functionality of the Inference-in-a-Box project, focusing on enhancing OpenAI API compatibility, improving usability, and expanding the feature set. Key changes include the addition of OpenAI-compatible endpoints, updates to the documentation for better navigation and clarity, and enhancements to the management service for model testing and configuration.

Documentation Enhancements:

Added navigation links to key sections in CLAUDE.md and README.md for improved usability and quick access to project resources. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R3-R8), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R3-R4))
Expanded GOALS.md with detailed project goals, architecture, and learning outcomes to provide a comprehensive overview of the project's vision and capabilities. ([GOALS.mdR1-R181](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-5bbc2983b6fd4fd4976388493f607af65ac79d1eb2c927b3bbcf7e860dcd9e97R1-R181))

OpenAI API Compatibility:

Updated Envoy AI Gateway to support OpenAI API-compatible endpoints for chat completions, embeddings, and protocol translation from OpenAI to KServe format. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R178), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L117-R119), [[3]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L196-R205))
Added examples for publishing OpenAI-compatible models with token-based rate limiting and testing them via curl commands. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R79-R84), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R114-R126))

Management Service Improvements:

Introduced interactive model testing capabilities in the management service, including support for OpenAI-style testing and real-time response visualization. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R220-R224), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R280-R286))
Enhanced the rate-limiting configuration to include token-based limits for LLM models. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R178), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R268-R270))

Expanded Feature Set:

Added support for new model frameworks (vLLM, TGI) and OpenAI-compatible endpoints in the serverless model serving stack. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L158-R204), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R137))
Included new configuration files and scripts for debugging, testing, and quick development restarts. ([CLAUDE.mdR220-R224](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R220-R224))

Updated UI components and styles to use Tetrate's brand color palette and Poppins font for a more consistent and professional look. This includes changes to gradients, backgrounds, button styles, and tier colors in ResourceGraph, as well as introduction of CSS variables for easier theme management.

Changed the login page title and subtitle to 'Inference-in-a-Box' and updated styling for a more branded look. Modified admin access box colors and enhanced CSS for header and login card with Tetrate Orange accents and improved background gradients.

Enhanced resource filtering in ResourceGraph to respect selected tiers, namespaces, and health status. Added spin animation for refresh icon in FilterPanel. Improved error handling and feedback in AdminResources and InferenceTest components. Updated node ID generation in AdminResources for better uniqueness. Added address format validation and DNS override documentation in test_execution.go.
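A minimal sketch of collision-free node IDs combined with tier, namespace, and health filtering. It assumes a hypothetical resource shape `{ kind, namespace, name, tier, healthy }`, and the tier names in any example data are placeholders. It is not the project's actual AdminResources or ResourceGraph code. Keying on kind, namespace, and name keeps two models with the same name in different tenant namespaces from colliding.

```javascript
// Hypothetical node ID: kind + namespace + name is usually unique within a
// cluster (add the API group if two groups share a kind). Cluster-scoped
// resources such as ClusterServingRuntime have no namespace.
function nodeId(resource) {
  return [resource.kind, resource.namespace || '_cluster', resource.name].join('/');
}

// Keeps resources that match the selected tiers, namespaces and health state.
// Empty or missing selections mean "no filter". Cluster-scoped resources are
// kept regardless of the namespace selection.
function filterResources(resources, { tiers, namespaces, health = 'all' } = {}) {
  return resources.filter((r) => {
    if (tiers && tiers.length > 0 && !tiers.includes(r.tier)) return false;
    if (namespaces && namespaces.length > 0 && r.namespace &&
        !namespaces.includes(r.namespace)) return false;
    if (health === 'healthy' && !r.healthy) return false;
    if (health === 'unhealthy' && r.healthy) return false;
    return true;
  });
}
```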

Copilot

Pull Request Overview

This pull request focuses on enhancing the UI/UX design system with comprehensive Tetrate branding and significantly expanding the platform management interface. The changes introduce a cohesive brand identity using Tetrate's color palette, typography (Poppins font), and design patterns while adding advanced resource management and visualization capabilities.

Key changes include:

Complete UI rebrand with Tetrate color palette and Poppins typography
Major expansion of the platform management interface with advanced topology visualization
Enhanced model testing with DNS resolution support and improved connection settings

Reviewed Changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 3 comments.


| File | Description |
|---|---|
| management/ui/src/index.css | Comprehensive rebrand with Tetrate color system and typography |
| management/ui/src/components/ResourceGraph.js | New advanced topology visualization with tier-based layout |
| management/ui/src/components/ModelsUsage.js | New usage analytics dashboard with performance metrics |
| management/ui/src/components/AdminResources.js | Major platform navigator overhaul with enhanced resource management |
| management/ui/src/components/DeveloperConsole.js | Enhanced developer tools with advanced connection settings |
| management/ui/src/components/InferenceTest.js | Improved model testing with DNS resolution support |
| management/ui/src/components/FilterPanel.js | New filtering system for resource management |
| management/ui/src/components/DetailsPanel.js | New detailed resource inspection interface |
| management/ui/src/components/ErrorBoundary.js | New error handling component with detailed debugging |
| management/ui/package.json | Added ReactFlow dependency for topology visualization |

Comments suppressed due to low confidence (2)

management/ui/src/components/PublishingForm.js:142

Inconsistent naming convention. The field name changed from 'tenantID' to 'tenantId'. Ensure this change is intentional and that all consuming services expect 'tenantId' instead of 'tenantID'.

```javascript
requestBody.config.tenantId = formData.tenantId;
```

management/ui/src/components/ResourceGraph.js:25

[nitpick] CSS keyframes are being injected dynamically into the document head. Consider moving this to a CSS file or using CSS-in-JS libraries for better maintainability and performance.

```javascript
  Filter,
```

Copilot · 2025-07-23T02:31:53Z

scripts/bootstrap.sh

```diff
@@ -431,7 +431,8 @@ install_envoy_gateway() {
        --version ${ENVOY_GATEWAY_VERSION} \
        --namespace envoy-gateway-system \
        --create-namespace
-
+    kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled
```


Adding Istio injection to the envoy-gateway-system namespace could cause conflicts or unexpected behavior. Consider documenting why this is necessary or making it conditional based on the deployment configuration.

Suggested change

```diff
-    kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled
+    # Conditionally enable Istio injection for the envoy-gateway-system namespace
+    if [ "${ENABLE_ISTIO_INJECTION}" = "true" ]; then
+        log "Enabling Istio injection for the envoy-gateway-system namespace..."
+        kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled
+    else
+        log "Skipping Istio injection for the envoy-gateway-system namespace."
+    fi
```

Copilot · 2025-07-23T02:31:53Z

scripts/bootstrap.sh

```diff
@@ -476,7 +477,7 @@ install_envoy_ai_gateway() {
    kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/gateway/ai-gateway.yaml

    # Apply Backend resources
-    kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/backends/backends.yaml
+    #kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/backends/backends.yaml
```


Configuration file is commented out without explanation. If this is intentional for this release, consider adding a comment explaining why, or remove the line entirely to avoid confusion.

Copilot · 2025-07-23T02:31:53Z

management/ui/src/components/FilterPanel.js

```diff
+// Add the spin animation keyframes
+const spinKeyframes = `
+@keyframes spin {
+  from { transform: rotate(0deg); }
+  to { transform: rotate(360deg); }
+}
+`;
+
+// Inject the CSS into the document head if not already present
+if (typeof document !== 'undefined' && !document.querySelector('#spin-animation')) {
+  const style = document.createElement('style');
+  style.id = 'spin-animation';
+  style.textContent = spinKeyframes;
+  document.head.appendChild(style);
+}
+
```


Duplicate CSS injection logic appears in multiple files. Consider creating a shared utility function or moving animations to a CSS file to avoid code duplication.

Suggested change

```diff
-// Add the spin animation keyframes
-const spinKeyframes = `
-@keyframes spin {
-  from { transform: rotate(0deg); }
-  to { transform: rotate(360deg); }
-}
-`;
-
-// Inject the CSS into the document head if not already present
-if (typeof document !== 'undefined' && !document.querySelector('#spin-animation')) {
-  const style = document.createElement('style');
-  style.id = 'spin-animation';
-  style.textContent = spinKeyframes;
-  document.head.appendChild(style);
-}
+// Removed inline CSS injection logic. The `spin` animation is now defined in an external CSS file.
```
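Copilot's other option, a shared utility, might look like the sketch below. The module and function names are hypothetical. Defining the keyframes in a regular CSS file, as the suggested change does, avoids runtime injection entirely. The helper only matters if inline injection is kept. Passing the document as a parameter also makes the helper testable outside a browser.

```javascript
// Hypothetical shared helper (e.g. a utils/injectStyle.js module) that
// replaces the per-component copies of the injection logic.
// Returns true if a new <style> element was added.
function injectStyleOnce(id, cssText, doc = globalThis.document) {
  if (!doc || doc.getElementById(id)) {
    return false; // no DOM (e.g. server-side rendering) or already injected
  }
  const style = doc.createElement('style');
  style.id = id;
  style.textContent = cssText;
  doc.head.appendChild(style);
  return true;
}

// Usage, identical in FilterPanel.js and ResourceGraph.js:
// injectStyleOnce('spin-animation',
//   '@keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } }');
```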

smarunich and others added 4 commits July 17, 2025 18:10

Merge pull request #14 from smarunich/main

e91cc79

wave10

docs updated

b806bd0

updates to publishing

af342ac

smarunich requested a review from Copilot July 22, 2025 17:10


linting

d536790

smarunich requested a review from Copilot July 22, 2025 17:21


smarunich and others added 9 commits July 22, 2025 17:28

re-wire navigation

3504151

further improvements

459cf08

publishing updates

81b64cd

sidecar for gateway

e222d9b

Update bootstrap.sh

c08f86b

Update tab labels and descriptions in AdminDashboard

de4f783


Improve layout and responsiveness for admin UI

df32947


smarunich requested a review from Copilot July 23, 2025 01:54


smarunich added 4 commits July 22, 2025 22:04

Update App.js

03fad65

Update login UI branding and styles

1bcd586


Update PublishingForm.js

f7f2a64

smarunich requested a review from Copilot July 23, 2025 02:22


smarunich requested a review from Copilot July 23, 2025 02:30

Copilot AI reviewed Jul 23, 2025


smarunich merged commit 1106ade into main Jul 23, 2025
1 check failed
