wave10 #15

Merged
merged 19 commits into from
Jul 23, 2025

@smarunich (Owner) commented Jul 22, 2025

This pull request introduces significant updates to enhance OpenAI API compatibility, improve documentation, and refine the platform's architecture and usability. Key changes include support for OpenAI-compatible models, updates to rate-limiting mechanisms, and improved documentation for deployment and testing workflows.

OpenAI API Compatibility Enhancements:

  • Added support for OpenAI-compatible endpoints (/v1/chat/completions, /v1/completions, /v1/embeddings) with automatic protocol translation to KServe format. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
  • Introduced model-aware routing using the x-ai-eg-model header for efficient model selection. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L196-R203))
  • Expanded supported frameworks to include vLLM and TGI for OpenAI-compatible models. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L158-R200), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R135))
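
As a rough illustration of the request shape described above, the following JavaScript sketch builds (but does not send) a chat-completions call that carries the model name in the `x-ai-eg-model` header. Only the endpoint path and the header name come from this description. The gateway address, the model name, and the helper `buildChatCompletionRequest` are assumptions for illustration, and the description does not say whether clients must set the header themselves or whether the gateway derives it from the request body.

```javascript
// Hypothetical helper: builds an OpenAI-style chat request without sending it.
// The path and the x-ai-eg-model header name come from the PR description;
// the base URL and model name passed in below are placeholders.
function buildChatCompletionRequest(baseUrl, model, messages) {
  const url = new URL('/v1/chat/completions', baseUrl).toString();
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-ai-eg-model': model, // header named in the PR for model-aware routing
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const req = buildChatCompletionRequest(
  'https://gateway.example.com', // assumed gateway address
  'example-llm',                 // assumed model name
  [{ role: 'user', content: 'Hello' }],
);
// To send it from Node 18+ or a browser: fetch(req.url, req.options)
```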

Documentation Improvements:

  • Updated README.md and CLAUDE.md with detailed guides on OpenAI-compatible model deployment, testing, and rate-limiting configurations. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R75-R80), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L790-R839))
  • Added new sections on model testing and validation, including interactive inference testing directly from the UI. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R278-R284), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R216-R220))
  • Refined documentation structure with links to key guides such as architecture, usage, and model publishing. ([README.mdL810-L830](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L810-L830))

Architecture and Configuration Updates:

  • Enhanced Envoy AI Gateway with EnvoyExtensionPolicy for external AI-specific processing and OpenAI API compatibility. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
  • Updated TLS certificate configuration to include wildcard domains for improved flexibility. ([configs/certs/tls-certificates.yamlR21](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-10aa0bb2cbc65e4dcdf79b5596a0ebace978984c87ddd196ce3bee23d4e033b3R21))
  • Improved observability stack documentation with clearer service access instructions. ([CLAUDE.mdL123-R150](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L123-R150))

Usability Enhancements:

  • Added token-based rate limiting for LLM models alongside request-based limits. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R174), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R750-R808))
  • Introduced new scripts and UI components for interactive model testing and quick development workflows. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R216-R220), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R278-R284))
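
To make the difference between request-based and token-based limits concrete, here is a minimal in-memory sketch of a fixed-window limiter that enforces both at once. It is not the gateway's implementation: the class name, the limits, and the fixed-window strategy are all assumptions, and the token cost of each call is simply passed in by the caller.

```javascript
// Illustrative fixed-window limiter that tracks requests and tokens together.
// All names and limits are hypothetical; this is not the gateway's code.
class DualRateLimiter {
  constructor({ requestsPerWindow, tokensPerWindow, windowMs }) {
    this.requestsPerWindow = requestsPerWindow;
    this.tokensPerWindow = tokensPerWindow;
    this.windowMs = windowMs;
    this.windowStart = null;
    this.requests = 0;
    this.tokens = 0;
  }

  // Returns true if the call fits in the current window.
  // `now` is injectable so the behavior can be checked deterministically.
  allow(tokenCost, now = Date.now()) {
    if (this.windowStart === null || now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // start a fresh window
      this.requests = 0;
      this.tokens = 0;
    }
    if (this.requests + 1 > this.requestsPerWindow) return false;
    if (this.tokens + tokenCost > this.tokensPerWindow) return false;
    this.requests += 1;
    this.tokens += tokenCost;
    return true;
  }
}

const limiter = new DualRateLimiter({
  requestsPerWindow: 100,
  tokensPerWindow: 1000,
  windowMs: 60000,
});
```

With these example limits, a client that sends a few very large prompts runs out of token budget long before it reaches the request limit, which is the case a request-only limit cannot catch.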

These updates collectively make the platform more robust, user-friendly, and compatible with modern AI/ML deployment needs.

smarunich and others added 4 commits July 17, 2025 18:10
…re getting overwritten at run time causing issues with tests...

Also added --resolve type functionality and discovery of the gateway during inference testing

  - Add DNS resolution support to test execution service
  - Update TestExecutionRequest type to include ConnectionSettings
  - Allow DeveloperConsole to pass DNS overrides to predict API
  - Add createHTTPClient method with custom DNS resolution to test_execution.go
  - Handle Host header specially in HTTP requests
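
The DNS override idea in this commit can be sketched in JavaScript for Node.js as follows. The actual change lives in Go (`createHTTPClient` in test_execution.go) and its code is not shown here, so this is only an analogous illustration. It assumes the curl-style `host:port:address` entry format, and the helper names `parseResolveEntry` and `makeLookup` are hypothetical.

```javascript
const dns = require('node:dns');
const net = require('node:net');

// Parses a curl-style "--resolve" entry: "host:port:address".
// Throws if the entry is malformed or the address is not a literal IP.
function parseResolveEntry(entry) {
  const first = entry.indexOf(':');
  const second = entry.indexOf(':', first + 1);
  if (first <= 0 || second < 0) {
    throw new Error(`expected host:port:address, got "${entry}"`);
  }
  const host = entry.slice(0, first).toLowerCase();
  const port = Number(entry.slice(first + 1, second));
  // IPv6 addresses may be written in brackets; strip them if present.
  const address = entry.slice(second + 1).replace(/^\[|\]$/g, '');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in "${entry}"`);
  }
  const family = net.isIP(address); // 4, 6, or 0 when not an IP
  if (family === 0) throw new Error(`invalid IP address in "${entry}"`);
  return { host, port, address, family };
}

// Builds a function usable as the `lookup` option of http/https/net requests.
// Pinned hosts resolve to the override; all others fall back to dns.lookup.
// The lookup hook never sees the port, so the parsed port is not consulted here.
function makeLookup(entries) {
  const byHost = new Map(entries.map((e) => [e.host, e]));
  return function lookup(hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    const hit = byHost.get(hostname.toLowerCase());
    if (!hit) return dns.lookup(hostname, options, callback);
    if (options && options.all) {
      return callback(null, [{ address: hit.address, family: hit.family }]);
    }
    return callback(null, hit.address, hit.family);
  };
}

const pinned = parseResolveEntry('models.example.com:443:127.0.0.1');
const lookup = makeLookup([pinned]);
// e.g. https.request({ host: 'models.example.com', port: 443, lookup }, ...)
```

Because the request still targets the original hostname, the Host header and TLS server name stay unchanged while the connection goes to the pinned address.
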
@smarunich smarunich requested a review from Copilot July 22, 2025 17:10
@smarunich smarunich requested a review from Copilot July 22, 2025 17:21
smarunich and others added 9 commits July 22, 2025 17:28
Extended backend and frontend to discover and display additional Istio resources (DestinationRules, ServiceEntries, AuthorizationPolicies, PeerAuthentications) and KServe resources (InferenceServices, ServingRuntimes, ClusterServingRuntimes). Updated types and API responses, enhanced AdminResources UI with resource graph, filtering, and details panel for comprehensive platform overview.
Renamed 'Application Logs' to 'Model Service Logs' and updated its description for clarity. Changed 'System Console' to 'Platform Console' and revised its description to better reflect platform operations.
Introduces a new ModelsUsage component for monitoring model performance, usage analytics, and cost tracking. Integrates the dashboard into AdminDashboard with a new tab and icon, providing summary cards, model performance table, usage trends, and alerts.
Adjusted card and panel sizing in AdminResources and ResourceGraph components for better fit and appearance. Updated index.css to remove max-width constraints, added wide screen optimizations, and improved grid layouts for large displays.
@smarunich smarunich requested a review from Copilot July 23, 2025 01:54
@smarunich (Owner, Author) commented:

This pull request introduces significant updates to the documentation and functionality of the Inference-in-a-Box project, focusing on enhancing OpenAI API compatibility, improving usability, and expanding the feature set. Key changes include the addition of OpenAI-compatible endpoints, updates to the documentation for better navigation and clarity, and enhancements to the management service for model testing and configuration.

Documentation Enhancements:

  • Added navigation links to key sections in CLAUDE.md and README.md for improved usability and quick access to project resources. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R3-R8), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R3-R4))
  • Expanded GOALS.md with detailed project goals, architecture, and learning outcomes to provide a comprehensive overview of the project's vision and capabilities. ([GOALS.mdR1-R181](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-5bbc2983b6fd4fd4976388493f607af65ac79d1eb2c927b3bbcf7e860dcd9e97R1-R181))

OpenAI API Compatibility:

  • Updated Envoy AI Gateway to support OpenAI API-compatible endpoints for chat completions, embeddings, and protocol translation from OpenAI to KServe format. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R178), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L117-R119), [[3]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L196-R205))
  • Added examples for publishing OpenAI-compatible models with token-based rate limiting and testing them via curl commands. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R79-R84), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R114-R126))

Management Service Improvements:

  • Introduced interactive model testing capabilities in the management service, including support for OpenAI-style testing and real-time response visualization. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R220-R224), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R280-R286))
  • Enhanced the rate-limiting configuration to include token-based limits for LLM models. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L142-R178), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R268-R270))

Expanded Feature Set:

  • Added support for new model frameworks (vLLM, TGI) and OpenAI-compatible endpoints in the serverless model serving stack. ([[1]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7L158-R204), [[2]](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R137))
  • Included new configuration files and scripts for debugging, testing, and quick development restarts. ([CLAUDE.mdR220-R224](https://github.com/smarunich/inference-in-a-box/pull/15/files#diff-6ebdb617a8104a7756d0cf36578ab01103dc9f07e4dc6feb751296b9c402faf7R220-R224))

Updated UI components and styles to use Tetrate's brand color palette and Poppins font for a more consistent and professional look. This includes changes to gradients, backgrounds, button styles, and tier colors in ResourceGraph, as well as introduction of CSS variables for easier theme management.
Changed the login page title and subtitle to 'Inference-in-a-Box' and updated styling for a more branded look. Modified admin access box colors and enhanced CSS for header and login card with Tetrate Orange accents and improved background gradients.
@smarunich smarunich requested a review from Copilot July 23, 2025 02:22
Enhanced resource filtering in ResourceGraph to respect selected tiers, namespaces, and health status. Added spin animation for refresh icon in FilterPanel. Improved error handling and feedback in AdminResources and InferenceTest components. Updated node ID generation in AdminResources for better uniqueness. Added address format validation and DNS override documentation in test_execution.go.
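
The filtering and node ID changes described in this commit can be pictured with the sketch below. The component code itself is not shown in this conversation, so the resource shape (`kind`, `namespace`, `name`, `tier`, `health`), the function names, and the ID scheme are assumptions chosen for illustration.

```javascript
// Hypothetical resource shape: { kind, namespace, name, tier, health }.
// An empty or missing filter list means "do not filter on this field".
function filterResources(resources, { tiers = [], namespaces = [], health = [] } = {}) {
  const matches = (allowed, value) => allowed.length === 0 || allowed.includes(value);
  return resources.filter(
    (r) => matches(tiers, r.tier) && matches(namespaces, r.namespace) && matches(health, r.health),
  );
}

// One way to get collision-free graph node IDs: combine kind, namespace, and name,
// since Kubernetes names are only unique per kind within a namespace.
// Cluster-scoped resources have no namespace, so a placeholder segment is used.
function makeNodeId({ kind, namespace, name }) {
  return [kind, namespace || '_cluster', name].join('/');
}

const resources = [
  { kind: 'InferenceService', namespace: 'tenant-a', name: 'demo', tier: 'serving', health: 'healthy' },
  { kind: 'DestinationRule', namespace: 'tenant-a', name: 'demo', tier: 'mesh', health: 'healthy' },
  { kind: 'ClusterServingRuntime', name: 'runtime', tier: 'serving', health: 'degraded' },
];

const visible = filterResources(resources, { tiers: ['serving'], health: ['healthy'] });
const ids = resources.map(makeNodeId);
```

In this example the first two resources share a namespace and a name, so an ID built from the name alone would collide, while the combined ID keeps them distinct.
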
@smarunich smarunich requested a review from Copilot July 23, 2025 02:30
Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request focuses on enhancing the UI/UX design system with comprehensive Tetrate branding and significantly expanding the platform management interface. The changes introduce a cohesive brand identity using Tetrate's color palette, typography (Poppins font), and design patterns while adding advanced resource management and visualization capabilities.

Key changes include:

  • Complete UI rebrand with Tetrate color palette and Poppins typography
  • Major expansion of the platform management interface with advanced topology visualization
  • Enhanced model testing with DNS resolution support and improved connection settings

Reviewed Changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| management/ui/src/index.css | Comprehensive rebrand with Tetrate color system and typography |
| management/ui/src/components/ResourceGraph.js | New advanced topology visualization with tier-based layout |
| management/ui/src/components/ModelsUsage.js | New usage analytics dashboard with performance metrics |
| management/ui/src/components/AdminResources.js | Major platform navigator overhaul with enhanced resource management |
| management/ui/src/components/DeveloperConsole.js | Enhanced developer tools with advanced connection settings |
| management/ui/src/components/InferenceTest.js | Improved model testing with DNS resolution support |
| management/ui/src/components/FilterPanel.js | New filtering system for resource management |
| management/ui/src/components/DetailsPanel.js | New detailed resource inspection interface |
| management/ui/src/components/ErrorBoundary.js | New error handling component with detailed debugging |
| management/ui/package.json | Added ReactFlow dependency for topology visualization |

Comments suppressed due to low confidence (2)

management/ui/src/components/PublishingForm.js:142

  • Inconsistent naming convention. The field name changed from 'tenantID' to 'tenantId'. Ensure this change is intentional and that all consuming services expect 'tenantId' instead of 'tenantID'.
        requestBody.config.tenantId = formData.tenantId;

management/ui/src/components/ResourceGraph.js:25

  • [nitpick] CSS keyframes are being injected dynamically into the document head. Consider moving this to a CSS file or using CSS-in-JS libraries for better maintainability and performance.
  Filter,

@@ -431,7 +431,8 @@ install_envoy_gateway() {
     --version ${ENVOY_GATEWAY_VERSION} \
     --namespace envoy-gateway-system \
     --create-namespace

+kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled

Copilot AI Jul 23, 2025

Adding Istio injection to the envoy-gateway-system namespace could cause conflicts or unexpected behavior. Consider documenting why this is necessary or making it conditional based on the deployment configuration.

Suggested change
-kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled
+# Conditionally enable Istio injection for the envoy-gateway-system namespace
+if [ "${ENABLE_ISTIO_INJECTION}" = "true" ]; then
+  log "Enabling Istio injection for the envoy-gateway-system namespace..."
+  kubectl label namespace envoy-gateway-system --overwrite=true istio-injection=enabled
+else
+  log "Skipping Istio injection for the envoy-gateway-system namespace."
+fi

Copilot uses AI. Check for mistakes.

@@ -476,7 +477,7 @@ install_envoy_ai_gateway() {
 kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/gateway/ai-gateway.yaml

 # Apply Backend resources
-kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/backends/backends.yaml
+#kubectl apply -f ${PROJECT_DIR}/configs/envoy-gateway/backends/backends.yaml

Copilot AI Jul 23, 2025

Configuration file is commented out without explanation. If this is intentional for this release, consider adding a comment explaining why, or remove the line entirely to avoid confusion.

Comment on lines +18 to +33
// Add the spin animation keyframes
const spinKeyframes = `
  @keyframes spin {
    from { transform: rotate(0deg); }
    to { transform: rotate(360deg); }
  }
`;

// Inject the CSS into the document head if not already present
if (typeof document !== 'undefined' && !document.querySelector('#spin-animation')) {
  const style = document.createElement('style');
  style.id = 'spin-animation';
  style.textContent = spinKeyframes;
  document.head.appendChild(style);
}


Copilot AI Jul 23, 2025

Duplicate CSS injection logic appears in multiple files. Consider creating a shared utility function or moving animations to a CSS file to avoid code duplication.

Suggested change
-// Add the spin animation keyframes
-const spinKeyframes = `
-  @keyframes spin {
-    from { transform: rotate(0deg); }
-    to { transform: rotate(360deg); }
-  }
-`;
-// Inject the CSS into the document head if not already present
-if (typeof document !== 'undefined' && !document.querySelector('#spin-animation')) {
-  const style = document.createElement('style');
-  style.id = 'spin-animation';
-  style.textContent = spinKeyframes;
-  document.head.appendChild(style);
-}
+// Removed inline CSS injection logic. The `spin` animation is now defined in an external CSS file.
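
The review comment also mentions a shared utility function as an alternative to moving the animation into a CSS file. A minimal sketch of that option is shown here. The helper name `ensureStyle` and its signature are hypothetical and not part of this pull request; the style element id and keyframes are taken from the snippet under review.

```javascript
// Hypothetical shared helper: injects a <style> element at most once per id.
// `doc` defaults to the browser document and can be swapped out in tests.
function ensureStyle(id, cssText, doc = typeof document !== 'undefined' ? document : null) {
  if (!doc) return false;                   // non-browser environment: do nothing
  if (doc.getElementById(id)) return false; // already injected
  const style = doc.createElement('style');
  style.id = id;
  style.textContent = cssText;
  doc.head.appendChild(style);
  return true;
}

const SPIN_KEYFRAMES = `
@keyframes spin {
  from { transform: rotate(0deg); }
  to { transform: rotate(360deg); }
}`;

// Each component that needs the animation calls the same helper;
// repeated calls with the same id are no-ops.
ensureStyle('spin-animation', SPIN_KEYFRAMES);
```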

@smarunich smarunich merged commit 1106ade into main Jul 23, 2025
1 check failed