Releases: envoyproxy/ai-gateway
v0.3.0-rc1
Release candidate for v0.3.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.3.0-rc1 --namespace envoy-ai-gateway-system --create-namespace
v0.2.1
v0.2.0
Envoy AI Gateway v0.2.x
June 5, 2025
Envoy AI Gateway v0.2.0 builds on the foundation of v0.1.0 with a focus on expanding provider ecosystem support, improving reliability and performance through architectural changes, and adding enterprise-grade authentication support for Azure OpenAI.
Azure OpenAI Integration
Sidecar Architecture
Performance Improvements
CLI Tools
Model Failover and Retry
Certificate Manager Integration
✨ New Features
Azure OpenAI Integration
- Full Azure OpenAI Support
- Complete integration with Azure OpenAI services, with request/response transformation for the unified OpenAI-compatible completions API.
- Upstream Authentication for Azure Enterprise Integration
- Support for accessing Azure via OIDC tokens and Entra ID, providing secure and compliant enterprise-grade upstream authentication.
- Enterprise Proxy URL Support for Azure Authentication
- Enhanced Azure authentication with proxy URL configuration options for enterprise proxy support.
- Flexible Token Providers
- Generalized token provider architecture supporting both client secret and federated token flows
Architecture Improvements
- Sidecar and UDS External Processor
- Switched to sidecar deployment model with Unix Domain Sockets for improved performance and resource efficiency
- Enhanced ExtProc Buffer Limits
- Increased external processor buffer limits from 32 KiB to 50 MiB for larger AI requests. Users can now configure CPU and memory resource limits via filterConfig.externalProcessor.resources for better resource management.
- Multiple AIGatewayRoute Support
- Support for multiple AIGatewayRoute resources per gateway, removing the previous single-route limitation. This enables better organization, scalability, and management of complex routing configurations across teams.
- Certificate Manager Integration
- Integrated cert-manager for automated TLS certificate provisioning and rotation for the mutating webhook server that injects AI Gateway sidecar containers into Envoy Gateway pods. This enables enterprise-grade certificate management, eliminating manual certificate handling and improving security.
Cross-Backend Failover and Retry
- Provider Fallback Logic
- Priority-based failover system that automatically routes traffic to lower-priority AI providers as higher-priority endpoints become unhealthy, ensuring high availability and fault tolerance.
- Backend Retry Support
- Configurable retry policies for improved reliability and resilience against AI provider transient failures. Features include exponential backoff with jitter, configurable retry triggers (5xx errors, connection failures, rate limiting), customizable retry counts and timeouts, and integration with Envoy Gateway's BackendTrafficPolicy.
- Weight-Based Routing
- Enhanced backend routing with weighted traffic distribution, enabling gradual rollouts, cost optimization, and A/B testing across multiple AI providers
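The failover, weighting, and retry ideas in this section can be illustrated with a small Python sketch. It is a conceptual model only, not the gateway's implementation: the Backend class, pick_backend, and backoff_delays are hypothetical names, the all-or-nothing failover is a simplification (the real data plane may shift traffic between priority levels more gradually), and the "full jitter" strategy is one common choice that the release notes do not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str             # hypothetical backend label
    priority: int         # 0 is the highest priority; larger values are fallbacks
    weight: int           # relative traffic share within one priority level
    healthy: bool = True  # assumed to be set by some health-checking mechanism


def pick_backend(backends, rng=random):
    """Pick a healthy backend from the best available priority level."""
    healthy = [b for b in backends if b.healthy and b.weight > 0]
    if not healthy:
        return None
    best = min(b.priority for b in healthy)
    candidates = [b for b in healthy if b.priority == best]
    # Weighted random choice among backends that share the best priority.
    weights = [b.weight for b in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def backoff_delays(retries, base=0.1, cap=10.0, rng=random):
    """Exponential backoff with full jitter.

    The delay for attempt n is drawn uniformly from
    [0, min(cap, base * 2**n)] seconds.
    """
    return [
        rng.uniform(0.0, min(cap, base * 2 ** attempt))
        for attempt in range(retries)
    ]
```

With two backends at priorities 0 and 1, pick_backend returns the priority 0 backend while it is healthy and falls back to priority 1 once it is marked unhealthy. Jitter spreads retries out so that many clients failing at the same moment do not all retry at the same instant.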
Enhanced CLI Tools
- aigw run Command
- New CLI command for local development and testing of Envoy AI Gateway resources.
- Configuration Translation
- aigw translate command for translating Envoy AI Gateway resources to Envoy Gateway and Kubernetes CRDs.
🔗 API Updates
- AIGatewayRoute Metadata: Added ownedBy and createdAt fields for better resource tracking.
- Backend Configuration: Moved Backend configuration back to RouteRule for improved flexibility.
- OIDC Field Types: Specific typing for OIDC-related configuration fields.
- Weight Type Changes: Updated Weight field type to match Gateway API specifications.
Deprecations
- AIServiceBackend.Timeouts: Deprecated in favor of more granular timeout configuration.
🐛 Bug Fixes
- ExtProc Image Syncing: Fixed issue where external processor image wouldn't sync properly.
- Router Weight Validation: Fixed negative weight validation in routing logic.
- Content Body Handling: Fixed empty content body issues causing AWS validation errors.
- First Match Routing: Fixed router logic to ensure first match wins as expected.
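To make "first match wins" concrete, here is a minimal Python sketch of the semantics. The Rule class, the prefix-based match condition, and the backend names are assumptions for illustration; they are not the AIGatewayRoute schema or the router's actual code.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    model_prefix: str  # hypothetical match condition: prefix of the requested model name
    backend: str       # hypothetical backend name to route to


def route(rules, model):
    """Return the backend of the first matching rule, in declaration order."""
    for rule in rules:
        if model.startswith(rule.model_prefix):
            return rule.backend
    return None  # no rule matched
```

Under these semantics, rule order matters: a specific rule must be listed before a broader one that would also match, otherwise the broader rule takes the request.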
⚠️ Breaking Changes
- Sidecar Architecture: The switch to sidecar and UDS model may require configuration updates for existing deployments.
- API Field Changes: Some API fields have been moved or renamed. Please review the migration guide for details.
- Timeout Configuration: Deprecated timeout fields require migration to new configuration format.
- Routing to Kubernetes Services: Routing to Kubernetes services is not supported in Envoy AI Gateway v0.2.0. This is a known limitation and will be addressed in a future release.
📖 Upgrade Guidance
For users upgrading from v0.1.x to v0.2.0:
- Review usage of any deprecated API fields (particularly AIServiceBackend.Timeouts).
- Update deployment configurations if using custom replica configurations - the replicas field in AIGatewayFilterConfigExternalProcessor is now deprecated due to the new sidecar architecture.
- Remove routing to Kubernetes services - currently, Envoy AI Gateway does not support routing to Kubernetes services. This is a known limitation and will be addressed in a future release.
📦 Dependency Versions
- Go 1.24.2 - Updated to latest Go version for improved performance and security.
- Envoy Gateway v1.4 - Built on Envoy Gateway for proven data plane capabilities.
- Envoy v1.34 - Leveraging Envoy Proxy's battle-tested networking capabilities.
- Gateway API v1.3 - Support for latest Gateway API specifications.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Google, and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
We also greatly appreciate those who engage in conversations, provide feedback, and contribute to the project in ways other than code. Ideas, suggestions, and feedback are always welcome.
🔮 What's Next (beyond v0.2)
We're already working on exciting features:
- Google Gemini & Vertex Integration
- Anthropic Integration
- Support for the Gateway API Inference Extension
- Endpoint picker support for Pod routing
- What else do you want to see? Get involved: open an issue and let us know!
v0.2.0-rc3
Release candidate
v0.2.0-rc1
Release candidate
v0.1.5
v0.1.4
v0.1.3
Overview
This patch release, v0.1.3, includes fixes to chat completion streaming and the OpenAI assistant content type, and adds GenAI metrics.