Added valkey dependency #875
Closed
Implement distributed caching using Valkey (Redis-compatible) to enable
horizontal scaling of Trino Gateway across multiple instances. This allows
query metadata to be shared between gateway instances, ensuring consistent
routing regardless of which instance receives a request.
Key features:
- 3-tier caching architecture: L1 (Guava local) → L2 (Valkey distributed) → L3 (PostgreSQL)
- Graceful degradation when Valkey unavailable (falls back to database)
- Configurable health checks and connection pooling
- Cache metrics (hits, misses, writes, errors, hit rate)
- Write-through caching for backend and routing group lookups
- Lazy-loading for external URL lookups
- Convention over Configuration with sensible defaults
Implementation:
- Add ValkeyConfiguration with 11 configurable parameters (minimal 3 required)
- Create DistributedCache interface and ValkeyDistributedCache implementation
- Integrate distributed cache into BaseRoutingManager routing logic
- Use modern Duration API (no deprecated methods)
- Add comprehensive input validation and error handling
- Include 31 unit tests (16 config + 15 cache tests)
Configuration:
valkeyConfiguration:
  enabled: true
  host: valkey.internal
  port: 6379
  password: ${VALKEY_PASSWORD}
Documentation includes:
- Quick start guide with minimal configuration
- Full configuration reference with tuning guidelines
- Deployment scenarios (single vs. multi-instance)
- Performance tuning recommendations
- Security best practices
- Architecture documentation and troubleshooting
Single-instance deployments don't need distributed caching - local Guava
cache is sufficient. Multi-instance deployments benefit from shared cache
for consistent query routing.
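The write-through behavior listed above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: plain maps replace Guava (L1), Valkey (L2), and PostgreSQL (L3), and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of write-through caching across three tiers.
class WriteThroughCache
{
    final Map<String, String> l1 = new ConcurrentHashMap<>(); // local Guava stand-in
    final Map<String, String> l2 = new ConcurrentHashMap<>(); // Valkey stand-in
    final Map<String, String> l3 = new ConcurrentHashMap<>(); // PostgreSQL stand-in

    // Write-through: persist to the source of truth first, then populate both cache tiers
    void setBackendForQuery(String queryId, String backendUrl)
    {
        l3.put(queryId, backendUrl); // database: source of truth
        l2.put(queryId, backendUrl); // shared across gateway instances
        l1.put(queryId, backendUrl); // fastest, instance-local
    }

    // Simplified read used here only to demonstrate the write path
    Optional<String> getBackendForQuery(String queryId)
    {
        return Optional.ofNullable(
                l1.getOrDefault(queryId, l2.getOrDefault(queryId, l3.get(queryId))));
    }
}
```

Because every write lands in all three tiers, any gateway instance that later receives a request for the same query can resolve its backend without an HTTP search.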
…endency

# Conflicts:
#   docs/config.yaml
#   docs/installation.md
#   docs/valkey-configuration.md
#   gateway-ha/config.yaml
After the merge with the main branch, the UserConfiguration and ApiAuthenticator imports are no longer used due to code refactoring. Remove them to fix checkstyle violations.
- Remove unused import: io.trino.gateway.ha.config.UserConfiguration
- Remove unused import: io.trino.gateway.ha.security.ApiAuthenticator
After the merge, the code uses ImmutableList, MonitorConfiguration, and List, but the imports were missing, causing compilation failures.
- Add import: com.google.common.collect.ImmutableList (for getClusterStatsObservers)
- Add import: io.trino.gateway.ha.config.MonitorConfiguration (for getMonitorConfiguration)
- Add import: java.util.List (for method return type)
Add Valkey Distributed Cache for Horizontal Scaling
Summary
This PR implements distributed caching using Valkey (Redis-compatible) to enable horizontal scaling of Trino Gateway. Multiple gateway instances can now share query metadata through a distributed cache layer, ensuring consistent query routing across all
instances.
Motivation
Currently, Trino Gateway uses local Guava caches that are not shared between instances. In multi-instance deployments, this can lead to inconsistent query routing, since each instance must independently discover which backend a query ran on. This implementation addresses these limitations while maintaining backward compatibility and graceful degradation.
Architecture
3-Tier Caching Strategy
Request Flow:
Check L1 (local Guava cache)
├─ Hit: Return immediately
└─ Miss: Check L2 (Valkey)
   ├─ Hit: Populate L1, return
   └─ Miss: Check L3 (PostgreSQL)
      ├─ Found: Populate L2 + L1, return
      └─ Not Found: Search backends via HTTP (200ms)
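The lookup order above can be sketched in Java. This is an illustrative stand-in with hypothetical names, not the PR's BaseRoutingManager code; plain maps replace Guava, Valkey, and PostgreSQL. Note how a hit at a slower tier is promoted into the faster tiers before returning.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the 3-tier read path described above.
class TieredLookup
{
    final Map<String, String> l1 = new ConcurrentHashMap<>(); // Guava stand-in
    final Map<String, String> l2 = new ConcurrentHashMap<>(); // Valkey stand-in
    final Map<String, String> l3 = new ConcurrentHashMap<>(); // PostgreSQL stand-in

    Optional<String> lookup(String queryId)
    {
        String value = l1.get(queryId);
        if (value != null) {
            return Optional.of(value); // L1 hit: return immediately
        }
        value = l2.get(queryId);
        if (value != null) {
            l1.put(queryId, value); // L2 hit: populate L1
            return Optional.of(value);
        }
        value = l3.get(queryId);
        if (value != null) {
            l2.put(queryId, value); // L3 hit: populate L2 + L1
            l1.put(queryId, value);
            return Optional.of(value);
        }
        return Optional.empty(); // not found: caller falls back to an HTTP backend search
    }
}
```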
Cache Keys
- trino:query:backend:{queryId} - Backend URL for the query
- trino:query:routinggroup:{queryId} - Routing group for the query
- trino:query:externalurl:{queryId} - External URL (lazy-loaded, @VisibleForTesting)

Implementation Details
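The key scheme can be captured in a small helper. This is a hypothetical illustration of the naming convention, not the PR's actual key-building code.

```java
// Hypothetical helper mirroring the documented cache-key scheme.
class CacheKeys
{
    private static final String PREFIX = "trino:query:";

    // Backend URL for a query
    static String backendKey(String queryId)
    {
        return PREFIX + "backend:" + queryId;
    }

    // Routing group for a query
    static String routingGroupKey(String queryId)
    {
        return PREFIX + "routinggroup:" + queryId;
    }

    // External URL for a query (lazy-loaded)
    static String externalUrlKey(String queryId)
    {
        return PREFIX + "externalurl:" + queryId;
    }
}
```

Sharing a common prefix keeps all gateway entries under one namespace, so operators can inspect or flush them with a single pattern such as trino:query:*.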
Core Components
- ValkeyConfiguration (gateway-ha/src/main/java/io/trino/gateway/ha/config/ValkeyConfiguration.java)
- ValkeyDistributedCache (gateway-ha/src/main/java/io/trino/gateway/ha/router/ValkeyDistributedCache.java) - implements the DistributedCache interface
- DistributedCache interface (gateway-ha/src/main/java/io/trino/gateway/ha/router/DistributedCache.java)
Integration
BaseRoutingManager - Updated routing logic:
HaGatewayProviderModule - Dependency injection: provides the DistributedCache singleton
Configuration
Minimal (Recommended for Getting Started)
valkeyConfiguration:
  enabled: true
  host: valkey.internal
  port: 6379
  password: ${VALKEY_PASSWORD}
Advanced (Production Tuning)
Single Instance (No Changes Required)
valkeyConfiguration:
  enabled: false  # Default - local cache sufficient
Testing
Unit Tests (31 total, all passing)
TestValkeyConfiguration (16 tests)
TestValkeyDistributedCache (15 tests)
Integration Tests (existing tests updated)
Documentation
Comprehensive documentation added:
New File: docs/valkey-configuration.md (273 lines)
Updated Files:
Backward Compatibility
✅ Fully backward compatible
Migration Path
From Single to Multi-Gateway
valkeyConfiguration:
  enabled: true
  host: valkey.internal
  port: 6379
  password: ${VALKEY_PASSWORD}
No data migration needed - cache populates automatically.
Graceful Degradation
When Valkey is unavailable, cache operations fail soft: reads fall back to the local cache and the database, errors are recorded in the cache metrics, and query routing continues without the distributed tier.
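The fallback pattern can be sketched as follows. All names here are hypothetical (the real ValkeyDistributedCache code is not reproduced in this PR description): cache failures are counted and swallowed so the lookup falls through to the database.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of graceful degradation when the distributed cache is down.
class DegradingLookup
{
    // Minimal stand-in for "something that can fetch a value and may fail"
    interface Fetcher
    {
        String fetch(String key) throws Exception;
    }

    final AtomicLong cacheErrors = new AtomicLong();

    Optional<String> getWithFallback(Fetcher valkey, Fetcher database, String key)
    {
        try {
            String cached = valkey.fetch(key);
            if (cached != null) {
                return Optional.of(cached); // distributed cache hit
            }
        }
        catch (Exception e) {
            cacheErrors.incrementAndGet(); // Valkey down: record the error and degrade
        }
        try {
            return Optional.ofNullable(database.fetch(key)); // database fallback
        }
        catch (Exception e) {
            return Optional.empty();
        }
    }
}
```

The key property is that a Valkey outage never surfaces to the caller; it only shows up as a slower lookup and an incremented error counter.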
Monitoring
Cache metrics available via ValkeyDistributedCache: hits, misses, writes, errors, and hit rate.
Future work: Expose these via /metrics endpoint for Prometheus.
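The metrics above could be tracked with simple thread-safe counters; this is a hypothetical sketch, not the PR's actual fields or accessors.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters mirroring the listed metrics: hits, misses, writes, errors.
class CacheMetrics
{
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();
    final AtomicLong writes = new AtomicLong();
    final AtomicLong errors = new AtomicLong();

    // Hit rate = hits / (hits + misses); defined as 0 before any lookup happens
    double hitRate()
    {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

Counters of this shape map directly onto Prometheus counter and gauge types, which is what the proposed /metrics exposure would need.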
Dependencies
Added: io.valkey:valkey-java:5.5.0
Files Changed
New Files (7)
Modified Files (16)
Checklist
Future Enhancements
Testing Instructions
Local Testing (Single Instance)
No changes needed - works as before
java -jar gateway-ha.jar config.yaml
Multi-Instance with Valkey
Start Valkey
docker run -d -p 6379:6379 valkey/valkey:latest
Update config.yaml
valkeyConfiguration:
  enabled: true
  host: localhost
  port: 6379
Start multiple gateways
java -jar gateway-ha.jar config.yaml
Verify Cache Working
Check logs for:
"Valkey distributed cache initialized: localhost:6379"
"Valkey health check passed"
Submit queries and verify cache hits increase