Add Valkey Distributed Cache for Horizontal Scaling
Summary
This PR implements distributed caching using Valkey to enable horizontal scaling of Trino Gateway. Multiple gateway instances can now share query metadata through a distributed cache layer, ensuring consistent query routing across all instances.
Motivation
Currently, Trino Gateway uses local Guava caches that are not shared between instances. In multi-instance deployments, this can lead to:
This implementation addresses these limitations while maintaining backward compatibility and graceful degradation.
Architecture
3-Tier Caching Strategy
Request Flow:
Check L1
├─ Hit: Return immediately
└─ Miss: Check L2
   ├─ Hit: Populate L1, return
   └─ Miss: Check L3
      ├─ Found: Populate L2 + L1, return
      └─ Not Found: Search backends via HTTP (200ms)
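The lookup order in the request flow can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the PR's actual code: the class `TieredLookup`, its fields, and the use of maps and functions as stand-ins for each tier are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the tiered lookup order; names are illustrative only.
final class TieredLookup {
    // Stand-in for the first tier (L1): an in-process map.
    private final Map<String, String> l1 = new ConcurrentHashMap<>();
    // Stand-in for the second tier (L2): a map shared between instances.
    private final Map<String, String> l2;
    // Stand-in for the third tier (L3): any lookup that may or may not find the query.
    private final Function<String, Optional<String>> l3;
    // Stand-in for the HTTP search across backends.
    private final Function<String, Optional<String>> backendSearch;

    TieredLookup(Map<String, String> l2,
                 Function<String, Optional<String>> l3,
                 Function<String, Optional<String>> backendSearch) {
        this.l2 = l2;
        this.l3 = l3;
        this.backendSearch = backendSearch;
    }

    Optional<String> findBackend(String queryId) {
        String local = l1.get(queryId);
        if (local != null) {
            return Optional.of(local);            // L1 hit: return immediately
        }
        String shared = l2.get(queryId);
        if (shared != null) {
            l1.put(queryId, shared);              // L2 hit: populate L1, return
            return Optional.of(shared);
        }
        Optional<String> stored = l3.apply(queryId);
        if (stored.isPresent()) {
            l2.put(queryId, stored.get());        // L3 found: populate L2 + L1, return
            l1.put(queryId, stored.get());
            return stored;
        }
        // Not found in any tier: fall back to searching the backends.
        // The flow does not say whether this result is cached, so it is left uncached here.
        return backendSearch.apply(queryId);
    }
}
```

Each tier is only consulted when the one before it misses, and a hit in a lower tier is copied upward so the next request for the same query ID is served from the faster tier.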
Cache Keys
trino:query:backend:{queryId} - Backend URL for query
trino:query:routinggroup:{queryId} - Routing group for query
trino:query:externalurl:{queryId} - External URL (lazy-loaded)
Implementation Details
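To make the key scheme listed under Cache Keys concrete, a small helper could build the three keys from a query ID. This is a sketch only: the class `QueryCacheKeys` and its method names are hypothetical and are not taken from the PR.

```java
// Hypothetical helper that formats the three cache keys; names are illustrative only.
final class QueryCacheKeys {
    private static final String PREFIX = "trino:query:";

    private QueryCacheKeys() {}

    // trino:query:backend:{queryId}
    static String backend(String queryId) {
        return PREFIX + "backend:" + queryId;
    }

    // trino:query:routinggroup:{queryId}
    static String routingGroup(String queryId) {
        return PREFIX + "routinggroup:" + queryId;
    }

    // trino:query:externalurl:{queryId}
    static String externalUrl(String queryId) {
        return PREFIX + "externalurl:" + queryId;
    }
}
```

Keeping the key format in one place means every gateway instance derives identical keys for the same query ID, which is what lets them share entries.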
Core Components
ValkeyConfiguration (gateway-ha/src/main/java/io/trino/gateway/ha/config/ValkeyConfiguration.java)
ValkeyDistributedCache (gateway-ha/src/main/java/io/trino/gateway/ha/router/ValkeyDistributedCache.java)
DistributedCache interface
DistributedCache Interface (gateway-ha/src/main/java/io/trino/gateway/ha/router/DistributedCache.java)
Integration
BaseRoutingManager - Updated routing logic:
HaGatewayProviderModule - Dependency injection:
DistributedCache singleton
Configuration
Minimal (Recommended for Getting Started)
Advanced (Production Tuning)
Single Instance (No Changes Required)
valkeyConfiguration:
  enabled: false  # Default - local cache sufficient
Testing
Unit Tests (31 total, all passing)
TestValkeyConfiguration (16 tests)
TestValkeyDistributedCache (15 tests)
Integration Tests (existing tests updated)
Documentation
Comprehensive documentation added:
New File: docs/valkey-configuration.md (273 lines)
Updated Files:
Backward Compatibility
✅ Fully backward compatible
Migration Path
From Single to Multi-Gateway
valkeyConfiguration:
  enabled: true
  host: valkey.internal
  port: 6379
  password: ${VALKEY_PASSWORD}
No data migration needed - cache populates automatically.
Graceful Degradation
When Valkey is unavailable:
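A common way to achieve this kind of degradation is to treat any failure of the distributed cache as a miss, so the lookup falls through to the next tier instead of failing the request. A minimal sketch, assuming hypothetical names (`FailSoftCache`, `remoteGet`) that are not taken from the PR:

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical wrapper that turns distributed-cache failures into cache misses.
final class FailSoftCache {
    // Stand-in for a read against the distributed cache; it may throw when the server is unreachable.
    private final Function<String, Optional<String>> remoteGet;

    FailSoftCache(Function<String, Optional<String>> remoteGet) {
        this.remoteGet = remoteGet;
    }

    Optional<String> get(String key) {
        try {
            return remoteGet.apply(key);
        } catch (RuntimeException e) {
            // Assumption: an outage is reported as a miss so callers continue with their fallback path.
            return Optional.empty();
        }
    }
}
```

With a wrapper like this, callers never see a cache exception; an outage only costs them the shared-cache hit.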
Monitoring
Cache metrics available via ValkeyDistributedCache:
Future work: Expose these via /metrics endpoint for Prometheus.
Dependencies
Added: io.valkey:valkey-java:5.5.0
Files Changed
New Files (7)
Modified Files (16)
Checklist
Future Enhancements
Testing Instructions
Local Testing (Single Instance)
No changes needed - works as before
java -jar gateway-ha.jar config.yaml
Multi-Instance with Valkey
Start Valkey
docker run -d -p 6379:6379 valkey/valkey:latest
Update config.yaml
valkeyConfiguration:
  enabled: true
  host: localhost
  port: 6379
Start multiple gateways
java -jar gateway-ha.jar config.yaml
Verify Cache Working