Sticky routing based on next-uri fields #446
Comments
We talked a bit about making the GW more of a "full proxy" in one of the recent GW dev syncs. It potentially unlocks a lot of new capabilities. I like the idea you've proposed here of embedding this state into the client instead of storing it on the GW side. Tracking when a query has finished, and thus when its state can be cleaned up, is an annoying process. Right now we just have a periodic task, every 2 hours, to clear out query records older than a configurable time window (but that query may actually still be running!): trino-gateway/gateway-ha/src/main/java/io/trino/gateway/ha/persistence/JdbcConnectionManager.java Lines 72 to 82 in f50b09d
Moving it to the client is in line with Trino philosophy in general, IMO, like how we implement session properties and prepared statements on the client-side. For the UI, I think as you said, we could do a fan-out that pulls query results from each backend ... That also has the benefit of not having two copies of the same data (query IDs / query history stored on both GW and Coordinator). Curious to hear what others think, but personally at first pass I like the idea. One thing we should consider is whether this would make it harder to implement other new functionality in the future. |
Yes! This is the exact concern I have too. A while ago we evaluated Trino Gateway vs. running Envoy with a query ID cache vs. just adding a thin layer that rewrites headers for next-uri, combined with some cloud load balancers. It's great to see that Trino Gateway is now officially part of the Trino project and is collaborating with Trino! We could actually achieve this next-uri design even today with the current Trino Gateway, if we tweak the X-Forwarded-* header rewriting logic in some way and put the Trino coordinators on their own domains (e.g. trino-gw.mydomain, trino-1.mydomain, trino-2.mydomain). This way, Trino Gateway effectively acts as a query dispatcher, and the subsequent calls won't go through Trino Gateway. |
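To make the dispatcher idea above a bit more concrete, here is a minimal sketch of the header side. It assumes the coordinator is configured to honor forwarded headers when it builds next-uri, and the backend names and the `BACKEND_EXTERNAL_URLS` table are hypothetical, loosely following the example domains mentioned above. It is not how Trino Gateway currently rewrites headers.

```python
from urllib.parse import urlsplit

# Hypothetical table: backend name -> externally reachable URL of that coordinator.
BACKEND_EXTERNAL_URLS = {
    "trino-1": "https://trino-1.mydomain",
    "trino-2": "https://trino-2.mydomain",
}


def forwarded_headers(backend_name):
    """Build X-Forwarded-* headers that advertise the backend's own external URL.

    If the coordinator derives next-uri from these headers (an assumption),
    follow-up requests from the client go straight to that backend.
    """
    parts = urlsplit(BACKEND_EXTERNAL_URLS[backend_name])
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "X-Forwarded-Proto": parts.scheme,
        "X-Forwarded-Host": parts.hostname,
        "X-Forwarded-Port": str(parts.port or default_port),
    }
```

The gateway would attach these headers only to the initial query submission; everything after that bypasses it.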
I had been thinking about routing using QueryID. When Trino coordinator starts, it generates a random If we can keep track of the For example, all query IDs from the same coordinator share the same suffix:
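A hypothetical illustration of suffix-based routing (not the original example from this comment): it assumes query IDs are underscore-separated with the coordinator-specific part last, and that the gateway has somehow learned each coordinator's suffix. The `SUFFIX_TO_BACKEND` table and the sample IDs are made up.

```python
# Hypothetical table: coordinator-specific query ID suffix -> backend name.
# How the gateway would learn these suffixes is left open in this thread.
SUFFIX_TO_BACKEND = {
    "abcde": "trino-1",
    "fghij": "trino-2",
}


def backend_for_query_id(query_id):
    """Return the backend that owns a query ID, or None if the suffix is unknown."""
    # Assumption: the part after the last underscore identifies the coordinator.
    suffix = query_id.rsplit("_", 1)[-1]
    return SUFFIX_TO_BACKEND.get(suffix)
```

With such a table, the gateway would not need a per-query record at all; one entry per coordinator is enough.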
|
Having a coordinator (or cluster) ID as part of the Trino protocol is a good idea. This could also solve issue #465 |
Currently we can't obtain the |
How can I help here? :) |
I think it would be great if you can chime in at trinodb/trino#23910 and help there and also take this into account for the spooling protocol work @wendigo |
Looks like we can modify the
One interesting thing is that a URL with a prefix won't work: the coordinator will return 404 for it.
|
Yes! This X-Forwarded-xx is exactly what I was trying to propose! :)
|
@oneonestar yeah, we plan to add support for it to the client as well, but for now the server-to-server should work just fine |
@oneonestar Sorry, I missed this part. What I had in mind is that we could use X-Forwarded-xx headers to point the next-uri / info-uri to the configured external URL of the backends, which doesn't go through Trino Gateway anymore. Say you have Alternatively, if we want to manipulate next-uri / info-uri with some-prefix on the same host as Trino Gateway, we would need to set up some proxy rules to proxy the requests to the proper clusters based on the prefix, and when the Trino coordinator gets this request, the URL won't contain that prefix anymore. When Trino Gateway sees that prefix, it knows which backend this request needs to go to. |
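The prefix-based proxy rule described above could look roughly like this. It is only a sketch: the `BACKENDS` table, the internal URLs, and the `resolve` helper are hypothetical names, not part of Trino Gateway.

```python
# Hypothetical table: URL prefix (first path segment) -> internal backend URL.
BACKENDS = {
    "backend1": "http://trino-1.internal:8080",
    "backend2": "http://trino-2.internal:8080",
}


def resolve(path):
    """Map a prefixed path to (backend_base_url, path_without_prefix).

    Returns None when the first path segment is not a known backend prefix,
    in which case the request would fall through to normal load balancing.
    """
    segments = path.lstrip("/").split("/", 1)
    name = segments[0]
    if name not in BACKENDS:
        return None
    # Strip the prefix so the coordinator sees the path it expects.
    rest = "/" + segments[1] if len(segments) > 1 else "/"
    return BACKENDS[name], rest
```

Stripping the prefix before forwarding is what avoids the 404 the coordinator returns for prefixed URLs.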
Hi folks,
Have we considered rewriting next-uri and info-uri directly from the responses in order to achieve query-level sticky routing?
The idea is kinda similar to Trino Proxy, where Trino Gateway proxies all requests. Then we bind the URLs in the following ways:
- /v1/... : handled by the current logic and proxied to one of the Trino coordinators based on whatever load balancing algorithms of choice. In the response, we rewrite the next-uri / info-uri to something like /backend1/v1/... that is directly pointing to the particular backend (or rewrite the X-Forwarded-* headers to achieve the same goal).
- /backend1/v1/... : directly proxied to backend 1 (but block POST to /v1/statement so that queries can only be submitted via the load balancer);
- /backend2/v1/... : directly proxied to backend 2 (but block POST to /v1/statement so that queries can only be submitted via the load balancer);
With this approach, for query-level sticky routing, we don't need to track which backend each query id gets assigned to. Instead, such assignment is retained on the client side.
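As a rough sketch of the response rewriting and the POST block described above: the code below prefixes the path of `nextUri` and `infoUri` in a JSON response body with the backend name and flags direct submissions to a prefixed `/v1/statement`. The function names, backend names, and sample host are hypothetical; this is an illustration of the proposal, not existing gateway behavior.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def rewrite_uris(body, backend_name):
    """Prefix the path of nextUri / infoUri with /<backend_name>."""
    doc = json.loads(body)
    for key in ("nextUri", "infoUri"):
        if key in doc:
            parts = urlsplit(doc[key])
            prefixed = parts._replace(path=f"/{backend_name}{parts.path}")
            doc[key] = urlunsplit(prefixed)
    return json.dumps(doc)


def is_blocked(method, path, backend_names):
    """True for POSTs to /<backend>/v1/statement, so that new queries can
    only be submitted through the load-balanced /v1/statement."""
    return method == "POST" and any(
        path == f"/{name}/v1/statement" for name in backend_names
    )
```

Because the backend name travels inside the URIs the client follows, the gateway itself keeps no per-query state.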
The caveat is that for the Trino UI, we would need to develop a way for users to run a combined query search across all backends, as well as to see a summary of all backends' stats.
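One way to picture the combined view is a simple fan-out and merge. The sketch below takes a hypothetical `fetch_queries` callable (for example, something that calls each backend's query listing endpoint) so that it stays independent of any real HTTP client; the `queryId` key and the sort order are assumptions.

```python
def combined_queries(backend_names, fetch_queries):
    """Fan out to every backend and merge the query lists into one.

    fetch_queries(name) is a hypothetical callable returning a list of dicts
    that each contain at least a "queryId" key.
    """
    merged = []
    for name in backend_names:
        for query in fetch_queries(name):
            # Tag each entry with its origin so the UI can link back to it.
            merged.append({**query, "backend": name})
    # Assumption: query IDs start with a timestamp, so sorting them in
    # descending order roughly puts the newest queries first.
    merged.sort(key=lambda q: q["queryId"], reverse=True)
    return merged
```

A real implementation would also need to handle backends that are slow or unreachable, which this sketch ignores.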
Has this approach been considered in the past? We could eliminate the dependency on the databases / caches. If cross-regional networking could be a concern, we could even change the URLs with different domains to avoid inter-regional proxying.
I know Trino Gateway's architecture is pretty much set, so it's not necessarily something we have to do now, but mostly a discussion just in case later on it's needed.
George