There are two recent developments that make me think we can improve on the disaggregated coordinator architecture.
As background, one of the original design goals for disaggregated coordinators was to avoid requiring any additional infrastructure. For example, we didn't want to introduce a dependency on Zookeeper, because this would increase the infrastructure requirements for anyone who wanted to use the feature.
However, I think these two developments change the picture:
1. Amazon S3 has announced support for conditional writes, so all major cloud storage providers can now serve as the basis for a distributed lock: GCS and Azure through compare-and-swap, and S3 through conditional writes. For on-prem installations, Ceph supports conditional PUT, and MinIO also supports conditional writes. So there's an argument to be made that, at this point, it's difficult to find a storage platform that doesn't support conditional writes. Because distributed storage is essentially a requirement for most use cases of Presto, we could implement leader election using distributed storage. Of course, the mechanism to establish the leader would be encapsulated behind a plugin, and other mechanisms such as Zookeeper or etcd could potentially be implemented as well.
2. The PUT API recently introduced could be used by the leader, elected through the mechanism above, to distribute query execution management among multiple coordinators.
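To make the leader election idea concrete, here is a rough sketch of a lease-based election on top of a store that supports conditional writes. Everything in it is hypothetical and for illustration only: `ConditionalStore` is an in-memory stand-in for an object store, and `put_if_version`, `LeaderElector`, and the `leader-lease` key are made-up names, not Presto or cloud SDK APIs. A real plugin would map `put_if_version` onto whatever conditional-write primitive the storage provider offers.

```python
import json
from typing import Dict, Optional, Tuple


class ConditionalStore:
    """In-memory stand-in (assumption) for an object store with conditional writes."""

    def __init__(self) -> None:
        # key -> (version, payload); the version plays the role of an ETag/generation.
        self._objects: Dict[str, Tuple[int, bytes]] = {}

    def get(self, key: str) -> Optional[Tuple[int, bytes]]:
        return self._objects.get(key)

    def put_if_version(self, key: str, data: bytes, expected_version: Optional[int]) -> bool:
        """Write only if the stored version equals expected_version.

        expected_version=None means "only create if the object does not exist".
        """
        current = self._objects.get(key)
        current_version = current[0] if current is not None else None
        if current_version != expected_version:
            return False  # someone else wrote first; the caller lost the race
        self._objects[key] = ((current_version or 0) + 1, data)
        return True


class LeaderElector:
    """Hypothetical elector: the leader is whoever holds an unexpired lease object."""

    def __init__(self, store: ConditionalStore, node_id: str,
                 lease_seconds: float = 10.0, key: str = "leader-lease") -> None:
        self.store = store
        self.node_id = node_id
        self.lease_seconds = lease_seconds
        self.key = key

    def try_acquire(self, now: float) -> bool:
        """Acquire or renew the lease; return True if this node is the leader."""
        current = self.store.get(self.key)
        if current is None:
            expected = None  # no lease yet: create it only if still absent
        else:
            version, raw = current
            lease = json.loads(raw)
            if lease["holder"] != self.node_id and lease["expires_at"] > now:
                return False  # another node holds a live lease
            expected = version  # renew our own lease or take over an expired one
        new_lease = json.dumps(
            {"holder": self.node_id, "expires_at": now + self.lease_seconds}
        ).encode()
        # The conditional write is what makes this safe under contention.
        return self.store.put_if_version(self.key, new_lease, expected)
```

Each coordinator would call `try_acquire` periodically, well within the lease duration. The sketch passes `now` in explicitly and ignores clock skew between coordinators, which a real implementation would have to account for.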
There are still some questions to figure out, such as where to place the discovery server and whether we will still need a resource manager. One thought is that the leader takes on the responsibility of the resource manager; alternatively, we could keep that process as is. If the leader takes on the responsibility of the resource manager, then the discovery server would need to redirect nodes to the leader in case of a failover.
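As one possible shape for the failover redirect, the discovery server could resolve the current leader from the lease record itself. This is only a sketch under assumptions: the lease format (a JSON object with `address` and `expires_at` fields) and the `resolve_leader` function are hypothetical, not an existing Presto API.

```python
import json
from typing import Optional


def resolve_leader(lease_json: Optional[bytes], now: float) -> Optional[str]:
    """Return the leader address from a live lease, or None if there is no leader.

    The lease layout (fields "address" and "expires_at") is an assumption.
    """
    if lease_json is None:
        return None  # no lease has been written yet
    lease = json.loads(lease_json)
    if lease["expires_at"] <= now:
        return None  # lease expired: a failover is in progress
    return lease["address"]
```

While this returns `None`, the discovery server would have no leader to point nodes at, so they would need to retry until a new leader has written its lease.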