-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROPOSAL] Eliminate all Transport calls from Extensions to OpenSearch #767
Comments
This diagram summarizes the constraints driving this request. Currently only "Extension B" is a tenable solution. "Extension A" is unreachable from the internet without some extra setup/roles/permissions/etc. on the account, which is not trivial. "Extension C" has no way to initiate transport connections with OpenSearch without some sort of middle layer. That middle layer already exists as the REST API accessible via load balancer. To enable transport we would have to add our own intermediate Bastion Host to forward requests. Possible, but unnecessary. |
Hi @dbwiddis, after taking a cursory look at your write up and diagram, I like the idea. From a Security perspective I see a couple points that we will need to consider. The biggest security matter I see arising is verification of the identity of the extension. From what I know of the SDK, my understanding is that the current security implementation is handled by the TLS changes that @cwperks introduced. With these changes, a cluster manager can verify that an extension is who they say they are by verifying the certificates. Swapping to an entirely REST based system will require a different approach. To my knowledge, there is nothing in place that would allow extensions to make REST requests which bypass the generally implemented REST security features of the cluster. That means that if the extension wants to make a REST request, and you have Security enabled, the extension will need to provide the same credentials as would be expected of any other REST request. Fortunately, I believe that the Service Account concept which was introduced in tho the Security Plugin will help address this requirement. The Service Account can be used to generate auth tokens for the extension which can be used to auth its REST requests. However, the Service Account is only created on extension registration and does not have any of the immediate safeguards the TLS system has. You cannot use the information associated with the Service Account for initialization because it will not yet exist. We will not have any mechanism for verifying an extension is who they say they are on first installation. |
Whatever identity certificates are needed could be sent with the original/initiating REST request as part of the body.
We're not going entirely REST; we're just limiting the Transport connections to those originated from OpenSearch. |
How does the discovery process currently work when a node joins the cluster? Would that be the same problem as a new node wanting to join a cluster?
The security plugin has solved a similar problem where the security configuration needs to be sync'ed on all the nodes. If a new user is created then all nodes need to be aware of the new user. The way the security plugin solves this is that it stores the configuration in a system index and on any modification to the index it sends a TransportAction to all nodes of the cluster to tell it to reload the security configuration from the index into memory. There is some time needed for propagation, but its how all nodes of the cluster become synchronized to changes in the security index. |
The node sends out a unicast ping to the discovery seed host, and says "Hi, can I join your cluster?" See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/discovery/
As long as the node's in the subnet, it can send info to the discovery seed host. However, for an extension outside the subnet, it cannot send a message to the internal address.
@owaiskazi19 is looking to solve something similar here for the extension API
Yeah, we've discussed using a system (or any) index for this and that might work. Looking at other options as well. |
What/Why
What are you proposing?
Extensions presently rely on transport calls to OpenSearch for both initialization and ongoing actions. This requires the extension to maintain an open (and secure) network connection with OpenSearch.
Originating Transport connections from OpenSearch to an extension is a reasonable option (but see #746 for a discussion of making this an optional alternative to REST or other (gRPC?) communication layer). However, an Extension making a transport call back to OpenSearch introduces multiple complexities such as:
All of the initialization calls can be replaced by a response to the initialization call.
Other uses of transport should be replaced by equivalent REST requests.
What users have asked for this feature?
This issue was highlighted during performance testing setup (#652) and has prompted still-unfixed bugs associated with the lack of valid OpenSearch IP address (#761) and collisions with transport service headers (#771). A continuous connection also is incompatible with a serverless design (#746).
What problems are you trying to solve?
What is the developer experience going to be?
SDK developers will need to rely on the REST layer for any communications originated to OpenSearch
OpenSearch will need a new REST endpoint for extensions to communicate with; however the byte streams used can be the same "Writeable" or protobuf bytes, just in the body of an HTTP request.
Extension developers will see no change; all extension communication was already intended to be over REST.
Are there any security considerations?
This should reduce the security considerations of having both REST and Transport, making the communications only via REST.
Are there any breaking changes to the API
We are still pre-release behind a feature flag so it's in line with the continuously changing breaking development.
What is the user experience going to be?
Users will not have to know any of the complex details of where nodes live and how they are reachable. They simply provide a single endpoint, probably a load balancer, and the SDK handles the rest.
Are there breaking changes to the User Experience?
No users have experienced it yet!
Why should it be built? Any reason not to?
It's a good solution to a wide range of problems that would avoid many other patchwork fixes.
What will it take to execute?
Replace existing Transport calls originating from Extensions with alternatives.
The following call occurs during initialization and should be replaced by including the information into the Initialization Request class:
sendEnvironmentSettingsRequest()
The following calls occur during initialization and should be replaced by including the information they communicate into the initialization Response class:
sendRegisterRestActionsRequest()
sendRegisterCustomSettingsRequest()
sendRegisterTransportActionsRequest()
The following calls are presently unimplemented, planned for the future:
sendExtensionDependencyRequest()
could be included as part of initialization requestsendClusterSettingsRequest()
could be easily replaced by a REST callThe following calls occur as part of calls from the SDKClient and its ecosystem. These implementations would be the hardest part of this proposal.
sendClusterStateRequest()
. This is called fromSDKClusterState.state()
and has already been called out as problematic from a performance aspect ([FEATURE] Replace calls to SDKClusterService state() with more targeted calls #674). A Cluster State REST request already exists but needs manual parsing of its JSON response. See discussion on ([Extensions] Add ClusterStateRequest parameter to cluster state transport request OpenSearch#7066).sendAddSettingsUpdateConsumerRequest()
. This doesn't yet work anyway (see [BUG] ExtensionsRunner sendAddSettingsUpdateConsumerRequest fails #366) but could be replaced by a targeted REST request; the updates are already implemented via Transport but could be handled differently in a stateless enviornment.sendRemoteExtensionActionRequest()
. This triggers remote transport actions. It can be easily moved to a REST call.Any remaining open questions?
This should likely be part of a larger-scale integrated design for serverless which will employ a similar transport/REST/gRPC equivalent in the other direction.
The text was updated successfully, but these errors were encountered: