Add recipe on ECK multi-tenancy #8735
Thanks for your contribution, much appreciated! I wonder though if a blog post would be a better place for this? The way we use the recipes folder is for concrete YAML manifests that users can apply (with minimal modification) to test drive/POC certain setups. The YAML part of your write-up is comparatively small and serves more to illustrate the accompanying text.
@@ -73,3 +73,6 @@ Chart.lock
 # build
 build/dev*
 build/eck*
+
+# macOS specific files
+.DS_Store
Use your personal global .gitignore for platform-specific files.
podTemplate:
  metadata:
    labels:
      elasticsearch.k8s.elastic.co/cluster-name: cluster-a
  spec:
    tolerations: # I "accept" the nodes defined for my team
    - key: "taints.demo.elastic.co/team"
      operator: "Equal"
      value: "team-a"
      effect: "NoSchedule"
    affinity: # I want the nodes defined for my project
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "labels.demo.elastic.co/team"
              operator: "In"
              values:
              - "team-a"
      podAntiAffinity: # Try to not put me on the same host where other pods for the same ES cluster are running
        preferredDuringSchedulingIgnoredDuringExecution: # or requiredDuring...
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                elasticsearch.k8s.elastic.co/cluster-name: cluster-a
            topologyKey: kubernetes.io/hostname
None of this is actually ECK specific. The use of taints/tolerations, node affinity, and pod anti-affinity applies to any workload on Kubernetes.
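For context, the node-side setup that the tolerations and node affinity above assume could be sketched as follows. The node name is hypothetical, the taint and label keys are taken from the manifest above, and in practice these would usually be applied through the node pool configuration or `kubectl taint`/`kubectl label` rather than by writing Node objects directly:

```yaml
# Sketch of a worker node prepared for team-a (assumed node name).
apiVersion: v1
kind: Node
metadata:
  name: worker-team-a-1                      # hypothetical node name
  labels:
    labels.demo.elastic.co/team: "team-a"    # matched by the nodeAffinity rule
spec:
  taints:
  - key: "taints.demo.elastic.co/team"       # matched by the toleration
    value: "team-a"
    effect: "NoSchedule"
```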
- There is no environment in which to test Kubernetes and elastic-operator upgrades, which means each upgrade is going to be fire-and-pray.
- Depending on the implementation, this architecture could become a noisy neighbors party. For instance, a misconfigured development cluster could saturate the underlying host's resources or bandwidth, hence degrading the performance of the pods deployed on the same host.
Not sure the informal/colloquial tone here ("fire-and-pray", "noisy neighbors party") fits our reference documentation.
The main two options are:

- Both the production and non-production Monitoring clusters live in a single, separate Kubernetes cluster
- Each Monitoring cluster lives in its own Kubernetes cluster
Suggested change: "Each Monitoring cluster lives in its own Kubernetes cluster" → "Each monitoring cluster lives in its own Kubernetes cluster"
=== Reference architecture #2: one Kubernetes cluster per Elasticsearch deployment

Many Elastic Stack admins opt for having a 1:1 mapping between Elasticsearch clusters and Kubernetes clusters, meaning that each Kubernetes cluster is fully dedicated to one single Elasticsearch cluster. This allows for even stronger hard multi-tenancy (assuming this can be considered multi-tenancy) and does not require configurations such as the taints and tolerations, but requires the capability to run a fleet of Kubernetes clusters, which is a task on its own. In other words, in this case customers will intentionally decide to have a fleet of Kubernetes clusters, and will have all the needed automation to manage them. If no such automation is available, it is almost guaranteed that the final outcome will be an impossible-to-manage Kubernetes and Elasticsearch cluster sprawl.
Is that really the case? Do we have data to back that claim up? I am surprised that "many" admins would choose that approach given the operational overhead of running a dedicated k8s cluster. But I could be wrong.
image::prod-and-non-prod-hard.jpeg[Reference architecture #1: production and non-production with hard multi-tenancy,align="center"]

In this architecture, we use Kubernetes namespaces to achieve "soft" multi-tenancy, and pair that with Kubernetes taints, tolerations, and nodeAffinity to ensure "hard" multi-tenancy, so that a node in the Kubernetes cluster will only host pods for Elasticsearch clusters belonging to the same team. This scenario enforces stricter separation of concerns, but comes at a cost: it is in fact highly unlikely that such a deployment would allow for a similar level of resource and cost efficiency, since it probably requires more nodes to be added to the Kubernetes cluster than strictly necessary, likely ending up with some of them being under-utilized.
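To make the pairing of the two mechanisms concrete, a minimal sketch of a per-team Elasticsearch resource could look like the following. The namespace, cluster name, version, and node count are assumptions, and the scheduling constraints mirror the earlier example:

```yaml
# Minimal sketch: soft multi-tenancy via the team namespace,
# hard multi-tenancy via the toleration and node affinity.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-a
  namespace: team-a                  # one namespace per team
spec:
  version: 8.16.1                    # example version
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        tolerations:
        - key: "taints.demo.elastic.co/team"
          operator: "Equal"
          value: "team-a"
          effect: "NoSchedule"
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: "labels.demo.elastic.co/team"
                  operator: "In"
                  values:
                  - "team-a"
```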
One could argue that this is not really hard multi-tenancy, as you would still share the same node pools and the same control plane, i.e. effectively a shared database. So to make this multi-tenancy "harder" one could want dedicated node pools and maybe even virtual clusters per tenant (which, however, might make things a bit complicated for storage provisioning, especially for local storage with Elasticsearch).
So to make this multi-tenancy "harder" one could want dedicated node pools

If you look at the YAML example, we are targeting specific nodes via affinity, taints, and tolerations. In a real-world scenario, those nodes would indeed belong to different node pools (think dedicated ASGs on AWS). You are right that I should make it clearer :)
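For example, with eksctl-style managed node groups, the per-team taint and label could be applied at the node pool level roughly like this (cluster name, region, instance type, and sizing are hypothetical):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eck-multi-tenant             # hypothetical cluster name
  region: us-east-1
managedNodeGroups:
- name: team-a                       # one node group per team
  instanceType: m5.2xlarge           # hypothetical sizing
  desiredCapacity: 3
  labels:
    labels.demo.elastic.co/team: team-a
  taints:
  - key: taints.demo.elastic.co/team
    value: team-a
    effect: NoSchedule
```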
Since I thought "recipe" when seeing it first and then we redirected it: this IMO makes for a pretty dry blog post. Maybe it needs a bit more YAML and actual recipes to be a better fit? I didn't see much in the docs about tolerations for example, which I thought would still be a good addition?
Yes, maybe if we had a complete working example it would make more sense.
@xeraa, actually the team is looking to start blogging much more heavily about ECK to push its marketing forward. I think adding a couple of diagrams to this might make it a good blog post to start with.
Co-authored-by: Peter Brachwitz <[email protected]>
I'm closing this PR as this is going to be released as a blog post.
This PR adds a much-needed recipe on ECK multi-tenancy. While this is nothing definitive, in Services we see many customers interested in the topic. This recipe aims to be a starting point for sparking thoughts and discussions on how best to structure multi-tenant ECK deployments, especially in complex organizations with tens of teams and numerous environments.