
Add recipe on ECK multi-tenancy #8735


Closed
wants to merge 4 commits

Conversation

LolloneS

This PR adds a much-needed recipe on ECK multi-tenancy. While it is nothing definitive, in Services we see many customers interested in the topic. The recipe aims to be a starting point for sparking thoughts and discussions on how best to structure multi-tenant ECK deployments, especially in complex organizations with tens of teams and numerous environments.


Warning

It looks like this PR modifies one or more .asciidoc files. These files are being migrated to Markdown, and any changes merged now will be lost. See the migration guide for details.

@prodsecmachine
Collaborator

prodsecmachine commented Jul 11, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

@botelastic botelastic bot added the triage label Jul 11, 2025
@LolloneS LolloneS added >docs Documentation and removed triage labels Jul 11, 2025
Collaborator

@pebrc pebrc left a comment


Thanks for your contribution, much appreciated! I wonder though if a blog post would be a better place for this? The way we use the recipes folder is for concrete YAML manifests that users can apply (with minimal modification) to test drive/POC certain setups. The YAML part of your write-up is comparatively small and serves more to illustrate the accompanying text.

@@ -73,3 +73,6 @@ Chart.lock
# build
build/dev*
build/eck*

# macOS specific files
.DS_Store
Collaborator


Use your personal global .gitignore for platform-specific files
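
For reference, OS-specific files like .DS_Store are often handled once per machine via a personal, global ignore file rather than per repository; a minimal sketch (the file path is just a common convention):

# point git at a machine-wide ignore file (one-time setup)
git config --global core.excludesFile ~/.gitignore_global
# then list platform-specific entries there
echo ".DS_Store" >> ~/.gitignore_global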

Comment on lines +8 to +34
podTemplate:
  metadata:
    labels:
      elasticsearch.k8s.elastic.co/cluster-name: cluster-a
  spec:
    tolerations: # I "accept" the nodes defined for my team
    - key: "taints.demo.elastic.co/team"
      operator: "Equal"
      value: "team-a"
      effect: "NoSchedule"
    affinity: # I want the nodes defined for my project
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "labels.demo.elastic.co/team"
              operator: "In"
              values:
              - "team-a"
      podAntiAffinity: # Try to not put me on the same host where other pods for the same ES cluster are running
        preferredDuringSchedulingIgnoredDuringExecution: # or requiredDuring...
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                elasticsearch.k8s.elastic.co/cluster-name: cluster-a
            topologyKey: kubernetes.io/hostname
Collaborator


None of this is actually ECK specific. The use of taints/tolerations and node/pod anti-affinity applies to any workload on Kubernetes.
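
For completeness, the node-side counterpart of these tolerations and affinity rules is plain kubectl and applies to any workload; a sketch using the keys from the manifest above (the node name is hypothetical):

# label the node so the nodeAffinity term matches it (node name is hypothetical)
kubectl label node worker-1 labels.demo.elastic.co/team=team-a
# taint it so only pods tolerating the team taint get scheduled there
kubectl taint node worker-1 taints.demo.elastic.co/team=team-a:NoSchedule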

Comment on lines +52 to +53
- There is no environment in which to test Kubernetes and elastic-operator upgrades, which means each upgrade is going to be fire-and-pray.
- Depending on the implementation, this architecture could become a noisy neighbors party. For instance, a misconfigured development cluster could saturate the underlying host's resources or bandwidth, hence degrading the performance of the pods deployed on the same host.
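
As an aside, the noisy-neighbor risk mentioned above is usually bounded by giving each Elasticsearch nodeSet explicit resource requests and limits in its podTemplate; a minimal sketch, with sizes chosen purely for illustration:

podTemplate:
  spec:
    containers:
    - name: elasticsearch
      resources:
        requests:
          cpu: 2       # values are illustrative only
          memory: 8Gi
        limits:
          cpu: 2
          memory: 8Gi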
Collaborator


Not sure the informal/colloquial tone here ("fire-and-pray", "noisy neighbors party") fits our reference documentation.

The main two options are:

- Both the production and non-production Monitoring clusters live in a single, separate Kubernetes cluster
- Each Monitoring cluster lives in its own Kubernetes cluster
Collaborator


Suggested change
- Each Monitoring cluster lives in its own Kubernetes cluster
- Each monitoring cluster lives in its own Kubernetes cluster
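
For context, whichever option is chosen, an ECK-managed cluster ships metrics and logs to a monitoring cluster through stack monitoring references; a hedged sketch, assuming a monitoring cluster named "monitoring" in an "observability" namespace (both names are hypothetical):

spec:
  monitoring:
    metrics:
      elasticsearchRefs:
      - name: monitoring        # hypothetical monitoring cluster name
        namespace: observability
    logs:
      elasticsearchRefs:
      - name: monitoring
        namespace: observability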


=== Reference architecture #2: one Kubernetes cluster per Elasticsearch deployment

Many Elastic Stack admins opt for a 1:1 mapping between Elasticsearch clusters and Kubernetes clusters, meaning that each Kubernetes cluster is fully dedicated to a single Elasticsearch cluster. This allows for even stronger hard multi-tenancy (assuming this can still be considered multi-tenancy) and does not require configurations such as taints and tolerations, but it requires the capability to run a fleet of Kubernetes clusters, which is a task in its own right. In other words, in this case customers intentionally decide to operate a fleet of Kubernetes clusters and have all the automation needed to manage them. If no such automation is available, the outcome is almost guaranteed to be an unmanageable sprawl of Kubernetes and Elasticsearch clusters.
Collaborator


Is that really the case? Do we have data to back that claim up? I am surprised that "many" admins would choose that approach given the operational overhead of running a dedicated k8s cluster. But I could be wrong.


image::prod-and-non-prod-hard.jpeg[Reference architecture #1: production and non-production with hard multi-tenancy,align="center"]

In this architecture, we use Kubernetes namespaces to achieve "soft" multi-tenancy, and pair that with Kubernetes taints, tolerations, and nodeAffinity to ensure "hard" multi-tenancy, so that a node in the Kubernetes cluster will only host pods for Elasticsearch clusters belonging to the same team. This scenario enforces stricter separation of concerns, but comes at a cost: it is in fact highly unlikely that such a deployment would allow for a similar level of resource and cost efficiency, since it probably requires more nodes to be added to the Kubernetes cluster than strictly necessary, likely ending up with some of them being under-utilized.
Collaborator


One could argue that this is not really hard multi-tenancy, as you would still share the same node pools and the same control plane, i.e. effectively a shared database. So to make this multi-tenancy "harder" one could want dedicated node pools and maybe even virtual clusters per tenant (which, however, might make things a bit complicated for storage provisioning, especially for local storage with Elasticsearch).

Author


So to make this multi-tenancy "harder" one could want dedicated node pools

If you look at the YAML example, we are targeting specific nodes via affinity, taints, and tolerations. In a real-world scenario, those nodes would indeed belong to different nodepools (think dedicated ASGs on AWS). You are right that I should make it clearer :)
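
For illustration, on GKE such a dedicated node pool could be created with the taint and label used in the example manifest; the pool and cluster names below are hypothetical:

# hypothetical pool/cluster names; taint and label match the example manifest
gcloud container node-pools create team-a-pool \
  --cluster my-cluster \
  --node-labels labels.demo.elastic.co/team=team-a \
  --node-taints taints.demo.elastic.co/team=team-a:NoSchedule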

@xeraa

xeraa commented Jul 16, 2025

I wonder though if a blog post would be a better place for this?

Since I thought "recipe" when seeing it first and then we redirected it: This IMO makes for a pretty dry blog post. Maybe it needs a bit more YAML and actual recipes to be a better fit? I didn't see much in the docs about tolerations for example, which I thought would still be a good addition?

@pebrc
Collaborator

pebrc commented Jul 16, 2025

I wonder though if a blog post would be a better place for this?

Since I thought "recipe" when seeing it first and then we redirected it: This IMO makes for a pretty dry blog post. Maybe it needs a bit more YAML and actual recipes to be a better fit? I didn't see much in the docs about tolerations for example, which I thought would still be a good addition?

Yes maybe if we had a complete working example it would make more sense.

@Kushmaro

@xeraa , actually, the team is looking to start blogging much more heavily about ECK to push its marketing forward. I think adding a couple of diagrams to this might make it a good blog post to start with.

@LolloneS
Author

LolloneS commented Aug 3, 2025

I'm closing this PR as this is going to be released as a blog post.

@LolloneS LolloneS closed this Aug 3, 2025