Supporting stretch Kafka cluster with Strimzi #129


Open

wants to merge 72 commits into main from stretch-cluster

Conversation

@aswinayyolath

@aswinayyolath aswinayyolath commented Sep 5, 2024

This proposal describes the design details of a stretch Kafka cluster.

Prototype
A working prototype can be deployed using the steps outlined in a draft README that is being iteratively revised.

Note: The prototype might not always align exactly with this proposal, so please refer to the README documentation when working with the prototype.

POC implementation

@aswinayyolath aswinayyolath changed the title from "Enabling Stretch Kafka Deployments with Strimzi" to "Supporting stretch Kafka cluster with Strimzi" on Sep 5, 2024
@aswinayyolath aswinayyolath force-pushed the stretch-cluster branch 3 times, most recently from 80f778e to 8396b40 Compare September 6, 2024 10:18
Contributor

@fvaleri fvaleri left a comment

Hi, thanks for the proposal. Left some initial comments.

Can you please put one sentence per line to make the review easier? You can look at one of the other proposals for an example.

The word "cluster" is overloaded in this context, so we should always pay attention and clarify if we are talking about Kubernetes or Kafka.

Member

@scholzj scholzj left a comment

Thanks for the proposal. I left some comments.

But TBH, the level of depth here is nowhere near where it would need to be to approve or reject anything. It is just a super high-level idea that, without the implementation details, cannot be judged correct or wrong. We cannot approve some API changes and then try to figure out how to implement the code around them. It needs to go hand in hand.

It also almost completely ignores the networking part, which is the most complicated part. It needs to cover how the different mechanisms will be supported and handled, as we should be able to integrate into the cloud-native landscape and fit in with the tools already being used in this area. Relying purely on something like Ingress is not enough. So the proposal needs to cover how this will be handled and how we ensure its extensibility.

It would also be nice to cover topics such as:

  • How the installation will be handled, both on the remote clusters and on the main Kubernetes cluster
  • Testing strategy (how and where we will test this, given our resources)

@aswinayyolath aswinayyolath force-pushed the stretch-cluster branch 5 times, most recently from 72b3605 to 19fac97 Compare September 18, 2024 14:55
aswinayyolath added a commit to aswinayyolath/proposals that referenced this pull request Nov 15, 2024
…tion

Added details about how to use Submariner for cross cluster communication

Contributes to: strimzi#129

Signed-off-by: Aswin A <[email protected]>
aswinayyolath added a commit to aswinayyolath/proposals that referenced this pull request Nov 18, 2024
…tion

Added details about how to use Submariner for cross cluster communication

Contributes to: strimzi#129

Signed-off-by: Aswin A <[email protected]>
mark-VIII pushed a commit to aswinayyolath/proposals that referenced this pull request Feb 14, 2025
…tion

Added details about how to use Submariner for cross cluster communication

Contributes to: strimzi#129

Signed-off-by: Aswin A <[email protected]>
Signed-off-by: Mark S Taylor <[email protected]>
mark-VIII pushed a commit to aswinayyolath/proposals that referenced this pull request Feb 14, 2025
…tion

Added details about how to use Submariner for cross cluster communication

Contributes to: strimzi#129

Signed-off-by: Aswin A <[email protected]>
aswinayyolath and others added 4 commits February 14, 2025 11:02
Moved sentences to separate lines to help with reviews

Signed-off-by: Aswin A <[email protected]>
Signed-off-by: Mark S Taylor <[email protected]>
@mark-VIII

mark-VIII commented Feb 17, 2025

This stretch cluster proposal has been updated significantly to include details of a prototype. We'd like to request a re-review of the proposal, please.

Many thanks!

@scholzj
Member

scholzj commented May 6, 2025

Would you be able to share details of the projects that you do see demand for?

I know of user interest in Istio, load balancers and node ports, Submariner, and Skupper. To be clear, I'm not talking with them daily, so I have no idea which one is still valid and who moved to something else, etc. These are the things where interest was mentioned in the past that I can still remember.

I'm not aware of any interest in Cilium (well, other than yours, I guess). But all of these technologies have one thing in common: the interest in stretch clusters among their users is very small. Basically a single-digit number of users.

And there are plenty of other technologies that can be used and that Strimzi users use. For example ... Linkerd, Calico, Ingress, Gateway API, and so on. Any of those users might want to stretch a Kafka cluster tomorrow. And it will be hard to reject that, because suddenly it will have pretty much the same demand level as the previous implementations. That is why I think it is important to have the pluggability and only basic support in Strimzi, and have all the various niche implementations live outside.

…afka clusters

Update proposal to clarify that .cluster.local is not used in
advertised.listeners and that .svc is sufficient for intra-cluster DNS resolution.

Signed-off-by: Aswin A <[email protected]>
@sunleym

sunleym commented May 7, 2025

Just to add that, from a business point of view, those who are using Strimzi as part of IBM Event Streams have been asking us for an open solution for stretch clusters because of the resulting simplification for their applications and operations. Many customers have deployed proprietary implementations (making it harder for us to displace them with Strimzi as an open implementation), and other customers are in the process of building their own custom solutions. This is the reason for the urgency and engagement from Aswin, Mark, and others on this topic.

Matt Sunley, Principal Product Manager for IBM Event Automation

@aswinayyolath
Author

I really appreciate all the valuable feedback maintainers have provided so far. It's helped sharpen the thinking around how to structure this proposal responsibly and in a way that fits within Strimzi's long-term maintainability model.

Let me share some context and thoughts on why I believe we may not need to introduce a formal pluggable interface at this stage, while still keeping the door open for future extensibility.

We started with a clean slate and tested various technologies: Submariner, Istio, Skupper, and later, Cilium. Only Submariner and Cilium worked for this particular use case, and what stood out was that both rely on the Kubernetes Multi-Cluster Services (MCS) API. That realization shaped our thinking. We weren't building something specific to Cilium or Submariner; rather, we found that the MCS API acts as a unifying foundation across implementations. It allows services to be exported using ServiceExport and makes them accessible through standardized *.clusterset.local DNS names. So, instead of baking technology-specific logic into the operator, we're simply consuming Kubernetes-native abstractions, which gives us portability and neutrality by design. The operator doesn't need to care whether it's Cilium, Submariner, or other tech behind the scenes; as long as it follows MCS, it just works.
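For illustration, here is a minimal sketch of what that looks like in practice. The ServiceExport API group and the clusterset.local DNS pattern come from the MCS specification (KEP-1645); the service and namespace names are hypothetical:

```yaml
# Hypothetical service/namespace names; a minimal sketch of exporting a
# Kafka brokers Service through the MCS API.
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
  # Must match the name of the Service being exported from this namespace.
  name: my-cluster-kafka-brokers
  namespace: kafka
```

Once the MCS implementation (Submariner, Cilium, ...) imports the service on the peer clusters, the brokers become reachable under the standardized name `my-cluster-kafka-brokers.kafka.svc.clusterset.local`, regardless of which implementation is behind it.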

What's interesting is that Kubernetes itself becomes the pluggable interface in this model. Users can choose how to expose Kafka brokers and controllers via MCS-compliant solutions like Submariner, Cilium, etc. The operator doesn't need to be aware of the network implementation, only that it can resolve hostnames and connect to remote Pods. This makes it very easy for users to try stretch clusters, using whatever networking solution they prefer, without pushing any specific technology into the Strimzi core.

We're absolutely not against having a pluggable model down the road. If adoption grows and the need for tech specific handling becomes clearer, we'll have a solid foundation to build one.

But at this point, introducing an abstraction layer for plugins feels premature, and possibly even counterproductive, because:

  1. There is a clear path using MCS without needing plugins.
  2. We'd be speculating about what the interface should look like before actual demand materializes.
  3. It risks slowing down the rollout of a practical and working solution.

We'd rather validate the concept with a lean, Kubernetes-aligned implementation first. If multiple users or stacks eventually require custom behavior, we can revisit pluggability based on real-world needs and usage patterns.

@aswinayyolath
Author

Aside from the main issues I have with this proposal, which still stand, I left some comments about what is IMHO wrong in this proposal as it is.

Also, it seems to be completely missing the following aspects:

  • Upgrades and versioning -> both for Kafka itself, but especially for the operator, which now runs in multiple separate installations.
  • Strategies for deploying the other apps such as Connect, Bridge, MM2, etc.
  • The process of migrating a regular cluster to a stretch cluster and vice versa
  • There are exactly zero details about how this will actually be implemented in the code. This should absolutely be covered in the proposal.

We’re already aware of the aspects related to upgrades and versioning, as well as the process of migrating a regular cluster to a stretch cluster, and we’re actively working on them. We’ll be including detailed information on these topics in the proposal very soon.

@scholzj
Copy link
Member

scholzj commented May 7, 2025

@sunleym @aswinayyolath please see my responses below ...

Just to add that, from a business point of view, those who are using Strimzi as part of IBM Event Streams have been asking us for an open solution for stretch clusters because of the resulting simplification for their applications and operations. Many customers have deployed proprietary implementations (making it harder for us to displace them with Strimzi as an open implementation), and other customers are in the process of building their own custom solutions. This is the reason for the urgency and engagement from Aswin, Mark, and others on this topic.

Honestly, I'm not sure what exact Confluent feature you are talking about. Cluster Linking? I don't think that is comparable to stretch clusters over multiple Kube clusters. So, probably some other feature? I also wonder where exactly you see the resulting simplification for their applications and operations, or what exactly you compare it to. This adds tons of complexity, cost, and operational effort in all involved layers while maybe marginally increasing availability.

But at least in my personal view, Strimzi's main goal is not to have feature parity with Confluent. We do not have the resources for that. (But we clearly have other advantages.) And as such, we have our own goals with much higher priority, such as project sustainability and stability.

And since you opened the vendor perspective, I think what I suggested fits there very well. Having a proper pluggable mechanism will enable all vendors providing software to decide which plugins they will support based on which technologies, or to provide their own plugins - open source or proprietary - based on customer demand. As a vendor, the first thing you want to avoid is having to support some networking stack that you do not want to, just because it is baked into the upstream project.

We started with a clean slate and tested various technologies: Submariner, Istio, Skupper, and later, Cilium. Only Submariner and Cilium worked for this particular use case, and what stood out was that both rely on the Kubernetes Multi-Cluster Services (MCS) API. That realization shaped our thinking. We weren't building something specific to Cilium or Submariner; rather, we found that the MCS API acts as a unifying foundation across implementations. It allows services to be exported using ServiceExport and makes them accessible through standardized *.clusterset.local DNS names. So, instead of baking technology-specific logic into the operator, we're simply consuming Kubernetes-native abstractions, which gives us portability and neutrality by design. The operator doesn't need to care whether it's Cilium, Submariner, or other tech behind the scenes; as long as it follows MCS, it just works.

To be honest, I do not know what your path was to get where we are, but until I raised the comments about it, this proposal was written purely around Cilium and not around the MCS API.

But as far as my latest understanding goes, the MCS API has two useful implementations - Cilium and Submariner. And we have heard about real interest only in Submariner, because as far as I understand it now there is no particular interest in Cilium from IBM. Don't get me wrong, it would be great if one day there were one API covering all the various projects we see users interested in. But it is not there yet, and I do not think I would want to go all in on the MCS API and bake it into the Strimzi code-base as the only option. So, as far as I'm concerned, having one of the available plugins designed to support the MCS API would be great. But that is it.

What's interesting is that Kubernetes itself becomes the pluggable interface in this model. Users can choose how to expose Kafka brokers and controllers via MCS-compliant solutions like Submariner, Cilium, etc. The operator doesn't need to be aware of the network implementation, only that it can resolve hostnames and connect to remote Pods. This makes it very easy for users to try stretch clusters, using whatever networking solution they prefer, without pushing any specific technology into the Strimzi core.

Yes, Kubernetes has been becoming more and more pluggable for a long time. And it is one of the inspirations for why I think we should make it pluggable here as well. See, for example, the development around cloud provider plugins, container runtime plugins, storage driver plugins, etc. They all follow more or less what I'm suggesting here.

We're absolutely not against having a pluggable model down the road. If adoption grows and the need for tech specific handling becomes clearer, we'll have a solid foundation to build one.

But at this point, introducing an abstraction layer for plugins feels premature, and possibly even counterproductive, because:

  1. There is a clear path using MCS without needing plugins.
  2. We'd be speculating about what the interface should look like before actual demand materializes.
  3. It risks slowing down the rollout of a practical and working solution.

We'd rather validate the concept with a lean, Kubernetes-aligned implementation first. If multiple users or stacks eventually require custom behavior, we can revisit pluggability based on real-world needs and usage patterns.

This is the right motivation, but applied in exactly the wrong direction. You provide the concept outside of the core code base and validate it there. And if it turns out that half of Strimzi users are using the MCS API plugin and it is hugely popular, we can revisit it.

Having the pluggable interface in Strimzi, with the plugins living outside the Strimzi code-base, will set clear expectations for Strimzi users:

  • About its support level
  • About who is standing behind it
  • About the size of the code and the effort possibly needed to self-maintain it if it is not supported

As a Strimzi maintainer, I absolutely want to avoid the situation where Strimzi includes some code to validate an idea ... lures users into starting to use it ... and then after six months drops it because the validation failed.

That is something a feature gate does not protect against. Because if you clearly state in the feature gate that this might be removed in the future if it fails the validation, most people will wait for it to be finished before using it. So while you might avoid luring someone into a feature you will remove after 6 months, it will ultimately also set the feature up to fail the validation. And all of that, of course, while having a bunch of code in the code base that has to be removed, with lots of associated costs and effort already burnt through.

@aswinayyolath
Author

To be honest, I do not know what your path was to get where we are, but until I raised the comments about it, this proposal was written purely around Cilium and not around the MCS API.

Thanks for pointing that out. Just to share the background: after the initial PoC, the revised proposal did include both Submariner and Cilium as viable options for enabling stretch clusters. During community discussions, we agreed to narrow the initial implementation scope and focus on one technology to keep things manageable. At that point, Cilium was selected due to some observed performance advantages in our testing. That said, we always intended to keep the design open to supporting other technologies in the future.

Member

@ppatierno ppatierno left a comment

I had another pass and left comments. (I still have to go through the discussion on the main page, which I guess caused more changes to the proposal.)


##### Remote cluster operator configuration

When deploying the operator to remote clusters, the operator must be configured to reconcile only StrimziPodSet resources by setting the existing environment variable:
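As a sketch of what this configuration might look like on the remote cluster operator's Deployment (the variable name appears in the review comments below; the fragment layout is illustrative):

```yaml
# Illustrative fragment of the cluster operator Deployment on a remote
# cluster; the env var restricts the operator to reconciling only
# StrimziPodSet resources.
spec:
  template:
    spec:
      containers:
        - name: strimzi-cluster-operator
          env:
            - name: STRIMZI_POD_SET_RECONCILIATION_ONLY
              value: "true"
```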
Member

I think that STRIMZI_POD_SET_RECONCILIATION_ONLY might not be needed.

AFAICS, the combination of the strimzi.io/enable-stretch-cluster annotation on the Kafka custom resource and the strimzi.io/remote-podset annotation on the StrimziPodSet should be enough to avoid collisions (a YAML sketch follows the example below).

For example ...

  • If the user creates a Kafka CR named foo in the central cluster (with the strimzi.io/enable-stretch-cluster annotation), we'll have a StrimziPodSet (with the remote-podset annotation) landing on the remote cluster. This would allow the remote cluster operator (the StrimziPodSet controller) to reconcile it, together with all the "local" StrimziPodSets.
  • At the same time, the user can create a Kafka CR named foo (again!) in the local cluster as well: it doesn't have the strimzi.io/enable-stretch-cluster annotation, so it's local; the cluster operator creates the StrimziPodSet (without the strimzi.io/remote-podset annotation) and is able to reconcile it.
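A minimal sketch of the two foo resources in this example, using the annotation names quoted above (annotation values, generated names, and resource layout are illustrative):

```yaml
# Central cluster: Kafka CR opted into stretch mode (illustrative).
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: foo
  namespace: kafka
  annotations:
    strimzi.io/enable-stretch-cluster: "true"
---
# Remote cluster: the generated StrimziPodSet carries the remote marker,
# so the remote operator reconciles it alongside its local StrimziPodSets.
apiVersion: core.strimzi.io/v1beta2
kind: StrimziPodSet
metadata:
  name: foo-kafka        # generated name is illustrative
  namespace: kafka
  annotations:
    strimzi.io/remote-podset: "true"
```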

I guess that having two Kafka CRs with the same name (one stretched and one local) won't be a problem in terms of advertised addresses and quorum voters, because the stretched ones will take the clusterId into account at the DNS-name level.

This way, the cluster operator on the remote cluster can still operate the other operands (Bridge, Connect, ...).

But at this point, shouldn't we have an annotation similar to remote-podset for all the other resources (listed later in the proposal), like ConfigMap, Secret, and so on, to avoid clashes with the equivalent resources of a local Kafka cluster that has the same name as the stretched one?

Comment on lines +336 to +337
- The feature gate will be disabled by default, allowing early adopters and community members to safely test the functionality without affecting production environments.
- After at least two Strimzi releases, and based on user feedback and observed stability, enabling the feature gate by default may be considered.
Member

I agree on having this behind a feature gate, even though enabling a stretch cluster requires some steps and configuration. A FG makes it clear to users that this is a beta feature to be tested before maturing. Of course, we would need a timeline, I agree.
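For context, Strimzi feature gates are toggled through the cluster operator's existing STRIMZI_FEATURE_GATES environment variable; assuming a hypothetical gate name, enabling this one would look roughly like:

```yaml
# Gate name is hypothetical; STRIMZI_FEATURE_GATES is the existing
# mechanism for enabling ("+") or disabling ("-") feature gates.
env:
  - name: STRIMZI_FEATURE_GATES
    value: "+StretchCluster"
```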

Comment on lines 350 to 353
### Kafka Connect, Kafka Bridge and MirrorMaker2
This proposal does not cover stretching Kafka Connect, Kafka MirrorMaker 2 or the Kafka Bridge.
These components will be deployed to the central cluster and will function as they do today.
Operators running in remote clusters will not manage KafkaBridge, KafkaConnect, KafkaConnector, or KafkaMirrorMaker2 resources.
Member

As mentioned in the same thread, I left a comment about the possibility of avoiding STRIMZI_POD_SET_RECONCILIATION_ONLY and having the remote cluster operator operate all the other components deployed locally. For the stretch cluster itself, it would make sense to continue deploying these in the central cluster, IMHO.

aswinayyolath and others added 6 commits May 8, 2025 18:28
Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
@aswinayyolath
Author

aswinayyolath commented May 8, 2025

Thanks for raising this, @ppatierno. You're right, the external bootstrap section definitely deserves a more detailed explanation. I'm currently working on refining that part of the proposal, and Jakub has also shared some valuable questions in this area.

I do have a working solution in mind, and I've tested how it behaves in practice, but I want to take a bit more time to properly evaluate the trade-offs and ensure we're proposing the most robust and user-friendly approach. I'll update the proposal soon with a more complete explanation.

@ppatierno
Member

As a more general comment on the direction of the proposal ...

During the review in the past weeks, I raised the pluggability issue with @aswinayyolath. It was mostly related to the fact that the proposal drew a distinction between using Cilium (which didn't need any operator code change) and Submariner (which needed the creation of a ServiceExport resource), so I suggested deferring the additional needs of the underlying network infra to a plugin outside of Strimzi (instead of having dedicated code changes within the operator).
The proposal has changed since then, and what I see is that the MCS API provides a good abstraction so that, although the operator has to know about the remote clusters for resource creation, the overall Kafka cluster seems to work out of the box (in terms of pod-to-pod communication, exposure to clients, and so on).

I think that leveraging a Kubernetes API to abstract the underlying networking would be the best choice. On the other side, I can also see that the documentation is poor (some sections are empty or TBD), which, to be honest, raises doubts about how much the community is still working on it. There are also only a few projects implementing it.

On the other side, I am not sure I get what @scholzj is referring to as pluggability in order to use projects like "Linkerd, Calico, Ingress, Gateway API, and so on" (as mentioned in one of the comments). Of course, they don't implement the MCS API (or it's just my ignorance here), so I would assume what Jakub is referring to is a kind of "Strimzi API for stretched clusters" that someone has to implement via a plugin in order to use their preferred technology. Is my understanding right?

But also, if we think this feature does not have big demand, couldn't we be more opinionated and support the MCS API with the few implementations available? A user who wants a stretch cluster should use one of them. I know that might not be possible for various business reasons, but at the same time, do we really want to design a generic new API so that several users can implement their own, only to discover after months that no one is going to do so?
If we leverage the MCS API, we can still discover that the stretch cluster feature will be used by just one user, but at least I don't see a big impact on the operator code if all the networking abstraction is kept entirely outside of it.

Also, I am not sure where the comparison with Confluent is coming from, but I agree with Jakub that the scope and goals of Strimzi are different.

I can understand that discussing a proposal for such a long time can be frustrating, as I can sense when reading about the "urgency / engagement ..." of the customers that IBM has on its side. But, as an open source maintainer, I take care of the project for the long term, so thinking things through helps (I hope the IBM folks can confirm that the proposal has improved a lot with all the feedback from the community). Even if it means delaying things at the beginning, it pays off in the long run.

@sunleym

sunleym commented May 8, 2025

Thanks, Jakub and Paolo. I do appreciate all the technical collaboration between the IBM and Red Hat folks on this topic, and that it's worth taking time to start down the right path. At the same time, we see users of Kafka on Kubernetes with requirements for stretch clusters (it comes up a lot), and they are adopting or already using proprietary solutions (whether or not they really provide a similar capability, perception is all that matters) or even rolling their own in some cases. Many of these users would adopt Strimzi, and that's what I want to see.

Co-authored-by: Paolo Patierno <[email protected]>
Signed-off-by: Aswin A <[email protected]>
@scholzj
Member

scholzj commented May 10, 2025

On the other side, I am not sure I get what @scholzj is referring to as pluggability in order to use projects like "Linkerd, Calico, Ingress, Gateway API, and so on" (as mentioned in one of the comments). Of course, they don't implement the MCS API (or it's just my ignorance here), so I would assume what Jakub is referring to is a kind of "Strimzi API for stretched clusters" that someone has to implement via a plugin in order to use their preferred technology. Is my understanding right?

@ppatierno The Strimzi API and plugins I'm talking about are a set of Java interfaces as the API and a JAR with the implementation of those interfaces as the plugin. I.e., like our PodSecurityProviders or Kafka connectors, for example.

@ppatierno
Member

@ppatierno The Strimzi API and plugins I'm talking about are a set of Java interfaces as the API and a JAR with the implementation of those interfaces as the plugin. I.e., like our PodSecurityProviders or Kafka connectors, for example.

Well, this looks more like the technical explanation of what to do, which was pretty clear to me. My question was more about ... are you envisaging a custom Strimzi API for plugins (so something we should have in the proposal, which doesn't exist at all yet) rather than using a Kubernetes API like the MCS one? And my next question was: why don't you see the MCS API alone (without the pluggability you are requesting) as enough? Because only a few projects implement MCS?

@scholzj
Member

scholzj commented May 10, 2025

@ppatierno The Strimzi API and plugins I'm talking about are a set of Java interfaces as the API and a JAR with the implementation of those interfaces as the plugin. I.e., like our PodSecurityProviders or Kafka connectors, for example.

Well, this looks more like the technical explanation of what to do, which was pretty clear to me. My question was more about ... are you envisaging a custom Strimzi API for plugins (so something we should have in the proposal, which doesn't exist at all yet) rather than using a Kubernetes API like the MCS one? And my next question was: why don't you see the MCS API alone (without the pluggability you are requesting) as enough? Because only a few projects implement MCS?

TBH, I think the advantages of having a pluggable interface are pretty obvious. And I think I covered many of them in my comments already anyway. For example:

  • You can have many different implementations
  • The plugins inherently capture the actual interest from the users (is there a plugin to stretch the cluster using X? How and by whom is it maintained? How many stars/pulls does it have? etc.)
  • The different implementations, and the option to write their own, give users the choice to use whatever technology they want. The technologies being talked about here tend to be all or nothing: you do not want to use a completely different tech stack for each application. And you cannot expect that Strimzi has enough gravity to convince someone to move a whole company from tech stack X to tech stack Y.
  • They help to reduce the strain on the core maintainer team - the plugins can live outside, have their own governance, testing, release, etc.
  • Vendors can pick what to support depending on their customers' needs.

The MCS API ... I would expect it to be one of the possible implementations. But TBH, for me it is not a Kubernetes API - at least not yet. It is a project worked on by one of the Kubernetes SIGs. I would be happy if it one day helps to standardize things. But while I do not claim to be an expert on it, it does not seem to be there yet. The obvious question marks are the number of implementations and the maturity of the API (a 4-year-old alpha version?). Why do you think it is the only thing we need to support?

So no, for me it is not the obvious choice to hardcode it and dump it on the Strimzi community. And if you want to build this using Kubernetes APIs, the obvious choice would be load balancers, node ports, etc. - but even there I would vote for pluggability over having it hardcoded.

Designing the pluggable interface might be more complicated initially. But if you are in it for the long term, I'm 100% sure it is worth it. As a core community, we also shed some of the effort of developing, maintaining, and testing the various implementations. It will also lead to a cleaner design, as you cannot just hardcode all the stuff into the codebase but have to think about it a bit more.

If you are against the pluggability, I would also be curious what you would do if someone came next month with a proposal to hardcode something else next to it. I do not think you would have any choice other than to accept it. The pluggability I'm proposing gives a clear path for everyone without dumping the burden on the core community.

…cluster

and validate KafkaNodePool deployment targets in stretch cluster setups

Signed-off-by: Aswin A <[email protected]>