Introducing new KafkaRoller #103
Conversation
fvaleri
left a comment
Just a first pass, as I need more time to digest this. I think it would be useful to illustrate the new behavior with a couple of examples of the form: with this roller configuration and cluster state, these are the node groups and their restart order. Wdyt?
@fvaleri Thank you for the feedback. I have added an example of a rolling update. Please let me know what you think.
Nice proposal. Thanks for it 👍 .
STs POV:
I think we would also need to design multiple tests to cover all the states that KafkaRoller v2 introduces. We have a few tests, but that's certainly not 100% coverage. So we should maybe have a meeting to talk about this...
Side note about performance:
What would be appropriate performance metrics for us to consider when designing performance tests? Are there any critical ones? For sure I can imagine that we would see a significant improvement in rolling updates of multiple nodes when we use the batching mechanism...
@tinaselenge thanks for the example, it really helps.
I left some comments, let me know if something is not clear or you want to discuss further.
06x-new-kafka-roller.md
Outdated
- Cruise Control sends a `removingReplicas` request to un-assign the partition from broker 2.
- KafkaRoller is performing a rolling update to the cluster. It checks the availability impact for the foo-0 partition before rolling broker 1. Since partition foo-0 has ISR [1, 2, 4], KafkaRoller decides that it is safe to restart broker 1. It is unaware of the `removingReplicas` request that is about to be processed.
- The reassignment request is processed and the foo-0 partition now has ISR [1, 4].
- KafkaRoller restarts broker 1 and the foo-0 partition now has ISR [4], which is below the configured minimum in-sync replicas of 2, resulting in producers with acks=all no longer being able to produce to this partition.
In addition to rebalance, we have the same race condition with the replication factor change (the new integration between CC and TO); maybe you can mention this.
The roller should be able to call CC's user_tasks endpoint and check if there is any pending task. In that case, the roller has two options: wait for all tasks to complete, or continue as today with the potential issue you describe here. You can't really stop the tasks, because the current batch will still be completed and the operators will try to submit a new task in the next reconciliation loop.
I think that we should let the user decide which policy to apply through a configuration. By default the roller would wait for all CC tasks to complete, logging a warning. If the user sets or switches to a "force" policy, then the roller would behave like today. Wdyt?
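As a rough illustration of the user_tasks check suggested above, here is a minimal sketch of querying Cruise Control's REST API before continuing a roll. The service address, port, and the string-based status check are assumptions for illustration only, not part of the proposal:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CruiseControlTaskCheck {

    // Hypothetical address; in Strimzi the Cruise Control REST API location and
    // security settings would come from the operator's own configuration.
    private static final String USER_TASKS_URL =
            "http://my-cluster-cruise-control:9090/kafkacruisecontrol/user_tasks?json=true";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(USER_TASKS_URL)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Naive check for pending tasks: a real implementation would parse the JSON
        // response and inspect each task's status rather than matching strings.
        String body = response.body();
        boolean pendingTasks = body.contains("\"Status\":\"Active\"") || body.contains("\"Status\":\"InExecution\"");

        if (pendingTasks) {
            System.out.println("Cruise Control has pending tasks; wait (or log a warning) before restarting brokers");
        } else {
            System.out.println("No pending Cruise Control tasks; safe to continue the roll");
        }
    }
}
```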
Should this be perhaps included/discussed in a separate proposal or issue? The idea was to mention that there is a race condition we could fix with the new roller in the future, which is not easy to fix with the old roller. How we fix it and other similar problems should be a separate discussion I think.
This should have a dedicated proposal IMO, but let's start by logging an issue.
Would calling the `listPartitionReassignments` Admin API be enough to know this?
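For reference, such a check via the Kafka Admin client could look roughly like the sketch below (the bootstrap address is a placeholder). Note that it only reports reassignments the controller has already accepted, so on its own it would not close the window where Cruise Control is about to submit one:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.PartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class ReassignmentCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        try (Admin admin = Admin.create(props)) {
            // Lists all partitions that currently have an ongoing reassignment.
            Map<TopicPartition, PartitionReassignment> ongoing =
                    admin.listPartitionReassignments().reassignments().get();

            if (ongoing.isEmpty()) {
                System.out.println("No ongoing reassignments; the availability check can rely on the current ISR");
            } else {
                ongoing.forEach((tp, r) -> System.out.printf(
                        "Ongoing reassignment for %s: adding=%s removing=%s%n",
                        tp, r.addingReplicas(), r.removingReplicas()));
            }
        }
    }
}
```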
katheris
left a comment
Overall this looks good to me, but I had a few questions and wording suggestions. I definitely think this will be useful since I've experienced first hand how tricky it is to debug the existing code.
katheris
left a comment
Two small nits but otherwise looks good to me
fvaleri
left a comment
Hi @tinaselenge. Thanks for the updates. I think this is definitely the right direction, but I left some more comments for you to consider. It may be that I'm missing some detail, so feel free to correct me.
06x-new-kafka-roller.md
Outdated
- Otherwise, restart each node, transition its state to `RESTARTED` and increment its `numRestartAttempts`.
- After restarting all the nodes in the batch, wait for their states to become `SERVING` until the configured `postOperationalTimeoutMs` is reached.
- If the timeout is reached, throw `TimeoutException` if a node's `numRetries` is greater than or equal to `maxRetries`; otherwise increment their `numRetries` and repeat from step 2.
- After all the nodes are in `SERVING` state, trigger preferred leader elections via the Admin client. Wait for their states to become `LEADING_ALL_PREFERRED` until the configured `postOperationalTimeoutMs` is reached. If the timeout is reached, log a `WARN` message.
What about this?
A Kafka background thread ensures that the leader role is shifted to the preferred replica once it's in sync and a configured imbalance threshold is reached. This is enabled by default (see auto.leader.rebalance.enable). I think this may be enough.
Thanks everyone who reviewed the proposal!
scholzj
left a comment
As I stated in the past - I think this might be useful in general, but I do not think right now is the right time for this and I do not think we have the resources to support this change. So I'm not sure I would approve this proposal myself. However, I left some comments if you want to continue with this effort.
06x-new-kafka-roller.md
Outdated
The existing KafkaRoller suffers from the following shortcomings:
- Although it is safe and straightforward to restart one broker at a time, this process is slow in large clusters ([related issue](https://github.com/strimzi/strimzi-kafka-operator/issues/8547)).
- It does not account for partition preferred leadership. As a result, there may be more leadership changes than necessary during a rolling restart, consequently impacting tail latency.
Why does that impact tail latency?
I reworded this a little because tail latency perhaps was not the right description. It's more about the impact that it has on clients.
06x-new-kafka-roller.md
Outdated
- Although it is safe and straightforward to restart one broker at a time, this process is slow in large clusters ([related issue](https://github.com/strimzi/strimzi-kafka-operator/issues/8547)).
- It does not account for partition preferred leadership. As a result, there may be more leadership changes than necessary during a rolling restart, consequently impacting tail latency.
- It is hard to reason about when things go wrong. The code is complex to understand and it's not easy to determine why a pod was restarted from logs that tend to be noisy.
- There is a potential race condition between a Cruise Control rebalance and KafkaRoller that could cause partitions to go under the minimum in-sync replicas. This issue is described in more detail in the `Future Improvements` section.
Maybe you could provide some more details? I'm not aware of any such issue being raised by anyone.
This was raised in the Strimzi Slack channel a while ago, should I link it here? I have added more details about the potential scenario later in the proposal.
In general Slack is not really ideal for keeping details of problems in the long term. Better to create an issue, which can be discovered more easily by anyone who faces a similar problem.
I can raise an issue for this.
06x-new-kafka-roller.md
Outdated
- KafkaRoller takes a long time to reconcile mixed nodes if they are all in `Pending` state. This is because a mixed node does not become ready until the quorum is formed, and KafkaRoller waits for a pod to become ready before it attempts to restart other nodes. In order for the quorum to form, at least the majority of controller nodes need to be running at the same time. This is not easy to solve in the current KafkaRoller without introducing some major changes, because it processes each node individually and there is no mechanism to restart multiple nodes in parallel. More information can be found [here](https://github.com/strimzi/strimzi-kafka-operator/issues/9426).

- The quorum health check relies on the `controller.quorum.fetch.timeout.ms` configuration, which is determined by the desired configuration values. However, during certificate reconciliation or manual rolling updates, KafkaRoller doesn't have access to these desired configuration values since they shouldn't prompt any configuration changes. As a result, the quorum health check defaults to using the hard-coded default value of `controller.quorum.fetch.timeout.ms` instead of the correct configuration value during manual rolling updates or when rolling nodes for certificate renewal.
I would argue this is a Kafka issue in the first place -> it should provide an in-sync / not in-sync flag. Counting the delays is wrong regardless of whether you use the desired or current value.
True. However, the workaround we currently have can be improved so that it does not use a hard-coded config value when doing the quorum health check.
06x-new-kafka-roller.md
Outdated
- build a map of nodes and their replicating partitions by sending a describeTopics request to the Admin API
- batch the nodes that do not have any partitions in common and therefore can be restarted together
- remove nodes that have an impact on availability from the batches (more on this later)
- return the largest batch
Has any analysis been done to show that this has an effect? Either on existing big clusters or on big test clusters? The only situation where I personally would expect this to have some effect is when you follow the racks as the batch boundaries. Should we try to follow the rack boundaries directly here?
I have tested this on a 9-broker cluster. I think we should definitely test this on a larger cluster.
I put details on rack boundaries in the rejected alternatives section as something that was already considered, but of course we all need to agree first on whether to reject it or not.
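As a side note on the batching steps quoted above, here is a minimal sketch of building the node-to-partitions map with the Admin client and greedily grouping brokers that share no partitions. The grouping shown is purely illustrative and is not the proposal's actual algorithm; the bootstrap address is a placeholder and the availability check is omitted:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartition;

public class BatchingSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        try (Admin admin = Admin.create(props)) {
            Set<String> topicNames = admin.listTopics().names().get();
            Map<String, TopicDescription> topics = admin.describeTopics(topicNames).allTopicNames().get();

            // Map each broker id to the set of partitions it hosts a replica of.
            Map<Integer, Set<TopicPartition>> partitionsByBroker = new HashMap<>();
            topics.forEach((name, description) -> description.partitions().forEach(p ->
                    p.replicas().forEach(node -> partitionsByBroker
                            .computeIfAbsent(node.id(), id -> new HashSet<>())
                            .add(new TopicPartition(name, p.partition())))));

            // Greedily group brokers whose replica sets are disjoint; such brokers
            // could in principle be restarted together.
            List<List<Integer>> batches = new ArrayList<>();
            for (Map.Entry<Integer, Set<TopicPartition>> e : partitionsByBroker.entrySet()) {
                boolean placed = false;
                for (List<Integer> batch : batches) {
                    boolean disjoint = batch.stream().allMatch(b ->
                            Collections.disjoint(partitionsByBroker.get(b), e.getValue()));
                    if (disjoint) {
                        batch.add(e.getKey());
                        placed = true;
                        break;
                    }
                }
                if (!placed) {
                    batches.add(new ArrayList<>(List.of(e.getKey())));
                }
            }
            System.out.println("Candidate batches of brokers sharing no partitions: " + batches);
        }
    }
}
```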
06x-new-kafka-roller.md
Outdated
- Otherwise, restart each node, transition its state to `RESTARTED` and increment its `numRestartAttempts`.
- After restarting all the nodes in the batch, wait for their states to become `SERVING` until the configured `postOperationTimeoutMs` is reached.
- If the timeout is reached, throw `TimeoutException` if a node's `numRetries` is greater than or equal to `maxRetries`; otherwise increment their `numRetries` and repeat from step 2.
- After all the nodes are in `SERVING` state, trigger preferred leader elections via the Admin client. Wait for their states to become `LEADING_ALL_PREFERRED` until the configured `postOperationTimeoutMs` is reached. If the timeout is reached, log a `WARN` message.
This is possibly very dangerous. Triggering this immediately after the restart can lead to the leaders moving to a newly started node that does not yet have established networking to some outside networks, e.g. through load balancers. That can happen due to the next node being restarted already today, so I do not think trying to align the preferred leaders is a problem per se. But we might want to inject an optional/configurable timeout between the restart and the leader realignment.
Should this be a separate configurable timeout or use the existing operationalTimeout? For example, we could wait for nodes to lead all the preferred partitions until the operation timeout is reached. If timed out, only then do we trigger leader realignment. Once requested to realign, we have another operational timeout to wait for it to complete.
I think operationalTimeout would be too long for most cases. So it would need a separate one.
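For context on the step being discussed, triggering a preferred leader election through the Admin client maps to the `electLeaders` call. Below is a minimal sketch; the topic partition and bootstrap address are placeholders, and the delay between restart and realignment discussed above is not shown:

```java
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;

public class PreferredLeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        try (Admin admin = Admin.create(props)) {
            // Elect the preferred leader for a specific partition (placeholder name);
            // passing null instead of a set requests the election for all partitions.
            Set<TopicPartition> partitions = Set.of(new TopicPartition("foo", 0));
            admin.electLeaders(ElectionType.PREFERRED, partitions).partitions().get();
            System.out.println("Preferred leader election triggered for " + partitions);
        }
    }
}
```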
06x-new-kafka-roller.md
Outdated
#### Quorum health check

The quorum health logic is similar to the current KafkaRoller except for a couple of differences. The current KafkaRoller uses the `controller.quorum.fetch.timeout.ms` config value from the desired configurations passed from the reconciler, or uses the hard-coded default value if the reconciler passes null for the desired configurations. The new roller will use the configuration value of the active controller. This means that the quorum health check is done from the active controller's point of view.
I think effort should be made in Kafka to expose a clear flag for the quorum being in sync or not. Anything else is just an imperfect workaround. If not accepted by Kafka, maybe this should be done by the KafkaAgent to have it done locally and not remotely.
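To make the discussion more concrete, here is a rough sketch of a quorum health check based on the controller quorum description exposed by the Admin API. The majority rule and the hard-coded fetch timeout are simplifications for illustration; per the proposal, the timeout would be read from the active controller's configuration:

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.QuorumInfo;

public class QuorumHealthSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");

        // Illustrative value only: the real check would use controller.quorum.fetch.timeout.ms
        // as configured on the active controller.
        long fetchTimeoutMs = 2000L;

        try (Admin admin = Admin.create(props)) {
            QuorumInfo quorum = admin.describeMetadataQuorum().quorumInfo().get();

            long leaderLastCaughtUp = quorum.voters().stream()
                    .filter(v -> v.replicaId() == quorum.leaderId())
                    .mapToLong(v -> v.lastCaughtUpTimestamp().orElse(0L))
                    .findFirst().orElse(0L);

            // Count voters whose last caught-up timestamp is within the fetch timeout
            // of the leader's, i.e. followers considered caught up.
            long caughtUpVoters = quorum.voters().stream()
                    .filter(v -> leaderLastCaughtUp - v.lastCaughtUpTimestamp().orElse(0L) < fetchTimeoutMs)
                    .count();

            boolean healthy = caughtUpVoters > quorum.voters().size() / 2;
            System.out.println("Caught-up voters: " + caughtUpVoters + "/" + quorum.voters().size()
                    + " -> quorum " + (healthy ? "healthy" : "at risk"));
        }
    }
}
```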
06x-new-kafka-roller.md
Outdated
| NOT_RUNNING | Node is not running (Kafka process is not running). This is determined via the Kubernetes API, more details for it below. | `NOT_READY`, `NOT_RUNNING` | `RESTARTED` `SERVING` |
| NOT_READY | Node is running but not ready to serve requests, which is determined by the Kubernetes readiness probe (broker state < 2 OR == 127 OR controller is not listening on port). | `RESTARTED` `SERVING` |
Where do the states such as init container, PodInitialization etc. fit in between these two states? In that state, the Pod is scheduled and some parts of it are running. But not the Kafka process. So it seems to fall between these two states right now.
I believe this would fall into the NOT_READY category. NOT_RUNNING has specific criteria, which are very similar to the criteria for stuck pods in the current roller.
I think that makes sense. But please add it to the state explanation as right now it suggests it would not belong there.
There is already a NOT_RUNNING section right below the table explaining this.
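To illustrate the classification being discussed, here is a small sketch of how a roller could map pod status from the Kubernetes API (via the Fabric8 client) onto these states. The criteria shown are simplified assumptions; the proposal's actual NOT_RUNNING rules are stricter and not reproduced here:

```java
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class PodStateSketch {

    enum NodeState { NOT_RUNNING, NOT_READY, OTHER }

    public static void main(String[] args) {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // Placeholder namespace and pod name.
            Pod pod = client.pods().inNamespace("kafka").withName("my-cluster-kafka-0").get();
            System.out.println("my-cluster-kafka-0 classified as " + classify(pod));
        }
    }

    static NodeState classify(Pod pod) {
        boolean ready = pod != null && pod.getStatus() != null && pod.getStatus().getConditions().stream()
                .anyMatch(c -> "Ready".equals(c.getType()) && "True".equals(c.getStatus()));
        if (ready) {
            return NodeState.OTHER;
        }
        // Per the discussion above, pods that are still initialising (init containers,
        // PodInitializing, etc.) fall into NOT_READY. NOT_RUNNING is reserved for pods
        // matching stricter "stuck pod" criteria, which are not reproduced in this sketch.
        return looksStuck(pod) ? NodeState.NOT_RUNNING : NodeState.NOT_READY;
    }

    // Placeholder for the proposal's specific NOT_RUNNING criteria (e.g. pod missing,
    // unschedulable, or crash-looping); the exact rules are defined in the proposal.
    static boolean looksStuck(Pod pod) {
        return pod == null || pod.getStatus() == null;
    }
}
```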
06x-new-kafka-roller.md
Outdated
## Rejected

- Why not use rack information when batching brokers that can be restarted at the same time?
  When all replicas of all partitions have been assigned in a rack-aware way, then brokers in the same rack trivially share no partitions, and so racks provide a safe partitioning. However, nothing in a broker, controller or Cruise Control is able to enforce the rack-aware property, therefore assuming this property is unsafe. Even if CC is being used and rack-aware replicas is a hard goal, we can't be certain that other tooling hasn't reassigned some replicas since the last rebalance, or that no topics have been created in a rack-unaware way.
I am not sure the above is considered a rejected alternative. I mean, this section is for solutions for the same goals which were rejected, while it seems to be used just to highlight that an "idea" within the current proposal was rejected.
Of course, we would need to agree on whether we are rejecting this idea. Perhaps I should rename this to "other ideas considered"?
Hi @strimzi/maintainers. I have made some updates to the proposal recently, to hopefully make it easier to review. The proposal should now focus on the design only, and all the implementation details are in the linked POC PR. You can build and run the POC locally to see how it works. Before I invest more time and effort into this, I would like to get a clear decision on the next steps for this proposal, since there have been some mixed opinions.
I also have a question to ask the community users and other contributors:
My stance on this hasn't changed. I believe this is a useful feature, both in terms of the benefits to users around batching the rolling updates for large clusters, and in terms of improving the KafkaRoller code to make it easier to understand. I don't personally have the time to work on the implementation of this, but am willing to put myself forward as a maintainer who will prioritise reviewing the changes and understanding the code for future maintenance once it's in Strimzi. I'm also interested to hear what other maintainers and users think about the usefulness of this feature, so let's also add it to the Strimzi community call agenda later today (17th April).
In the three weeks since the community call where we discussed this, there haven't been any additional reviews or comments here (apart from Kate's, which came before the community call).
The two points above would be the big benefits of using a new KafkaRoller, but without a real need I have concerns about whether the testing effort and getting it working would outweigh the advantages we can gain. That said, from the community call it seems that we have got:
Thanks a lot for the proposal, I went through it and it LGTM (I left some minor comments). I would like to go through the PoC and possibly try it to see how it works. In terms of testing, we should have a look at writing more STs now, as IIRC there are just a few for the old roller and it would be good to have them improved. Also, we discussed with @see-quick that it would maybe be beneficial to look at performance tests in the future (but it's not something urgent right now).
If the current code is too complex and the new KafkaRoller would make the code clearer + other code changes or new features would be much simpler to implement, it's good to have it there FMPOV.
Also, you mentioned various things that will be implemented -> from my understanding you expect and want to have them all in place with the initial implementation, right? Or do you want to do a 1:1 KafkaRoller in comparison with the old one and then add the new features?
06x-new-kafka-roller.md
Outdated
| - | - | UNKNOWN |
| Pod is not Running | - | NOT_RUNNING |
| Pod is Running but lacking Ready status | Broker state != 2 | NOT_READY |
| Pod is Running but lacking Ready stats | Broker state == 2 | RECOVERING |
| Pod is Running but lacking Ready stats | Broker state == 2 | RECOVERING |
| Pod is Running but lacking Ready status | Broker state == 2 | RECOVERING |
11x-new-kafka-roller.md
Outdated
The new KafkaRoller introduced by this proposal will be used only for KRaft-based clusters.
This proposal should have no impact on any existing Kafka clusters deployed with ZooKeeper.
Because Strimzi doesn't support ZooKeeper anymore, are these two lines needed? Just asking out of curiosity, as I know that the proposal was written at a time when we supported both.
11x-new-kafka-roller.md
Outdated
| Phase | Strimzi versions | Default state |
|:------|:-----------------|:--------------------|
| Alpha | 0.46, ? | Disabled by default |
We should change it to 0.47/0.48 or something. It's just a reminder.
11x-new-kafka-roller.md
Outdated
`NOT_READY` nodes will be restarted if they have a restart reason and have not been restarted yet. If a node is still not ready after already being restarted, we don't want to restart any other nodes, to avoid taking down more nodes.

`READY` nodes will be restarted if they have a restart reason. If they don't have a restart reason but need to be reconfigured, they will be reconfigured. If no reconfiguration is needed, then no action will be taken on these nodes.
If we want to stop a roll, would the process be to apply strimzi.io/pause-reconciliation="true"? Would that put each node in READY or something else?
I wanted to drop in and say thank you for taking the time to put this proposal together. Slow deployments and rolls are the biggest headache we have here at Reddit. Our clusters are many hundreds of brokers, meaning each update can be a multi-day effort. Everyone dreads it. We have clusters we are eager to grow even more, but linear-speed deployments put us in a tough place. Just last month I told a product team they had to hold back their ramp because we couldn't manage a 300-node cluster due to this. The current plan for the next year is to start creating clusters with "-2", "-3" appended to at least allow parallelism across clusters. This is not ideal; our users would rather write to one big cluster instead of a lot of small ones. And of course it is a headache for us. I've seen replication-aware deployments for Kafka in the past, and they worked quite well. Deployment time for 300+ broker clusters only took hours. Having this in Strimzi would be a huge operational win for us and anyone running a big deployment. We would be happy to help contribute to this however we can. While we don't have much Java experience, we can certainly help test, review, and promote once it is released. Thanks again for putting this together.
Hi @nickgarvey, thanks for taking the time to review the PR and for offering to help with testing and reviews. Implementing this would take a lot of work, so it's really useful to hear from community members that this would be useful and worth investing the time in.
Made some improvements on the structure
Tidy up
Add possible transitions
Added flow diagram for state transitions
- Improve the names for categories and states
- Remove restarted/reconfigured states
- Add a configuration for delay between restarts
- Add a configuration for delay between restart and trigger of preferred leader election
- Restart NOT_RUNNING nodes in parallel for quicker recovery
- Improve the overall algorithm section, to make it clearer and concise
Updated the text on the diagram
Removed the implementation details such as the algorithm. This will be included in the draft PR for the POC instead.
Thanks to everyone who reviewed the PR. I'm closing this PR for now as the proposal has changed quite a bit since the last time most people looked at it, and it has many comments that may or may not be outdated. I plan to go over the proposal again and open a new PR for this, so that people can have a look with fresh eyes. @nickgarvey thanks for the comment offering to help with reviewing and testing it. It would help us build much more confidence in the new roller and make progress. I will mention Reddit in the new update as one of the vendors willing to help with testing, if that's ok.
POC implementation