Investigate support for distributed kafka connect #109

Open
yatharthranjan opened this issue Aug 3, 2021 · 3 comments

Comments

@yatharthranjan
Member

The Kafka connectors (e.g. Fitbit) are currently deployed in standalone mode. To take full advantage of the scalability of Kubernetes, they could be deployed in distributed mode, in which the connectors themselves are stateless and store their state in Kafka. At KCL we have some use cases where running in distributed mode is necessary.
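For context, distributed mode is enabled by starting the worker with a `connect-distributed.properties` file in which Kafka topics replace the local offset file used in standalone mode. The property names below are the standard Kafka Connect distributed worker settings; the values are illustrative assumptions only:

```properties
# connect-distributed.properties (sketch; values are assumptions)
bootstrap.servers=kafka:9092
group.id=radar-connect-cluster
# State lives in these Kafka topics instead of local files:
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
```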

@blootsvoets
Member

blootsvoets commented Aug 23, 2021

This needs to be mulled over a bit. Pros and cons as I see them, using the current Docker files:

- when in distributed mode, connectors must be provisioned twice: once to start up the connector engine (the worker), and once by sending the JSON message that starts the connector itself.
- with the current Docker images, we can't just use a generic "kafka connector" chart, because it won't include all the connector plugins we use (s3, upload, jdbc, fitbit). So the JSON message can only be sent to the appropriate Kafka connector pod.
- liveness of the connector becomes harder to compute, since the probe cannot tell whether the connector plugin has crashed, has not yet been started, or has been actively stopped.
+ processing can be scaled across multiple nodes.
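To illustrate the liveness point above: a probe has to query the Connect REST API for the named connector's status, because a healthy worker does not imply a healthy connector. The `/connectors/{name}/status` endpoint and the states are standard Kafka Connect; the base URL and connector name here are assumptions:

```python
import json
import urllib.error
import urllib.request

def connector_state(base_url, name, opener=urllib.request.urlopen):
    """Return the connector's state ("RUNNING", "FAILED", ...) from
    GET /connectors/{name}/status, or None when the connector does not
    exist -- which is exactly the ambiguous "not yet started or actively
    stopped" case a plain worker-level probe cannot distinguish."""
    try:
        with opener(f"{base_url}/connectors/{name}/status") as resp:
            return json.load(resp)["connector"]["state"]
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return None  # connector absent: never started, or deleted
        raise
```

A liveness probe built on this would fail the pod on `FAILED`, but still has to decide what `None` means, which is the crux of the con above.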

I can see this as an alternative run mode that could be added to the respective charts. The operator would then be in charge of re-sending the appropriate JSON file every time a pod restarts, which looks very error-prone to me.

Alternatively, each relevant Dockerfile is adapted to use distributed mode. The entry point script then polls until the distributed connector engine has started and sends a JSON file from a predefined path. If you could make the relevant change to the connector you want to run in distributed mode, we could adapt the Helm chart to cater for this change.
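A minimal sketch of such an entry point helper, assuming the worker's REST API listens on `http://localhost:8083` and the connector JSON sits at a predefined path (both assumptions; the `/connectors` endpoint itself is the standard Connect REST API):

```python
import json
import time
import urllib.error
import urllib.request

CONNECT_URL = "http://localhost:8083"  # assumed worker REST endpoint

def load_connector_config(path):
    """Read the connector definition (name + config) from a predefined file."""
    with open(path) as f:
        return json.load(f)

def wait_for_connect(url, timeout=120, interval=5, opener=urllib.request.urlopen):
    """Poll GET /connectors until the distributed worker answers, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener(url + "/connectors") as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # worker not up yet; retry
        time.sleep(interval)
    return False

def submit_connector(url, config, opener=urllib.request.urlopen):
    """POST the connector definition; the worker persists it in Kafka."""
    req = urllib.request.Request(
        url + "/connectors",
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return resp.status
```

The entry point would call `wait_for_connect(CONNECT_URL)` and then `submit_connector(CONNECT_URL, load_connector_config("/etc/connector.json"))`, with the path being whatever predefined location the chart mounts.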

@keyvaann
Collaborator

If we alter the Dockerfile to use distributed mode, can we still run them with only a single instance?

@blootsvoets
Member

If it automatically starts up the actual component during startup, then from the Kubernetes point of view it would be the same. The only difference is that it would store offsets in Kafka instead of in a persistent volume.

