
Roadmap for Xinfra Monitor

Andrew Choi edited this page Jun 18, 2020 · 6 revisions

Here are a few things that we plan to work on to make Xinfra Monitor more useful.

Monitoring Effects of Metadata Changes within the Cluster

For example, how long does it take for leadership information to propagate to every broker in the cluster? The same question applies to other metadata changes:

  1. Topic creation and deletion
  2. ACL propagation
  3. Partition expansion
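One way to frame this measurement is to record when a change is made, then poll every broker until all of them reflect it. The sketch below assumes a hypothetical BrokerProbe interface that answers "has this broker seen the change yet?". In practice each probe would need to query one specific broker's metadata, and how to do that is left open here. PropagationLatency and its parameters are illustrative names only, not part of Xinfra Monitor.

```java
import java.util.List;

/**
 * Sketch: measures how long a metadata change (new topic, ACL, partition count,
 * leadership) takes to become visible on every broker.
 */
public class PropagationLatency {

    /** Hypothetical per-broker check; returns true once that broker reflects the change. */
    @FunctionalInterface
    public interface BrokerProbe {
        boolean hasObservedChange();
    }

    /**
     * Polls every broker until all of them report the change, or until timeoutMs elapses.
     * Returns the elapsed milliseconds, or -1 on timeout or interruption.
     * Precision is bounded by pollIntervalMs.
     */
    public static long measure(List<BrokerProbe> brokers, long pollIntervalMs, long timeoutMs) {
        long startNanos = System.nanoTime();
        long deadlineNanos = startNanos + timeoutMs * 1_000_000L;
        // Overflow-safe comparison of nanoTime values.
        while (System.nanoTime() - deadlineNanos < 0) {
            if (brokers.stream().allMatch(BrokerProbe::hasObservedChange)) {
                return (System.nanoTime() - startNanos) / 1_000_000L;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return -1;
            }
        }
        return -1;
    }
}
```

The result is the time until the slowest broker catches up, which is the number the question above asks for. Recording a first-seen timestamp per broker would additionally show which brokers lag.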

Integration with Graphite and similar frameworks

It is useful for users to be able to view all Kafka-related metrics from one web service in their organization. Graphite is one of the most popular open source solutions for storing metrics and viewing them as time-series graphs. We plan to improve the existing DefaultMetricsReporterService so that users can export Xinfra Monitor metrics to Graphite or any other metrics storage service they choose.

This involves third-party libraries and services that LinkedIn does not use or is not heavily involved with. If users in the open source community want to maintain this feature with sound documentation and tests, that is fine.
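To give a rough idea of what exporting to Graphite involves, the sketch below uses Graphite's plaintext protocol, in which each metric is one line of the form "path value timestamp" (timestamp in Unix epoch seconds), conventionally sent over TCP to port 2003. GraphiteExporter, its prefix and the metric names are hypothetical; this is not the DefaultMetricsReporterService API, only a standalone illustration.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical exporter that pushes metrics using Graphite's plaintext protocol. */
public class GraphiteExporter {
    private final String host;
    private final int port;      // Graphite's plaintext listener conventionally uses 2003
    private final String prefix; // e.g. "xinfra", prepended to every metric path

    public GraphiteExporter(String host, int port, String prefix) {
        this.host = host;
        this.port = port;
        this.prefix = prefix;
    }

    /** One plaintext line: "<path> <value> <epochSeconds>\n". */
    public static String formatLine(String prefix, String name, double value, long epochSeconds) {
        // Spaces separate the three fields, so they must not appear inside the path.
        String path = (prefix + "." + name).replace(' ', '_');
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Formats all finite metrics, sorted by name, into one plaintext payload. */
    public static String formatAll(String prefix, Map<String, Double> metrics, long epochSeconds) {
        StringBuilder payload = new StringBuilder();
        for (Map.Entry<String, Double> e : new TreeMap<>(metrics).entrySet()) {
            Double value = e.getValue();
            // Skip null, NaN and infinite values (e.g. an average with no samples yet).
            if (value == null || !Double.isFinite(value)) {
                continue;
            }
            payload.append(formatLine(prefix, e.getKey(), value, epochSeconds));
        }
        return payload.toString();
    }

    /** Opens a TCP connection and writes one batch of metrics. */
    public void send(Map<String, Double> metrics, long epochSeconds) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(formatAll(prefix, metrics, epochSeconds));
            out.flush();
        }
    }
}
```

A real reporter would call send on a fixed interval. Sorting by name is only there to make the payload deterministic; Graphite does not require any particular order.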

Various improvements to test scheduling

Users should be able to schedule custom actions (e.g. broker bounce, broker hard kill) to run at regular intervals. Combined with other services, these actions can be used to make assertions (e.g. no message loss, no message reordering) about Kafka's behavior under a variety of scenarios. This setup can be deployed against your private Kafka cluster to test Kafka's performance and fault tolerance.

This is outside the scope of Xinfra Monitor and is perhaps more applicable to Cruise Control: it is impossible to know when it is safe to perform these actions without all of the data that Cruise Control has.
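Even though it falls outside Xinfra Monitor, the periodic scheduling described above can be sketched with the JDK's ScheduledExecutorService. ActionScheduler and the action name used with it are hypothetical, and the disruptive action itself (for example a broker bounce) is left as a plain Runnable supplied by the caller; deciding when such an action is safe is exactly the part this sketch does not attempt.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler that runs custom actions (e.g. a broker bounce) at a fixed rate. */
public class ActionScheduler implements AutoCloseable {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    /** Runs action every periodMs after initialDelayMs; cancel via the returned future. */
    public ScheduledFuture<?> schedule(String name, Runnable action, long initialDelayMs, long periodMs) {
        return executor.scheduleAtFixedRate(() -> {
            try {
                action.run();
            } catch (RuntimeException e) {
                // An exception escaping a scheduleAtFixedRate task suppresses all later runs,
                // so report the failure and keep the schedule alive.
                System.err.println("Scheduled action '" + name + "' failed: " + e);
            }
        }, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Stops all scheduled actions, interrupting any that are running. */
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Using a single-threaded executor means two disruptive actions never run at the same time; a thread pool would allow overlap at the cost of that guarantee.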

Automatic cluster deployment

Another possible future work item is the ability to deploy a Kafka cluster built from Apache Kafka at a user-specified git commit hash. This would let us automatically test a range of Kafka commits to catch bugs that Apache Kafka's unit tests or system tests may miss.

This is also outside the scope of Xinfra Monitor and is perhaps more applicable to Cruise Control: it is impossible to know when it is safe to do this without all of the data that Cruise Control has.
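The build half of the automatic-deployment idea can be sketched as a fixed sequence of processes: clone Apache Kafka, check out the requested commit, and build a binary tarball with the releaseTarGz Gradle task described in Kafka's README. KafkaBuildPlan is a hypothetical class, the 7-to-40 lowercase hex check is this sketch's own validation policy, build steps may differ for older Kafka commits, and deploying the resulting tarball to brokers is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical build plan for Apache Kafka at a specific git commit. */
public class KafkaBuildPlan {
    // This sketch accepts abbreviated or full lowercase hex commit hashes (7 to 40 characters).
    private static final Pattern COMMIT_HASH = Pattern.compile("[0-9a-f]{7,40}");

    /** Returns the clone, checkout and build steps; nothing is executed here. */
    public static List<ProcessBuilder> stepsFor(String commitHash, File workDir) {
        if (!COMMIT_HASH.matcher(commitHash).matches()) {
            throw new IllegalArgumentException("Not a git commit hash: " + commitHash);
        }
        ProcessBuilder clone = new ProcessBuilder(
                "git", "clone", "https://github.com/apache/kafka.git", workDir.getPath());
        ProcessBuilder checkout = new ProcessBuilder("git", "checkout", commitHash).directory(workDir);
        // releaseTarGz builds a binary release tarball (see Kafka's README).
        ProcessBuilder build = new ProcessBuilder("./gradlew", "clean", "releaseTarGz").directory(workDir);
        return List.of(clone, checkout, build);
    }

    /** Runs each step in order, stopping at the first non-zero exit code. */
    public static void run(List<ProcessBuilder> steps) throws IOException, InterruptedException {
        for (ProcessBuilder step : steps) {
            int exitCode = step.inheritIO().start().waitFor();
            if (exitCode != 0) {
                throw new IOException("Step " + step.command() + " exited with " + exitCode);
            }
        }
    }
}
```

Separating plan construction from execution keeps the command list easy to inspect or log before anything touches the machine.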
