Introduce a way to tag clusters (and, optionally, nodes) #12552

Open
michaelklishin opened this issue Oct 19, 2024 · 3 comments

michaelklishin commented Oct 19, 2024

Per discussion with @SimonUnge but also @stefanmoser @mkuratczyk.

Problem Definition

Larger users can have thousands of clusters used for all kinds of purposes. Sometimes it may be
necessary to perform certain operations (e.g. upgrades) on a subset of them, e.g. those that are
used by development environments only.

Right now there are only so many ways of doing it:

  1. Keep this cluster metadata in an external store. This is not always an option
  2. Prefix the cluster name with production-* or something similar. This is fragile and allows for only a couple of tags

Cluster Tags

One solution would be to configure cluster metadata as a map (a set of key-value pairs) in rabbitmq.conf:

cluster_tags.environment = production

cluster_tags.region = us-east
cluster_tags.az = us-east-3

This map can then be added to GET /api/overview HTTP API responses. In Prometheus output, this could look something like this (please read the comments below; with Prometheus it is much more nuanced than it sounds):

rabbitmq_identity_cluster_tags{environment = "production", region = "us-east", az = "us-east-3"}
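
For the HTTP API side, the same map might appear in the GET /api/overview payload roughly as follows (cluster_name is an existing field; the cluster_tags key and its exact shape are speculative, other keys elided):

{
  "cluster_name": "us-east-prod-1",
  "cluster_tags": {
    "environment": "production",
    "region": "us-east",
    "az": "us-east-3"
  }
}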

This data can be stored in the node's application environment or in a global runtime parameter
(for "last write wins" consistency).
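
To illustrate the upgrade scenario from the problem definition, here is a minimal sketch of fleet tooling that selects clusters by tag through the HTTP API. The cluster_tags response field is the proposal above rather than an existing field, and the host names are made up; port 15672 and guest credentials are the management plugin defaults.

# Sketch only: assumes the proposed "cluster_tags" key is added to GET /api/overview.
import requests

def cluster_tags(base_url, user="guest", password="guest"):
    overview = requests.get(f"{base_url}/api/overview", auth=(user, password), timeout=10)
    overview.raise_for_status()
    # "cluster_tags" is the field proposed in this issue, not one that exists today.
    return overview.json().get("cluster_tags", {})

# Pick only development clusters out of a larger fleet, e.g. for an upgrade run.
fleet = ["http://rabbit-dev-1:15672", "http://rabbit-prod-1:15672"]
to_upgrade = [url for url in fleet if cluster_tags(url).get("environment") == "development"]
print(to_upgrade)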

Node Tags

The same idea can be extended to node tags, which would only be stored in the application environment, since this data is local to the node.

node_tags.environment = production

node_tags.region = us-east
node_tags.az = us-east-3

In Prometheus output, this could look approximately like this (again, please read the comments below; with Prometheus it is much more nuanced than it sounds):

rabbitmq_identity_node_tags{environment = "production", region = "us-east", az = "us-east-3"}

The usefulness of this is less clear.
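
That said, if such a metric existed, it could be joined onto other per-node metrics in the usual info-metric fashion, assuming the tags metric is a constant gauge with value 1. A sketch in PromQL, where the left-hand metric is just an example of a per-node gauge:

rabbitmq_process_resident_memory_bytes
  * on (instance) group_left (environment)
  rabbitmq_identity_node_tags

This copies the environment label onto the memory metric so that dashboards and alerts can filter on it.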

ansd commented Oct 20, 2024

From a pure Prometheus point of view, I see two potential anti-patterns here:

The suggested new Prometheus metrics rabbitmq_identity_cluster_tags and rabbitmq_identity_node_tags are info metrics. According to Prometheus best practices:

The gauge should have the suffix _info

Hence, instead of introducing new Prometheus info metrics, I think it's better to re-use the existing Prometheus metric rabbitmq_identity_info.
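
A sketch of what that might look like, with the proposed tags appended as extra labels on the existing metric (the existing label set shown here is abbreviated and approximate):

rabbitmq_identity_info{rabbitmq_node="rabbit@host-1",rabbitmq_cluster="my-cluster",environment="production",region="us-east",az="us-east-3"} 1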

These labels are meant to be target labels rather than instrumentation labels. In other words, RabbitMQ itself should probably not emit labels such as {environment = "production", region = "us-east", az = "us-east-3"}.
This is explained in "Prometheus: Up & Running":

Labels come from two sources, instrumentation labels and target labels. When you are working in PromQL there is no difference between the two, but it’s important to distinguish between them in order to get the most benefits from labels.

Instrumentation labels, as the name indicates, come from your instrumentation. They are about things that are known inside your application or library, such as the type of HTTP requests it receives, which databases it talks to, and other internal specifics.

Target labels identify a specific monitoring target; that is, a target that Prometheus scrapes. A target label relates more to your architecture and may include which application it is, what datacenter it lives in, if it is in a development or production environment, which team owns it, and of course, which exact instance of the application it is. Target labels are attached by Prometheus as part of the process of scraping metrics. Different Prometheus servers run by different teams may have different views of what a “team,” “region,” or “service” is, so an instrumented application should not try to expose such labels itself. Accordingly, you will not find any features in client libraries to add labels across all metrics of a target. Target labels come from service discovery and relabelling and are discussed further in Chapter 8.
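
For reference, this is roughly how such target labels are usually attached on the Prometheus side rather than by the application, via scrape configuration or relabelling (the target address is made up; 15692 is the default rabbitmq_prometheus port):

scrape_configs:
  - job_name: rabbitmq
    static_configs:
      - targets: ["rabbit-1.example.com:15692"]
        labels:
          environment: production
          region: us-east
          az: us-east-3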

@michaelklishin

@ansd this feature is not 100% Prometheus-specific, so even if the Prometheus view of the world assumes that these are "deployment labels", RabbitMQ will have to log and expose them e.g. in the HTTP API and CLI tools in order to make them useful.

@SimonUnge would it be possible to configure your monitoring and upgrade tooling to comply with the monitoring target concept quoted above?

ansd commented Oct 21, 2024

@michaelklishin yes, my comment above was purely about Prometheus' point of view. Tagging/labelling RabbitMQ nodes with key/value metadata and exposing this metadata via the HTTP API, CLI tools and logs makes sense to me.
