Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Own-telemetry: Ability to enabled/disable individual metrics #10769

Open
djaglowski opened this issue Jul 31, 2024 · 24 comments
Open

Own-telemetry: Ability to enabled/disable individual metrics #10769

djaglowski opened this issue Jul 31, 2024 · 24 comments
Assignees
Labels
collector-telemetry healthchecker and other telemetry collection issues

Comments

@djaglowski
Copy link
Member

djaglowski commented Jul 31, 2024

Is your feature request related to a problem? Please describe.

Our notion of basic/normal/detailed telemetry levels is a quick and convenient way for users to tune the volume of the collector's own telemetry, but it is quite crude compared to what we provide to control telemetry not generated by the collector. For example, our mdatagen library provides the ability to enable or disable any individual metric. Not having this ability for the collector's own telemetry means that users may not be able to get the metrics they need without enabling all metrics.

Describe the solution you'd like

In the telemetry configuration, we should offer the ability to enable or disable individual metrics, regardless of telemetry level.

The telemetry level should be used as the starting point for which metrics are enabled or disabled, but then individual metric settings can override.

@djaglowski djaglowski added the collector-telemetry healthchecker and other telemetry collection issues label Jul 31, 2024
@codeboten
Copy link
Contributor

Thanks for opening the issue. One question I have is how this works w/ views, because configuration of views would allow this same functionality. I'd rather not have multiple ways of doing the same thing if possible

@djaglowski
Copy link
Member Author

I'm not familiar enough with views to give a good answer here. Are they exposed to our users via the collector config? Or are they an implementation detail? For that matter, is there a reason we don't use views instead of in-code checks against metric level?

@codeboten
Copy link
Contributor

@djaglowski views will be configurable in the same way as they are for SDK via config, see https://github.com/open-telemetry/opentelemetry-configuration/blob/cc7cd375d893e440afdc294a00f98038ebaa4eab/examples/kitchen-sink.yaml#L221 as an example

The main reason I can see wanting to keep a telemetry level configuration option is for convenience to end users. It would be easier to configure one setting than to have to configure views for all metrics. But then, if we have multiple ways to configured things, we get into the issue of precedence.

@djaglowski
Copy link
Member Author

The main reason I can see wanting to keep a telemetry level configuration option is for convenience to end users.

I agree but couldn't these be implemented as views? (Then for example component developers don't have to check if telemetrySettings.Level == Detailed, etc.)

If we did that, does that normalize configuration in a way where users could add or subtract from our predefined views by adding their own?

@codeboten
Copy link
Contributor

I agree but couldn't these be implemented as views? (Then for example component developers don't have to check if telemetrySettings.Level == Detailed, etc.)

There's no concept of levels built into views. I considered adding a level in mdatagen block for each metric configured via telemetry::metrics to allow the code to be automatically generated and remove the need to check in the components themselves. Otherwise we'd need to provide a mechanism for components to return views based on different levels.

An alternative proposed was to have a different MeterProvider for everything level which would again remove the need to check a level before recording telemetry. This would mean that the component developer would have to make a decision on what they consider appropriate for the different levels.

If we did that, does that normalize configuration in a way where users could add or subtract from our predefined views by adding their own?

We could definitely publish a configuration for each level that users could modify as they see fit. This would be somewhat easy to generate via mdatagen in the documentation as well

@crobert-1
Copy link
Member

I hope I'm not getting sidetracked from the original purpose of this issue, but is this line in the mdatagen metadata schema the same thing as what Dan's proposing here? I'm pretty new to this area of the collector so I may be missing something obvious, but the schema appears to state that this should be supported.

Please let me know if I'm missing something.

@djaglowski
Copy link
Member Author

Yes, that's basically what I suggested @crobert-1. I don't care if it looks just like that or not, but it should be possible to enable or disable individual metrics produced by the collector.

@ptodev
Copy link

ptodev commented Oct 1, 2024

Hello! 👋 I could take a look into implementing this, but I'm still not yet sure what a good solution would look like.

One solution is to use views to disable (but not enable) metrics using views. The user defined views would override the settings configured by the level config argument so that the Collector emits less metrics than it normally would:

service:
  telemetry:
    metrics:
      level: basic
      views:
      - selector:
          instrument_name: otelcol_exporter_send_failed_spans
        stream:
          aggregation:
            drop:

Another solution could be to make the service/telemetry/metrics/level and service/telemetry/metrics/views config arguments mutually exclusive, and to let users create a view which drops specific metrics. I suppose this is similar to @codeboten's suggestion above:

We could definitely publish a configuration for each level that users could modify as they see fit.

However, I'm not sure how ergonomic both of these approaches are. I suppose most users want to specify the metrics they are interested in, not the ones they are not interested in. That's because if you have a dashboard which uses a particular metric, you want to make sure the metric is always present.

@mx-psi
Copy link
Member

mx-psi commented Dec 11, 2024

Re-reading this I am not sure what the decision if any was. Do we want to have a separate mechanism for configuring inidividual metrics different from views or service::telemetry::metrics?

@mx-psi
Copy link
Member

mx-psi commented Dec 17, 2024

I feel like the disabled parameter in the in-development MeterConfig would be the solution for this.

This is not consistent with the rest of our configuration so I filed open-telemetry/opentelemetry-specification/issues/4344

@djaglowski
Copy link
Member Author

I feel like the disabled parameter in the in-development MeterConfig would be the solution for this.

If I'm understanding correctly, the disabled parameter would give us a mechanism for controlling individual metrics. However, I think we still need to agree on the mechanism for configuring the value of the flag.

Re-reading this I am not sure what the decision if any was. Do we want to have a separate mechanism for configuring inidividual metrics different from views or service::telemetry::metrics?

I don't think we have consensus. Admittedly, I am not familiar enough with views to understand what has been suggested.

Perhaps we could continue this conversation starting from what the actual configuration would look like:

Let's just say we have two metrics otelcol.foo.enabledbydefault and otelcol.foo.disabledbydefault which I'd like to flip to non-default settings. What I intended with this issue is that we'd have something like the following:

service:
  telemetry:
    metrics:
      otelcol.foo.enabledbydefault: false
      otelcol.foo.disabledbydefault: true

Can someone please show me how the collector configuration would look if we used views to change only these two metrics to non-default settings?

@mx-psi
Copy link
Member

mx-psi commented Jan 30, 2025

@djaglowski So here is my proposal with some examples. The issue I linked above about MeterConfig.disabled could simplify this, but I am sticking with views config for all examples.

My pitch is that we should conceptualize this as having two incompatible modes for configuring telemetry: a basic, level-based configuration and an advanced, views/meter config based configuration for advanced users that really want to configure things with a lot of detail. The basic mode would be equivalent to the advanced mode with a concrete set of views. Users would be able to get this concrete list of views for a particular level through some sort of mechanism (initially docs, in the longer term it could be a subcommand, this doesn't need to be ready for 1.0), so they can have a starting template.

To make this concrete, let's say we have otelcol.foo.basic available on level: basic or above, and otelcol.foo.detailed available on level: detailed. Some examples:

Examples and table
  • No levels, no views:You don't configure any views or any level (but, maybe, you configure the readers). The defaults apply.
service:
  telemetry:
     metrics:
       readers:
         - periodic:
            exporter:
              otlp_http:
                endpoint: http://localhost:4318/v1/metrics
  • detailed level, no views: You configure levels but don't configure views. In this case both otelcol.foo.basic and otelcol.foo.detailed would be available
service:
  telemetry:
    metrics:
      level: detailed
  • drop view, no level: You drop a metric based on a view. The levels are ignored, the views from level do not apply, so in this case you would have otelcol.foo.detailed but not otelcol.foo.basic. You also get an info-level log that says something like "Views-based configuration, ignoring levels to report metrics. If you want to get the default views for a level, run otelcol get-views [level]"
service:
  telemetry:
     metrics:
       views:
          - selector:
               instrument_name: otelcol.foo.basic
            stream:
               aggregation:
                   drop:
  • Yes levels, yes views: This would result in an error telling you to pick one or the other, something like: error: service::telemetry::metrics::level is incompatible with service::telemetry::metrics:views. See [link to documentation that explains the two possibilities]"
service:
  telemetry:
     metrics:
       level: detailed
       views:
          - selector:
               instrument_name: otelcol.foo.basic
            stream:
               aggregation:
                   drop:

Summary table:

Release otelcol.foo.basic reported? otelcol.foo.detailed reported?
none set (default) Yes No
detailed level, no views Yes Yes
drop view, no level No (log explains why) Yes
level AND views ‼️ invalid config ‼️ invalid config

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Jan 30, 2025

@mx-psi This proposal is probably the simplest to implement (either use the specified views or the list corresponding to a level), but I fear this may be a big pain point for users.

Users who just want to drop one metric from the defaults will probably try to simply add a drop view for it. If they didn't have a level configured, they'll suddenly see the amount of exported telemetry explode (everything except the one they dropped), and will need to notice the warning to understand why.

Then, they'll have to write a gigantic config file just to get back to the previous behavior. And once views are set up, they would no longer be able to quickly enable more telemetry for debugging purposes, and would have to remove views for specifically the metrics they want.

I think the ideal solution from a user's perspective would be:

  • level for setting default views;
  • a map[key]bool object like metrics in Dan's example to customize the default views on an emit/drop basis;
  • the views object for more advanced customization.

Even with views being a thing, I think an intermediate option would be good, because it would allow easy toggling of the default metrics without having to understand or think about how views work and interact with one another.

This would certainly be more complicated to implement, but I think the only feasibility concerns are:

  • Can we modify the explicit views list before it is passed to the Go SDK?
  • Can we "merge" the list of explicit views with the default ones, especially considering the selector system?
    Edit: Looking at the relevant code, it looks like the first matching view for a given metric name will be used. So adding the default views after the list of explicit ones may be enough to give the latter priority, with the only caveat being potential warning logs about conflicts.

@mx-psi
Copy link
Member

mx-psi commented Jan 31, 2025

Users who just want to drop one metric from the defaults will probably try to simply add a drop view for it. If they didn't have a level configured, they'll suddenly see the amount of exported telemetry explode (everything except the one they dropped), and will need to notice the warning to understand why.

IMO any system that has multiple ways of configuring the same thing runs into this risk of confusion. If needed we could have a (name to be bikeshedded) level_based_config::enabled: false flag that you need to disable to be able to use views, if we want to make it super explicit.

Then, they'll have to write a gigantic config file just to get back to the previous behavior.

Our config system is flexible enough to make this a nonissue, if we get merge rules right they could just include a file and merge it with their config. They would just generate the views config into a genbasic_level_views.yaml file and do something like:

otelcorecol --config genbasic_level_views.yaml --config my_config.yaml  --merge=append

And once views are set up, they would no longer be able to quickly enable more telemetry for debugging purposes and would have to remove views for specifically the metrics they want.

Following the above, they would just have to change the file they include

otelcorecol --config gendetailed_level_views.yaml --config my_config.yaml  --merge=append

Even with views being a thing, I think an intermediate option would be good, because it would allow easy toggling of the default metrics without having to understand or think about how views work and interact with one another.

I agree that views can be complex. This however feels like a problem we should solve at the specification level, if it's complex for our users it would be complex for users of any other application.

So adding the default views after the list of explicit ones may be enough to give the latter priority, with the only caveat being potential warning logs about conflicts.

It would be interesting to understand the exact semantics of having overlapping views. I am not categorically opposed to this option, but my initial gut feeling is that this could be as confusing as the solution I am proposing.

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Jan 31, 2025

IMO any system that has multiple ways of configuring the same thing runs into this risk of confusion. If needed we could have a (name to be bikeshedded) level_based_config::enabled: false flag that you need to disable to be able to use views, if we want to make it super explicit.

I half-agree; I think a system with a "broad" configuration and a "detailed" configuration that customizes it is relatively intuitive in comparison with a system where setting the latter essentially changes the former to "everything on".

If we go with your proposal, I'm in support of something like level_based_config::enabled (or maybe something like level: custom?), it would make it harder to accidentally switch everything on.

Our config system is flexible enough to make this a nonissue, if we get merge rules right they could just include a file and merge it with their config.

It would be interesting to understand the exact semantics of having overlapping views. I am not categorically opposed to this option, but my initial gut feeling is that this could be as confusing as the solution I am proposing.

Concatenating the default list of views with a custom one by applying two configuration files will have the same confusing semantics as doing it automatically in the code, it just requires more work on the user's part.

This however feels like a problem we should solve at the specification level, if it's complex for our users it would be complex for users of any other application.

Definitely agree with that part. I think the way that conflicting Views interact is surprising and confusing in the spec. I also think the inclusion of MetricConfig is a sign that others agree that using Views to enable/disable metrics is too complicated, and that another way to do it would be useful. I'm not sure if the interaction between the two is specified? If the spec goes with a "Views take priority over MetricConfig.enabled" solution and adds a YAML-based way to specify MetricConfig, my proposal could be implemented by simply setting a default value for the MetricConfigs based on the level.

@mx-psi
Copy link
Member

mx-psi commented Jan 31, 2025

Concatenating the default list of views with a custom one by applying two configuration files will have the same confusing semantics as doing it automatically in the code, it just requires more work on the user's part.

Hm, I think the important difference between the concatenate-views solution and the basic-and-advanced-modes solution is that users are not required to use the default config in the basic-and-advanced-modes solution; they can just write their own thing if they want to.

@mx-psi
Copy link
Member

mx-psi commented Feb 3, 2025

@jade-guiton-dd This is how the MeterConfig would work (pending approval by the Configuration WG): open-telemetry/opentelemetry-configuration#140. Could you make your proposal more concrete with some examples like I did on #10769 (comment) ?

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Feb 3, 2025

Summary of the new `MeterConfig` and `MeterConfigurator` API

The SDK spec has an experimental spec for a MeterConfig struct:

MeterConfig

A MeterConfig defines various configurable aspects of a Meter's behavior. It consists of the following parameters:

  • disabled: A boolean indication of whether the Meter is enabled.

and a MeterConfigurator parameter for the MeterProvider:

MeterConfigurator

A MeterConfigurator is a function which computes the MeterConfig for a Meter.
The function MUST accept the following parameter:

  • meter_scope: The InstrumentationScope of the Meter.

MeterConfigurator is modeled as a function to maximize flexibility. However, implementations MAY provide shorthand or helper functions to accommodate common use cases:

  • Select one or more Meters by name, with exact match or pattern matching.
  • Disable one or more specific Meters.
  • Disable all Meters, and selectively enable one or more specific Meters.

The PR linked above proposes the following example YAML config to customize the MeterConfigurator:

meter_provider: # This corresponds to `service::telemetry::metrics` in the OTelCol config
  meter_configurator:
    default_config: # Configure the default meter config used there is no matching entry in .meter_configurator.meters.
      disabled: true
    meters:
      - name: io.opentelemetry.contrib.* # Meter names to match, which can include '?' and '*' placeholders.
        config:
          disabled: false

This hasn't been implemented in the SDK yet, but presumably this implies that when the meter_configurator key is set, a MeterConfigurator function is set on the provider by the SDK, which returns a MeterConfig corresponding to the first matching rule in meters, or the one in default_config if there are no matches.


With this in mind, my updated proposal would be to allow specifying both level and meter_configurator. The level would provide a default meter_configurator, which would be customized by the one explicitly provided, allowing easy toggling of metrics on top of the defaults. This could be done by setting the MeterConfigurator ourselves to a function implementing this fallback to level-based defaults, no concatenating of lists required.

However, there is a big issue with this proposal in the form of default_config. That key conflicts with my intended use of level as a "default config". No matter what merge policy we choose, I think it would be confusing for users. Therefore, I think we will need to make level and default_config mutually exclusive, similar to Pablo's proposal.

I think the best way to do that would be to require setting level to custom before making use of default_config. This way users are still clearly aware that they are disabling our level-based defaults. Users who want to customize the level-based defaults will leave default_config unset, and users who want everything except specific metrics or nothing but specific metrics will set level: custom and define default_config to what they want.

Example configs

Let's assume we have the same example otelcol.foo.basic and otelcol.foo.detailed metrics as above.

Example 1: Oops, All Defaults

service:
  telemetry:
    metrics:
      readers: [...]

level defaults to normal. We define a MeterConfigurator which enables normal and up metrics.
→ Only otelcol.foo.basic is emitted.

Example 2: Level up

service:
  telemetry:
    metrics:
      readers: [...]
      level: detailed

level: detailed is applied. We define a MeterConfigurator which enables detailed and up metrics.
→ Both example metrics are emitted.

Example 3: Customized defaults

service:
  telemetry:
    metrics:
      readers: [...]
      meter_configurator:
        meters:
          - name: otelcol.foo.basic
            config:
              disabled: true

level defaults to normal. Since meter_configurator is set without default_config, we define a MeterConfigurator which applies the first matching MeterConfig in meters, or, if no matches are found, enables normal and up metrics.
→ Neither metric is emitted (basic is explicitly disabled, detailed is disabled by the default level).

Example 4: Customized defaults 2: Electric Boogaloo

service:
  telemetry:
    metrics:
      readers: [...]
      meter_configurator:
        meters:
          - name: otelcol.foo.detailed
            config:
              disabled: false

→ Same as above.
→ Both metrics are emitted (basic is enabled by the default level, detailed is explicitly enabled).

Example 5: I am altering the defaults, pray I do not alter them any further

service:
  telemetry:
    metrics:
      readers: [...]
      level: custom
      meter_configurator:
        default_config:
          disabled: false

→ Because level is set to custom, level-based config is entirely disabled, and the meter_configurator is passed as-is to the SDK.
→ Both metrics are emitted (the default is disabled: false and no further rules are set).

Example 6: Conflicting directives

service:
  telemetry:
    metrics:
      readers: [...]
      level: normal
      meter_configurator:
        default_config:
          disabled: false

meter_configurator::default_config is set, but level is not set to custom. Return an error explaining that metrics default either to what the level decides, or to the default_config, but both cannot be used at the same time.


Important note: If the SDK team decides to make the default_config key required rather than optional, then my proposal is to give up on trying to conciliate our config with the SDK's, and fall back on Pablo's proposal entirely, and make level and meter_configurator as a whole mutually exclusive. I would recommend using the same level: custom approach as above.

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Feb 10, 2025

After discussing it with @mx-psi, I think we could start with a simple "no views or (once implemented) meter_configurator unless level is set to custom/detailed" approach, and come back to this discussion once meter_configurator is actually implemented in the SDK; it should be easier to decide when the API is more concrete.

@mx-psi mx-psi self-assigned this Feb 10, 2025
@mx-psi mx-psi moved this from Todo to In Progress in Collector: v1 Feb 10, 2025
@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Feb 11, 2025

I think there's been a misunderstanding on my part here: meter_configurator allows one to toggle meters, not individual metrics. You could for instance toggle all metrics emitted by go.opentelemetry.io/collector/exporter/exporterhelper, but not specifically otelcol_exporter_queue_capacity. (That also explains why the interaction between views and meter config isn't specified; they're on a different level of coarseness.)

So if we want a more ergonomic approach to toggling metrics than "enable everything then filter with views" in the future, we will have to go back to my original "merge the views" idea, and/or add our own metrics map[string]bool key as previously suggested.

@mx-psi
Copy link
Member

mx-psi commented Feb 11, 2025

I got that wrong too, sorry :/

@djaglowski
Copy link
Member Author

Let's just say we have two metrics otelcol.foo.enabledbydefault and otelcol.foo.disabledbydefault which I'd like to flip to non-default settings. What I intended with this issue is that we'd have something like the following:

service:
  telemetry:
    metrics:
      otelcol.foo.enabledbydefault: false
      otelcol.foo.disabledbydefault: true

Can someone please show me how the collector configuration would look if we used views to change only these two metrics > to non-default settings?

I don't think there was ever a direct answer to this question and I want to highlight this because IMO it means we haven't identified a solution that's going to be easy enough for the average user.

I think anything which requires users to reason about views, readers, meters, or anything other than "this one metric that I care about" is going to be useful only to a tiny fraction of users.

metrics map[string]bool

In my opinion, this is the only sensible user facing way to enable and disable metrics, whatever the implementation may be.

level can act as a default value for this map, and then any explicitly enabled or disabled metrics can be layered on top. This is simple to reason about and gives users a straightforward solution when they need to enable or disable a specific metric.

Apologies for opining without much consideration for the implementation here, but the discussion seems to be headed towards exposing the user to internal concerns more than providing them with a simple solution.

@jade-guiton-dd
Copy link
Contributor

jade-guiton-dd commented Feb 12, 2025

To answer your question directly:

Let's assume that otelcol.foo.enabledbydefault and otelcol.foo.disabledbydefault are enabled/disabled by default in level: normal.

Under Pablo's proposal, using views would be incompatible with level. So, to toggle only these two metrics compared to level: normal, I believe you would have to:

  • Remove the level key if there is one. Combined with the presence of the views key, this will enable all metrics by default.
  • Run otelcol get-views normal to get the list of default views set by level: normal. There would be one with drop aggregation for otelcol.foo.disabledbydefault. Make this into a config file.
  • In your own config file, add a view with drop aggregation for otelcol.foo.enabledbydefault, and one with default aggregation for otelcol.foo.disabledbydefault.
  • Run the Collector with both the default view config and your own config applied, in that order, with --merge=append.

Coming up with this requires knowledge of views and config merging, as well as undocumented and unstandardized knowledge about how conflicting views interact with each other in the SDK. (If I remember correctly from reading the code last time: If there is a matching non-drop view, apply the first one in the list. Otherwise, if there is a matching drop view, apply it. Otherwise, apply the default aggregation.)

The simpler alternative users are likely to take if they get past step 2 is to avoid merging configs and directly modify the default list of views. But this means you now have a very large, custom config file.

@jade-guiton-dd
Copy link
Contributor

I've opened a draft PR (#12433) implementing the part that the various proposals seem to agree on, ie. allowing users to set views in the case where no views are implicitly created (level: detailed)

@jade-guiton-dd jade-guiton-dd moved this from In Progress to Waiting for reviews in Collector: v1 Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collector-telemetry healthchecker and other telemetry collection issues
Projects
Status: Waiting for reviews
Development

No branches or pull requests

6 participants