
[processor/elasticinframetrics] Add support for auto mapping mode #455

Open · wants to merge 3 commits into main
Conversation

felixbarny (Member)

With open-telemetry/opentelemetry-collector-contrib#38114, @axw has added support for controlling the mapping mode of the ES exporter with client metadata (X-Elastic-Mapping-Mode="<mode>").

This PR leverages that in EIMP so that the processor automatically sets the mapping mode to ecs for the remapped metrics. This should simplify the overall EDOT configuration because we can just use the default configuration of the ES exporter, without having to create another instance of the exporter with the ecs mapping mode.

TODOs:

  • end-to-end testing

@felixbarny felixbarny requested a review from a team as a code owner March 13, 2025 17:35
return client.Info{
    Addr: info.Addr,
    Auth: info.Auth,
    Metadata: client.NewMetadata(map[string][]string{"x-elastic-mapping-mode": {mode}}),
}
felixbarny (Member, Author)

It looks like there's no way to retain the existing metadata, i.e. we're overriding it here. Could this be an issue? @axw are we planning to add something like the project id to the metadata that would get removed by setting the mapping mode?

axw (Member)

Urgh :) Yeah we will be setting that in client metadata, and yeah that is an issue indeed. I believe we would need to add a method to client.Metadata to get its keys.

Contributor

Or a method to create a new Metadata by taking in the old metadata and a map; it might be more efficient and play well with the immutability of the metadata.
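A sketch of that suggestion. The real `client.Metadata` in go.opentelemetry.io/collector/client does not expose its underlying map, so this uses a simplified stand-in type; all names here are illustrative, not the collector's actual API:

```go
package main

import "fmt"

// metadata is a simplified stand-in for client.Metadata, which wraps an
// immutable map[string][]string of client/request metadata.
type metadata struct{ data map[string][]string }

// newMetadataWithOverrides builds a new metadata value from an old one plus a
// map of overrides, without mutating either input. On key collision the
// override wins, so e.g. x-elastic-mapping-mode can be (re)set while keys
// such as a project id are preserved.
func newMetadataWithOverrides(old metadata, overrides map[string][]string) metadata {
	merged := make(map[string][]string, len(old.data)+len(overrides))
	for k, v := range old.data {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v // overrides win on key collision
	}
	return metadata{data: merged}
}

func main() {
	old := metadata{data: map[string][]string{"x-elastic-project-id": {"p1"}}}
	merged := newMetadataWithOverrides(old, map[string][]string{"x-elastic-mapping-mode": {"ecs"}})
	fmt.Println(merged.data["x-elastic-project-id"], merged.data["x-elastic-mapping-mode"]) // prints: [p1] [ecs]
}
```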

felixbarny (Member, Author)

As discussed in open-telemetry/opentelemetry-collector-contrib#38114, we should change this to use a scope attribute rather than client metadata.

@felixbarny (Member, Author)

After doing end-to-end testing, it seems that the client metadata doesn't survive a hop through a gateway. When sending the data produced by EIMP directly to ES, everything works as expected. We get a metrics-hostmetricsreceiver.otel-default data stream for the original metrics and, for example, metrics-system.load-default for the transformed metrics. However, when sending the transformed data via an otlp exporter to another collector, which then forwards the data to ES, we only get OTel data streams, such as metrics-system.load.otel-default.

I wonder if the client metadata path is a bit too brittle at the moment and whether controlling the mapping mode should also be possible with a resource attribute.

@carsonip (Member)

I wonder if the client metadata path is a bit too brittle at the moment and whether controlling the mapping mode should also be possible with a resource attribute.

I'm also a little worried about client metadata via persistent queue, although I haven't tested it yet. I would be surprised if client metadata survives collector restarts.

@axw (Member) commented Mar 16, 2025

After doing end-to-end testing, it seems that the client metadata doesn't survive a hop through a gateway. When sending the data produced by EIMP directly to ES, everything works as expected. We get a metrics-hostmetricsreceiver.otel-default data stream for the original metrics and, for example, metrics-system.load-default for the transformed metrics. However, when sending the transformed data via an otlp exporter to another collector, which then forwards the data to ES, we only get OTel data streams, such as metrics-system.load.otel-default.

Are you using the headers_setter extension? Client metadata doesn't automatically get added to outgoing headers, you need to configure it.
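For reference, a sketch of the wiring the headers_setter README describes (the exporter name, endpoint, and pipeline below are assumed for illustration): the extension reads the value from the incoming request context, and the sending exporter has to opt in through its auth setting.

```yaml
extensions:
  headers_setter:
    headers:
      - key: x-elastic-mapping-mode
        from_context: x-elastic-mapping-mode

exporters:
  otlp:
    endpoint: gateway:4317  # assumed endpoint
    auth:
      authenticator: headers_setter  # without this, no header is added

service:
  extensions: [headers_setter]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlp]
```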

I wonder if the client metadata path is a bit too brittle at the moment and whether controlling the mapping mode should also be possible with a resource attribute.

I'm not totally opposed to this, but it will complicate the exporter, so I'd like to avoid it if we can...

Would it be feasible to have a separate exporter/pipeline for these metrics? i.e. configure a second OTLP exporter that sets the X-Elastic-Mapping-Mode header.

@felixbarny (Member, Author) commented Mar 17, 2025

Are you using the headers_setter extension? Client metadata doesn't automatically get added to outgoing headers, you need to configure it.

Ah, thanks for the tip. I've tried it out but I still can't get it to work. (It does work when the edge collector sends to Elasticsearch directly)

A snippet from the edge collector config:

extensions:
  headers_setter:
    headers:
      - key: x-elastic-mapping-mode
        from_context: x-elastic-mapping-mode

service:
  extensions: [headers_setter]

And a snippet from the gateway collector config:

receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 32
        include_metadata: true

I'm a bit worried that using client metadata will make the configuration quite complex and brittle. It seems easy to misconfigure, hard to get right, and hard to debug when it happens. Controlling the mapping mode via resource attributes seems a lot more robust and doesn't require additional configuration. It seems it may be worth the complexity in the ES exporter.

We also already have the elasticsearch.mapping.hints attribute which seems conceptually similar to the mapping mode.

Would it be feasible to have a separate exporter/pipeline for these metrics? i.e. configure a second OTLP exporter that sets the X-Elastic-Mapping-Mode header.

In principle yes. However, I think it kind of defeats the purpose of the automatic mapping mode if it's not really automatic but requires a different pipeline. It would be pretty handy if all that's needed is for the EIMP to set a resource attribute controlling the mapping mode and if there's no way to misconfigure it.

I've created a POC for the ES exporter to support resolving the mapping mode via resource attributes: open-telemetry/opentelemetry-collector-contrib#38667. At the moment, it does that in addition to looking up the mapping mode in client metadata. Maybe it makes sense to keep it that way. I could see the header being useful for the MIS migration use case. There's also precedent in resolving information from both resource attributes and the context: The kafkaexporter does that when determining the topic.
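As a sketch of that resolution order (the precedence and all names here are illustrative, not the PoC's actual code): a per-resource attribute wins over client metadata, which wins over the exporter's configured default.

```go
package main

import "fmt"

// resolveMappingMode picks the mapping mode for a resource. Precedence
// (assumed for illustration): resource attribute, then client metadata
// header, then the exporter's configured default.
func resolveMappingMode(resourceAttrs map[string]string, clientMeta map[string][]string, def string) string {
	if m, ok := resourceAttrs["elasticsearch.mapping.mode"]; ok {
		return m
	}
	if v := clientMeta["x-elastic-mapping-mode"]; len(v) > 0 {
		return v[0]
	}
	return def
}

func main() {
	attrs := map[string]string{"elasticsearch.mapping.mode": "ecs"}
	meta := map[string][]string{"x-elastic-mapping-mode": {"otel"}}
	fmt.Println(resolveMappingMode(attrs, meta, "otel")) // prints: ecs
	fmt.Println(resolveMappingMode(nil, meta, "none"))   // prints: otel
}
```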

@axw (Member) commented Mar 18, 2025

Not sure why it's not working with a gateway collector - maybe there's some queuing? @carsonip is correct that client metadata does not survive queues at the moment.

In principle yes. However, I think it kind of defeats the purpose of the automatic mapping mode if it's not really automatic but requires a different pipeline. It would be pretty handy if all that's needed is for the EIMP to set a resource attribute controlling the mapping mode and if there's no way to misconfigure it.

I agree that it complicates things. Will this pipeline configuration be exposed to users? Does it have to be, or can it be encapsulated?

Maybe that doesn't really matter though, if there's no convenient way to send it to a self-managed gateway collector and have it survive queues and such. One option would be to have the gateway route data to two different pipelines based on the mapping mode, but it becomes even more complicated...

There's also precedent in resolving information from both resource attributes and the context: The kafkaexporter does that when determining the topic.

I wouldn't use that as a role model tbh. Off the top of my head:

EIMP is not meant to last forever, right? This is part of my reluctance: the exporter is forever, so any complexity we add to it will last for a long time.

But, maybe it's still the way to go. I don't have a good solution.

@felixbarny (Member, Author)

Will this pipeline configuration be exposed to users? Does it have to be, or can it be encapsulated?

I think we should consider everything that matters for the EDOT configuration for self-managed use cases as something that's exposed to users and should aim to simplify the configuration. Also for our own sake. Debugging these things if it doesn't work for a user would be pretty painful.

EIMP is not meant to last forever, right? This is part of my reluctance: the exporter is forever, so any complexity we add to it will last for a long time.

Maybe we could support the resource attribute-based mapping mode selection as an undocumented feature that we can remove once we sunset EIMP.

However, the mapping mode selection isn't relevant only for EIMP. It's also relevant for when Elastic Agent is using the OTel collector as a shipper. It would then use the bodymap mapping mode. Having said that, this could also be solved with having a separate instance of the ES exporter which sets an explicit header.

At least, what I did in my POC was to consider that every resource may have a different mapping mode, instead of assuming that the first one is relevant for the whole batch.

@axw (Member) commented Mar 21, 2025

Felix and I chatted about this some more on Slack, sharing some things we discussed:

  • EIMP is used in gateway mode when exporting directly to Elasticsearch. So we don't need to worry about sending the header out of the collector; we only need to somehow tell the Elasticsearch exporter in the same collector which mapping mode to use.
  • Maybe instead of injecting mapping-mode client metadata in the EIMP processor and battling with queues, batch processors, etc. dropping the metadata, we could add some logic to the exporter to determine the mapping mode based on some properties of the data, and we can make this specific to EIMP or the elasticapm processor etc.
  • The above only needs to be in EDOT, and it generally feels a bit gross to make the exporter aware of an EDOT processor, so perhaps we can add an extension point to the exporter so we can inject some behaviour in EDOT, e.g. provide a different function for determining the mapping mode, which could determine the mapping mode for the data in a batch and split them into multiple requests before exporting.

@felixbarny (Member, Author)

The above only needs to be in EDOT, and it generally feels a bit gross to make the exporter aware of an EDOT processor, so perhaps we can add an extension point to the exporter so we can inject some behaviour in EDOT

A counter to that is that this complicates the process for users who want to build their own distribution. They not only have to include the elastic processor/EIMP, but also the extension for the ES exporter.

I agree with the rest but I'm not sure about injecting that logic into the ES exporter. I think I would lean towards adding this in the ES exporter, but have it there as an undocumented feature that we can remove once we sunset EIMP.

However, I like the idea of splitting the batch based on an attribute. That may help in reducing the surface area and complexity of the ES exporter changes. It would also make it easier to remove the feature without having to do another big change again.

@axw (Member) commented Mar 24, 2025

A counter to that is that this complicates the process for users who want to build their own distribution. They not only have to include the elastic processor/EIMP, but also the extension for the ES exporter.

Good point. An option to solve that would be to have the extension be registered as an exporter, so it can be used in OCB. i.e. have something like opentelemetry-collector-components/exporter/elasticsearchexporter, which imports the upstream one and extends it slightly. It might lead to confusion though.

I agree with the rest but I'm not sure about injecting that logic into the ES exporter. I think I would lean towards adding this in the ES exporter, but have it there as an undocumented feature that we can remove once we sunset EIMP.

This is simpler and probably fine though.

However, I like the idea of splitting the batch based on an attribute. That may help in reducing the surface area and complexity of the ES exporter changes. It would also make it easier to remove the feature without having to do another big change again.

I suppose the way to go is:

  1. We define some new attribute that the exporter takes as an indication to map to ECS. Maybe it can just be elasticsearch.mapping.mode like you implemented in your PoC. I would be inclined to keep it limited to instrumentation scope unless there's a good reason to support it elsewhere, so we can keep the overhead down.
  2. Update EIMP to set the attribute on translated metrics.
  3. When it exists, update the elasticapm processor to set the attribute on data that should be in the Elastic APM schema.
  4. Update the exporter to identify and partition these scopes, and create separate bulk requests for them.

So it ends up being very similar to what you've done already, but without the session bundle changes.
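The partitioning in step 4 could be sketched like this. The real exporter operates on pdata types, so this uses a simplified stand-in for an instrumentation scope; all names are hypothetical:

```go
package main

import "fmt"

// scope is a simplified stand-in for a pdata instrumentation scope.
type scope struct {
	name  string
	attrs map[string]string
}

// partitionByMappingMode groups scopes by the elasticsearch.mapping.mode
// scope attribute so the exporter can build a separate bulk request per mode.
// Scopes without the attribute fall back to the exporter's configured mode.
func partitionByMappingMode(scopes []scope, defaultMode string) map[string][]scope {
	batches := make(map[string][]scope)
	for _, s := range scopes {
		mode := defaultMode
		if m, ok := s.attrs["elasticsearch.mapping.mode"]; ok {
			mode = m
		}
		batches[mode] = append(batches[mode], s)
	}
	return batches
}

func main() {
	scopes := []scope{
		{name: "eimp/system.load", attrs: map[string]string{"elasticsearch.mapping.mode": "ecs"}},
		{name: "hostmetricsreceiver", attrs: map[string]string{}},
	}
	batches := partitionByMappingMode(scopes, "otel")
	fmt.Println(len(batches["ecs"]), len(batches["otel"])) // prints: 1 1
}
```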

@axw (Member) commented Mar 24, 2025

We define some new attribute that the exporter takes as an indication to map to ECS. Maybe it can just be elasticsearch.mapping.mode like you implemented in your PoC. I would be inclined to keep it limited to instrumentation scope unless there's a good reason to support it elsewhere, so we can keep the overhead down.

Another option would be to set a different schema URL that identifies the data as being ECS, and use ECS mapping mode for data with that set.

@felixbarny (Member, Author)

Without thinking too deeply about it, I think I like the attribute more than the scope name for this.

I would be inclined to keep it limited to instrumentation scope unless there's a good reason to support it elsewhere, so we can keep the overhead down.

Initially, I was even thinking to only support this as a resource attribute. Do you think we need to have that finer granularity on scopes or are you thinking that it's more clear that way?

@axw (Member) commented Mar 24, 2025

@felixbarny I just thought it might be a little more correct: they're attributes describing the instrumentation (EIMP etc.) rather than the resource (e.g. a host). I don't have a very strong opinion as long as it's a temporary measure.
