Skip to content

Conversation

@jsoriano
Copy link
Member

@jsoriano jsoriano commented Sep 10, 2025

Proposed commit message

Support fields added and fixed for pipeline metrics in Beats 8.15.0.

  • Definitions added for new fields.
  • Ingest pipeline to adapt renamed field.
  • Revert kibana version constraint to ^8.11.2 || ^9.0.0.

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Screenshots

@jsoriano jsoriano self-assigned this Sep 10, 2025
@jsoriano jsoriano force-pushed the elastic-package-rename-filled-pct branch from ddff542 to b4b8ab6 Compare September 10, 2025 18:56
@jsoriano jsoriano changed the title Support new pipeline fields from beats logs [elastic_agent] Support new pipeline fields from beats logs Sep 10, 2025
@jsoriano jsoriano force-pushed the elastic-package-rename-filled-pct branch from a1fd471 to bc4bdcf Compare September 10, 2025 19:12
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report comment with /test benchmark fullreport

Comment on lines 6 to 7
field: monitoring.metrics.libbeat.pipeline.queue.filled.pct.events
target_field: monitoring.metrics.libbeat.pipeline.queue.filled.events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
field: monitoring.metrics.libbeat.pipeline.queue.filled.pct.events
target_field: monitoring.metrics.libbeat.pipeline.queue.filled.events
field: monitoring.metrics.libbeat.pipeline.queue.filled.pct.events
target_field: monitoring.metrics.libbeat.pipeline.queue.filled.pct

type: long
metric_type: counter
description: Maximum number of events in a queue
description: Maximum number of events in a queue if it has one, otherwise zero.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there so many changes in field files? Were we missing so many fields before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the PR in Beats all these fields have been added: elastic/beats#39774

Haven't been able to confirm.

@jsoriano jsoriano marked this pull request as ready for review September 11, 2025 16:03
@jsoriano jsoriano requested a review from a team as a code owner September 11, 2025 16:03
@faec
Copy link

faec commented Sep 11, 2025

This is a meta-comment to clarify what the situation was as of ~3 days ago and how it got that way.

Timeline:

Commentary:

  • The queue filled metric never existed under any name until 8.15.0, which included both the initial definition (Add queue percentage to libbeat metrics beats#39205) and the naming fix ([libbeat] Add a metrics observer to the queue beats#39774); 8.14.3 and earlier included neither.
  • Neither the filled metric nor any of the other new ones were ingested by Metricbeat monitoring until v8.17.2 and later.
  • Until today, I believed no released version ever reported metrics under the mistaken name queue.filled.pct.events, because the name was corrected before it hit a release. This was almost true, however:
    • There was an "inert" occurrence of the old name, with no references, that was missed in the initial fix, and stayed there until the Metricbeat schema update (https://github.com/elastic/beats/pull/42439/files#diff-558176dd2327daca24e089ce7ec773653e787ef7438af3d3df257e655b383e65L133).
    • The schema update PR cleaned up a few "redundant" metrics variable definitions that had no code references, since the same variables were created and used elsewhere, not noticing that the "redundant" filled metric had used the old / incorrect name. [But of course, it was unreferenced and absent from the Metricbeat schema, so there was no way for it to actually show up anywhere except the logs, right?]
    • The real queue metrics also cleared their registry on initialization, so the old name only existed for a short window during pipeline initialization when the queue had not yet been created.
    • Except, the bug fixed in [libbeat] Fix and test initialization of queue monitors beats#40480 prevented the old name from being removed in 8.15.0.
    • If 8.15.0 was run with self-monitoring (self-monitoring doesn't enforce the schema restriction of Metricbeat monitoring, so it includes all variables), queue.filled.pct.events would be ingested, always with a value of zero.
    • Also, for completeness: Between versions 8.15.1 and 8.17.1 (inclusive), if self-monitoring is enabled, and a metrics scan interval triggers in the brief window before the queue is created (I'm not sure this is actually possible, but if so it's probably extremely rare), a (zero) queue.filled.pct.events field would be ingested.
  • The upshot is: running with self-monitoring on precisely version 8.15.0, then upgrading to a later version with self-monitoring, or to 8.17.2 or later with any monitoring, can cause a mapping conflict where the earlier queue.filled.pct.events field interferes with the queue.filled.pct field used by all other versions.
  • One important consequence of all this: in trying to untangle this mess, it is always safe to delete any field with the name queue.filled.pct.events. To the extent it ever appeared at all, it was always zero.
  • This still leaves the field definitions in the Agent package, which still use the old field name, and would seem to conflict with the real monitoring values since 8.17.2, although I'm not sure exactly how a nested example like this would be handled -- if that did cause a mapping conflict, I would expect it to be an extremely common error, which doesn't seem to have been the case? So that should be addressed, but our recent issues might be an example of the former corner case instead.

@jsoriano
Copy link
Member Author

Thanks for the explanation!

  • The upshot is: running with self-monitoring on precisely version 8.15.0, then upgrading to a later version with self-monitoring, or to 8.17.2 or later with any monitoring, can cause a mapping conflict where the earlier queue.filled.pct.events field interferes with the queue.filled.pct field used by all other versions.

Then this PR and #15244 should fix the issue in the integration, right?

Between versions 8.15.1 and 8.17.1 (inclusive), if self-monitoring is enabled, and a metrics scan interval triggers in the brief window before the queue is created (I'm not sure this is actually possible, but if so it's probably extremely rare), a (zero) queue.filled.pct.events field would be ingested.

The pipelines I am adding in this PR would be intended to handle these cases. If this is so rare, is it worth it to add them?

  • The queue filled metric never existed under any name until 8.15.0

This explains why I didn't manage to reproduce the issue with the oldest version supported by the package (8.11.2).

@cmacknz
Copy link
Member

cmacknz commented Sep 11, 2025

If this is so rare, is it worth it to add them?

IMO yes we already have a support case about this, let's just put in the fixes and then not have to think about or explain this again.

@faec
Copy link

faec commented Sep 11, 2025

If this is so rare, is it worth it to add them?

In addition to Craig's comment: I'd expect that the fix for that case is the same as the (presumably more common) one of running 8.15.0 with self-monitoring -- the correct behavior with any version of Beats is always to delete the queue.filled.pct.events field if it exists, since in any case where it could have existed it didn't contain real data.

@andrewkroh andrewkroh added the Team:Elastic-Agent Platform - Ingest - Agent [elastic/elastic-agent] label Sep 11, 2025
@elasticmachine
Copy link

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@jsoriano
Copy link
Member Author

If this is so rare, is it worth it to add them?

In addition to Craig's comment: I'd expect that the fix for that case is the same as the (presumably more common) one of running 8.15.0 with self-monitoring -- the correct behavior with any version of Beats is always to delete the queue.filled.pct.events field if it exists, since in any case where it could have existed it didn't contain real data.

Ok, so instead of renaming it I will remove it.

@jsoriano
Copy link
Member Author

Btw, another follow up issue created, to have system tests on this package: elastic/elastic-package#2903

@elasticmachine
Copy link

💚 Build Succeeded

History

cc @jsoriano

@elastic-sonarqube
Copy link

Quality Gate failed Quality Gate failed

Failed conditions
11.1% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube

@botelastic
Copy link

botelastic bot commented Oct 12, 2025

Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Integration:elastic_agent Elastic Agent Stalled Team:Elastic-Agent Platform - Ingest - Agent [elastic/elastic-agent]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants