Conversation

Contributor

@andsel andsel commented Aug 18, 2025

Release notes

Implements average batch event count and byte size metrics. Collection of these metrics can be disabled, enabled for every batch, or performed on a sample of the total batches.

What does this PR do?

  • Instantiated the metric `pipelines.<pipeline id>.batch.count` to count the number of batches, used to compute the average events and bytes per batch
  • Instantiated the metric `pipelines.<pipeline id>.batch.total_bytes` to sum up the byte-size estimation of all batches' events. Exposed the metric `pipelines.<pipeline id>.batch.byte_size.average.lifetime`, containing the average byte size of a batch.
  • Created the new setting `pipeline.batch.metrics.sampling_mode`, which can take 3 values: `disabled`, `minimal`, and `full`. With `disabled`, no `batch` metric is exposed in the `_node/stats` API; `minimal` counts batches and estimates their size only for 1% of the total, while `full` does so for every batch. This setting leverages the existing Logstash settings infrastructure, so a value defined at pipeline level (in `pipelines.yml`) takes precedence over the global one (in `logstash.yml`); a configuration sketch follows this list.
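
As an illustration of the precedence described above, here is a minimal configuration sketch; the setting name and values are the ones used in this description, while the pipeline id and config path are hypothetical:

# logstash.yml: global default applied to every pipeline
pipeline.batch.metrics.sampling_mode: minimal   # disabled | minimal | full

# pipelines.yml: a value set here overrides the global one for that pipeline
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/main.conf"
  pipeline.batch.metrics.sampling_mode: full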

Why is it important/What is the impact to the user?

Exposing metrics for average batch byte size and event count lets users discover the average shape of their batches, understand whether batches are being filled, and ultimately tune `pipeline.batch.size` and `pipeline.batch.delay` to reach that goal.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation, handled by "Document feature flag and byte size and event count average metrics" #17976
  • I have made corresponding changes to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Update the default setting `pipeline.batch.metrics.sampling_type` in `logstash.yml` to `none`.

How to test this PR locally

Edit `pipeline.batch.metrics` in `logstash.yml`, trying each of the three values `none`, `minimal`, and `full`.
Launch Logstash and verify the metrics with:

curl http://localhost:9600/_node/stats | jq .pipelines.main.batch
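
With sampling enabled, the batch section of the response should contain the counters and the lifetime average listed above. A hypothetical excerpt, with the nesting inferred from the metric names in this description and purely illustrative values:

{
  "count": 1250,
  "total_bytes": 160000000,
  "byte_size": {
    "average": {
      "lifetime": 128000
    }
  }
}

In minimal mode only about 1% of batches contribute to these numbers, so the average can remain unvalued until enough batches have been sampled.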

Example pipeline:

input {
  java_generator {
    # 1KB
    message => '{"clientip": "192.168.1.10", "ident": "-", "auth": "johndoe", "timestamp": "01/Jul/2025:15:22:10 +0000", "verb": "GET", "request": "/search?q=cloud+logging+apache&lang=en&limit=50&page=2&sort=desc&filter=active&country=us&user_id=123456&session_id=abcdef1234567890abcdef1234567890abcdef&tracking_id=track-0987654321abcdef0987654321abcdef", "httpversion": "1.1", "response": "200", "bytes": "1234", "referrer": "https://www.example.com/ref?q=logtest", "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", "headers": { "X-Forwarded-For": "203.0.113.42", "Referrer-Policy": "strict-origin-when-cross-origin", "X-Request-ID": "req-abcdef1234567890abcdef1234567890"},"message": "192.168.1.10 - johndoe [01/Jul/2025:15:22:10 +0000] \"GET /search?... HTTP/1.1\" 200 1234 ...","logsource": "apache_access","event_type": "access","@timestamp": "2025-07-01T15:22:10Z"}'
    codec => json
    threads => 2
  }
}

output {
  sink {}
}

Related issues

@andsel andsel self-assigned this Aug 18, 2025
@github-actions
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Contributor

mergify bot commented Aug 18, 2025

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@andsel andsel requested a review from Copilot August 21, 2025 12:57
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements batch metrics collection in Logstash to measure average batch byte size and event count. It introduces a new setting pipeline.batch.metrics with three modes: none (disabled), minimal (1% sampling), and full (every batch).

  • Added new batch metrics collection infrastructure with configurable sampling
  • Introduced memory estimation capabilities for events and data structures
  • Exposed batch statistics through the _node/stats API

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

Summary per file:

  • qa/integration/specs/monitoring_api_spec.rb: Added integration tests for batch metrics with 'full' and 'none' modes
  • logstash-core/src/test/java/.../MockNamespacedMetric.java: Created mock implementation for testing metric collection
  • logstash-core/src/test/java/.../JrubyMemoryReadClientExtTest.java: Added unit tests for batch metrics collection in memory read client
  • logstash-core/src/main/java/.../MetricKeys.java: Added batch-related metric key constants
  • logstash-core/src/main/java/.../QueueFactoryExt.java: Added BatchMetricType enum and queue creation logic
  • logstash-core/src/main/java/.../QueueReadClientBase.java: Implemented batch metrics collection and memory estimation
  • logstash-core/src/main/java/.../Event.java: Added memory estimation method for events
  • logstash-core/src/main/java/.../ConvertedMap.java: Implemented memory estimation for data structures
  • config/logstash.yml: Added pipeline.batch.metrics configuration option

@andsel
Contributor Author

andsel commented Aug 21, 2025

@andsel andsel marked this pull request as ready for review August 21, 2025 14:13
private QueueFactoryExt.BatchMetricType batchMetricType;

@JRubyMethod(optional = 8)
@JRubyMethod(optional = 9)
Member

There is quite a bit of work in this PR to wire the metric type to the queue, but this code will be refactored in #18121, which changes to a builder pattern, so it may be worth coordinating with @yaauie to avoid conflicts and duplicate work.

@andsel andsel force-pushed the feature/average_metric_byte_size_event_count branch from 2be7d2c to f9ab1d0 Compare September 10, 2025 08:09
@andsel andsel requested a review from jsvd September 10, 2025 14:29
@andsel andsel force-pushed the feature/average_metric_byte_size_event_count branch from e26c794 to a099423 Compare September 18, 2025 11:31
…mber of batches to compute the average events and byte per batch
…mup all the batch event's byte estimation. Exposed metric 'pipelines.<pipeline id>.batch.byte_size.average.lifetime' containing the average byte size of each batch
…ient can collect batch metrics related to byte size and event count; this commit spreads the setting and parameters around but doesn't yet implement the feature
…iltered events and not the existing 'events.in'
@andsel andsel force-pushed the feature/average_metric_byte_size_event_count branch from 75e434f to e060b3a Compare September 19, 2025 13:46
@andsel andsel force-pushed the feature/average_metric_byte_size_event_count branch from e060b3a to ea69549 Compare September 19, 2025 13:49
@andsel andsel requested a review from jsvd September 22, 2025 07:55
…Ext to be symmetric of JRubyWrappedAckedQueueExt
…ch mode is minimal, the metric batch.byte_size.average.lifetime could remain unvalued for a while, and this generates a silent error in the API layer that corrupts the response.
@elastic-sonarqube

@elasticmachine
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

History

cc @andsel

@andsel andsel requested a review from jsvd September 23, 2025 08:20
Member

@jsvd jsvd left a comment

LGTM

@andsel andsel merged commit d290218 into elastic:main Sep 23, 2025
13 checks passed
v1v pushed a commit that referenced this pull request Oct 21, 2025
Implements average batch event count and byte size metrics. Collection of these metrics can be disabled, enabled for every batch, or performed on a sample of the total batches.

Exposing metrics for average batch byte size and event count lets users discover the average shape of their batches, understand whether batches are being filled, and ultimately tune `pipeline.batch.size` and `pipeline.batch.delay` to reach that goal.

- Instantiate the metric `pipelines.<pipeline id>.batch.count` to count the number of batches, used to compute the average events and bytes per batch
- Instantiate the metric `pipelines.<pipeline id>.batch.total_bytes` to sum up the byte-size estimation of all batches' events. Expose the metric `pipelines.<pipeline id>.batch.byte_size.average.lifetime`, containing the average byte size of a batch.
- Create the new setting `pipeline.batch.metrics.sampling_mode`, which can take 3 values: `disabled`, `minimal`, and `full`. With `disabled`, no `batch` metric is exposed in the `_node/stats` API; `minimal` counts batches and estimates their size only for 1% of the total, while `full` does so for every batch. This setting leverages the existing Logstash settings infrastructure, so a value defined at pipeline level (in `pipelines.yml`) takes precedence over the global one (in `logstash.yml`).

Successfully merging this pull request may close these issues.

  • Extend the feature flag pipeline.batch.metrics to work both at global level
  • Implement average lifetime long batch's size and document count metric
