Measure average batch byte size and event count #18000
This pull request does not have a backport label. Could you fix it @andsel? 🙏
Pull Request Overview
This PR implements batch metrics collection in Logstash to measure average batch byte size and event count. It introduces a new setting `pipeline.batch.metrics` with three modes: `none` (disabled), `minimal` (1% sampling), and `full` (every batch).
- Added new batch metrics collection infrastructure with configurable sampling
- Introduced memory estimation capabilities for events and data structures
- Exposed batch statistics through the `_node/stats` API
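As a rough illustration of the three modes described above, here is a minimal sketch of a per-batch sampling decision. `SamplingMode` and `BatchSampler` are hypothetical names, not the classes added by this PR, and approximating "1% sampling" as "every 100th batch" is an assumption; the actual implementation may sample differently (for example, randomly).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical modes mirroring the three values named in the overview. */
enum SamplingMode { NONE, MINIMAL, FULL }

/** Hypothetical helper; not the class introduced by this PR. */
final class BatchSampler {
    // Assumption: 1% sampling is approximated as measuring every 100th batch.
    private static final long MINIMAL_INTERVAL = 100L;

    private final SamplingMode mode;
    private final AtomicLong batchesSeen = new AtomicLong();

    BatchSampler(SamplingMode mode) {
        this.mode = mode;
    }

    /** Returns true when the batch currently being read should be measured. */
    boolean shouldMeasure() {
        switch (mode) {
            case FULL:
                return true;
            case MINIMAL:
                // The thread-safe counter only advances in MINIMAL mode.
                return batchesSeen.getAndIncrement() % MINIMAL_INTERVAL == 0;
            default:
                return false;
        }
    }
}
```

With this sketch, 1000 batches read in `MINIMAL` mode would result in 10 measurements, while `NONE` skips the measurement work entirely.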
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| qa/integration/specs/monitoring_api_spec.rb | Added integration tests for batch metrics with 'full' and 'none' modes |
| logstash-core/src/test/java/.../MockNamespacedMetric.java | Created mock implementation for testing metric collection |
| logstash-core/src/test/java/.../JrubyMemoryReadClientExtTest.java | Added unit tests for batch metrics collection in memory read client |
| logstash-core/src/main/java/.../MetricKeys.java | Added batch-related metric key constants |
| logstash-core/src/main/java/.../QueueFactoryExt.java | Added BatchMetricType enum and queue creation logic |
| logstash-core/src/main/java/.../QueueReadClientBase.java | Implemented batch metrics collection and memory estimation |
| logstash-core/src/main/java/.../Event.java | Added memory estimation method for events |
| logstash-core/src/main/java/.../ConvertedMap.java | Implemented memory estimation for data structures |
| config/logstash.yml | Added pipeline.batch.metrics configuration option |
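The table mentions memory estimation for events and data structures. The sketch below shows one way such an estimate could be computed by walking a nested map. `SizeEstimator` and its per-type byte costs are assumptions made for illustration; they are not the logic this PR adds to `Event.java` or `ConvertedMap.java`, and they ignore JVM object overhead.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/** Hypothetical estimator; the byte costs per type are illustrative assumptions. */
final class SizeEstimator {
    private SizeEstimator() {
    }

    /** Recursively sums a rough payload size, in bytes, for a nested value. */
    static long estimate(Object value) {
        if (value == null) {
            return 0L;
        }
        if (value instanceof String) {
            return ((String) value).getBytes(StandardCharsets.UTF_8).length;
        }
        if (value instanceof Long || value instanceof Double) {
            return 8L;
        }
        if (value instanceof Integer || value instanceof Float) {
            return 4L;
        }
        if (value instanceof Boolean) {
            return 1L;
        }
        if (value instanceof Map) {
            long total = 0L;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                total += estimate(entry.getKey()) + estimate(entry.getValue());
            }
            return total;
        }
        if (value instanceof List) {
            long total = 0L;
            for (Object item : (List<?>) value) {
                total += estimate(item);
            }
            return total;
        }
        // Fallback for unknown types: size of the string representation.
        return value.toString().getBytes(StandardCharsets.UTF_8).length;
    }
}
```

An estimate like this trades accuracy for speed, which matters because it runs on the batch read path; sampling only a fraction of batches reduces that cost further.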
  private QueueFactoryExt.BatchMetricType batchMetricType;

- @JRubyMethod(optional = 8)
+ @JRubyMethod(optional = 9)
2be7d2c to f9ab1d0
e26c794 to a099423
…mber of batches to compute the average events and bytes per batch
…l in events divided by number of batches
…mup all the batch events' byte estimation. Exposed metric 'pipelines.<pipeline id>.batch.byte_size.average.lifetime' containing the average byte size of each batch
…teMemory can work with
…d batch subtree metrics
…ient can collect batch metrics related to byte size and event count; this commit spreads the setting and parameter around but doesn't yet implement the feature
…iltered events and not the existing 'events.in'
… error in batch estimation
…is not meaningful in such case
75e434f to e060b3a
e060b3a to ea69549
…g into batch metric mode enum
…ic mode and set its value, not only in the initialize method
…Ext to be symmetric with JRubyWrappedAckedQueueExt
…ch mode is minimal, the metric batch.byte_size.average.lifetime could remain unvalued for a while, and this generates a silent error in the API layer that corrupts the response.
💛 Build succeeded, but was flaky
cc @andsel
LGTM
Implements average batch event count and byte size metrics. Collection of these metrics can be disabled, enabled for every batch, or done on a sample of the total batches.

Exposing metrics for average batch byte size and event count lets users discover the typical shape of their batches, see whether the batches are being filled, and ultimately work out how to set `pipeline.batch.size` and `pipeline.batch.delay` to reach that goal.

- Instantiate metric `pipelines.<pipeline id>.batch.count` to count the number of batches, used to compute the average events and bytes per batch.
- Instantiate metric `pipelines.<pipeline id>.batch.total_bytes` to sum up the byte estimates of all batches' events. Expose metric `pipelines.<pipeline id>.batch.byte_size.average.lifetime` containing the average byte size of each batch.
- Create new setting `pipeline.batch.metrics.sampling_mode`, which can take 3 values: `disabled`, `minimal` and `full`. If `disabled`, no `batch` metric is exposed in the `_node/stats` API. `minimal` counts batches and estimates the size for only 1% of the total, while `full` does so for every batch. This setting leverages the existing Logstash settings infrastructure, so a value defined at pipeline level (in `pipelines.yml`) takes precedence over the global one (in `logstash.yml`).
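The lifetime averages described above reduce to a running total divided by a batch count. The following is a minimal sketch of that arithmetic, assuming a hypothetical `BatchAverages` class; the PR itself wires these values into the Logstash metric namespace rather than a standalone class. Returning an empty value until the first batch is recorded is one possible way to avoid exposing an unvalued average.

```java
import java.util.OptionalDouble;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical accumulator for lifetime batch averages; not the PR's class. */
final class BatchAverages {
    private final LongAdder batchCount = new LongAdder();
    private final LongAdder totalEvents = new LongAdder();
    private final LongAdder totalBytes = new LongAdder();

    /** Records one measured batch: its event count and estimated byte size. */
    void record(long events, long estimatedBytes) {
        batchCount.increment();
        totalEvents.add(events);
        totalBytes.add(estimatedBytes);
    }

    /** Average events per batch, or empty when no batch has been recorded. */
    OptionalDouble averageEventCount() {
        return average(totalEvents.sum());
    }

    /** Average estimated bytes per batch, or empty when no batch has been recorded. */
    OptionalDouble averageByteSize() {
        return average(totalBytes.sum());
    }

    private OptionalDouble average(long total) {
        long count = batchCount.sum();
        // Guard against division by zero before the first sample arrives.
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of((double) total / count);
    }
}
```

For example, recording batches of 100 events (2000 bytes) and 50 events (1000 bytes) gives an average of 75 events and 1500 bytes per batch.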
Release notes
Implements average batch event count and byte size metrics. Collection of these metrics can be disabled, enabled for every batch, or done on a sample of the total batches.
What does this PR do?
- Instantiate metric `pipelines.<pipeline id>.batch.count` to count the number of batches, used to compute the average events and bytes per batch.
- Instantiate metric `pipelines.<pipeline id>.batch.total_bytes` to sum up the byte estimates of all batches' events. Expose metric `pipelines.<pipeline id>.batch.byte_size.average.lifetime` containing the average byte size of each batch.
- Create new setting `pipeline.batch.metrics.sampling_mode`, which can take 3 values: `disabled`, `minimal` and `full`. If `disabled`, no `batch` metric is exposed in the `_node/stats` API. `minimal` counts batches and estimates the size for only 1% of the total, while `full` does so for every batch. This setting leverages the existing Logstash settings infrastructure, so a value defined at pipeline level (in `pipelines.yml`) takes precedence over the global one (in `logstash.yml`).
Why is it important/What is the impact to the user?
Exposing metrics for average batch byte size and event count lets users discover the typical shape of their batches, see whether the batches are being filled, and ultimately work out how to set `pipeline.batch.size` and `pipeline.batch.delay` to reach that goal.
Checklist
[ ] I have made corresponding changes to the documentation (handled by "Document feature flag and byte size and event count average metrics" #17976)
Author's Checklist
`pipeline.batch.metrics.sampling_type` into `logstash.yml` to `none`.
How to test this PR locally
Edit `pipeline.batch.metrics` in `logstash.yml`, setting the three different values `none`, `minimal`, `full`.
Launch Logstash and verify the metrics with:
curl http://localhost:9600/_node/stats | jq .pipelines.main.batch
Example pipeline:
Related issues
`pipeline.batch.metrics` to work both at global level #17896