[azure monitor] Address wildcard metrics names timegrain issue #46145
@zmoog how do the backport tags look here?
We can use
I see 6 API calls for both the
Thank you @joecompute for putting this together. Excellent work!
@Mergifyio backport 8.18 8.19 9.0 9.1
✅ Backports have been created
…l pointer deref (#46145) (#43725)

* Grab first metric from available timegrains and group by it
* Test for new metric grouping
* Add debug logging for metric definition count
* Log and skip if response interval is empty/nil
* Add tests for configured and non-configured timegrains
* Update EnableBatchApi documentation
* Add to changelog
* Rename metric to metricConfig
* Use composite key w timegrain for metric groups
* Add check for configured timegrain compatibility
* Fix redundant wording in err
* Add test for invalid configured timegrain
* Handle mismatched request/response timegrain, update msg

(cherry picked from commit 5f1ef61)
Proposed commit message
Main Bug Addressed: Wildcard Search Bug
Issue: #43885
Description: In the buggy scenario, metric names are configured with a wildcard and no timegrain is provided. A timegrain is the Azure term for the aggregation period. Instead of choosing a timegrain that each metric supports, the code incorrectly reused the last timegrain it had used when pulling metric data again, so metrics were requested with an incompatible timegrain.
Fix: We first take the smallest available timegrain from the metric availabilities returned by the Azure API. These timegrains appear to be ordered ascending, so we use the first one to assign the metric to a group. This yields groups of metrics that share a compatible timegrain, ready for the next step. This fix applies to both
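The grouping step can be sketched as follows. This is a minimal illustration, not the code in monitor_service.go: MetricDefinition, groupKey, groupByTimeGrain and containsGrain are hypothetical names, and the sketch assumes, as the description does, that availabilities arrive in ascending order. Returning an error for an unsupported configured timegrain and skipping metrics with no availabilities are assumptions; the PR's actual handling may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// MetricDefinition is a hypothetical, simplified stand-in for an Azure metric
// definition: a metric name plus the timegrains (ISO 8601 durations such as
// "PT1M") listed in its metric availabilities.
type MetricDefinition struct {
	Name       string
	TimeGrains []string
}

// groupKey is a hypothetical composite key: metrics are batched together only
// when they share both the aggregation and the timegrain.
type groupKey struct {
	Aggregation string
	TimeGrain   string
}

// containsGrain reports whether want appears in grains.
func containsGrain(grains []string, want string) bool {
	for _, g := range grains {
		if g == want {
			return true
		}
	}
	return false
}

// groupByTimeGrain assigns each metric to a group keyed by its smallest
// available timegrain (assumed to be the first entry). A non-empty configured
// timegrain overrides that choice but must be supported by every metric.
func groupByTimeGrain(defs []MetricDefinition, aggregation, configured string) (map[groupKey][]string, error) {
	groups := make(map[groupKey][]string)
	for _, def := range defs {
		if len(def.TimeGrains) == 0 {
			// Assumption: skip metrics without availabilities rather than guess.
			continue
		}
		grain := def.TimeGrains[0]
		if configured != "" {
			if !containsGrain(def.TimeGrains, configured) {
				return nil, fmt.Errorf("configured timegrain %s is not supported by metric %q (available: %s)",
					configured, def.Name, strings.Join(def.TimeGrains, ", "))
			}
			grain = configured
		}
		key := groupKey{Aggregation: aggregation, TimeGrain: grain}
		groups[key] = append(groups[key], def.Name)
	}
	return groups, nil
}

func main() {
	defs := []MetricDefinition{
		{Name: "metricA", TimeGrains: []string{"PT1M", "PT5M", "PT1H"}},
		{Name: "metricB", TimeGrains: []string{"PT1H", "P1D"}},
		{Name: "metricC", TimeGrains: []string{"PT1M", "PT1H"}},
	}
	groups, err := groupByTimeGrain(defs, "Average", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// fmt prints maps with sorted keys, so this output is deterministic.
	fmt.Println(groups)
}
```

Here metricA and metricC land in the PT1M group and metricB in the PT1H group, so each batched request uses a timegrain every metric in it supports. Keying on both aggregation and timegrain keeps metrics that differ in either from sharing one request.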
Minor Side Bug Addressed: Nil Pointer Dereference
Issue: #43725
Description: In beats/x-pack/metricbeat/module/azure/monitor_service.go (line 322 at commit 67e847c), we dereference the resp.Interval pointer to get the interval from the API response without first checking whether it is nil.
Fix: Check that the interval is neither nil nor empty before processing the data. If it is nil or empty, reject the data, as we do when the API call errors; in that case we can assume the data is bad. Because this PR also fixes the wildcard issue, this edge case should not be hit unless the API returns bad data.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc
Disruptive User Impact
None. This PR only fixes bugs.
Author's Checklist
How to test this PR locally
Testing wildcard metrics bug
This PR also adds unit tests, which CI runs automatically. To test manually beyond those, follow the steps below:
Set up the scenario/infrastructure as described in the parent issue: #43885
To verify that the 400 error is gone:
Check out this branch and set a breakpoint here before running the scenario in the debugger. Observe that
To verify that the number of metric definitions is unaffected:
… TestMapMetric …, so the helper log below is just an extra piece of verification.
… main … and see that the number of metric definitions is 73 in both cases (unique metric definition count). Therefore, the number of metric definitions is unaffected.
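The invariant behind this check can be sketched as a count of distinct metric names across all timegrain groups: regrouping should neither drop nor duplicate definitions, so the count should match before and after the fix. countUnique is a hypothetical helper loosely mirroring the idea of the PR's debug log, not the module's actual code, and the metric names are placeholders.

```go
package main

import "fmt"

// countUnique returns how many distinct metric names appear across all
// groups. Hypothetical helper: groups maps a timegrain to metric names.
func countUnique(groups map[string][]string) int {
	seen := make(map[string]struct{})
	for _, names := range groups {
		for _, n := range names {
			seen[n] = struct{}{}
		}
	}
	return len(seen)
}

func main() {
	groups := map[string][]string{
		"PT1M": {"metricA", "metricC"},
		"PT1H": {"metricB"},
	}
	fmt.Println("unique metric definition count:", countUnique(groups))
}
```

Using a set rather than summing slice lengths means a metric accidentally placed in two groups would not inflate the count, so a mismatch points at dropped definitions.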
Testing minor nil pointer bug
To confirm that we handle a nil pointer, set up a debugger, set a breakpoint, then force resp.Interval to nil at the first breakpoint in the screenshot. However, as noted in the code comments, this should not happen because this PR also handles the wildcard timegrain configuration scenario.
Related issues
… (metrics.name: [*]) issues #43885