[8.18](backport #46145) [azure monitor] Address wildcard metrics names timegrain issue #46606

mergify · 2025-09-12T18:51:11Z

Proposed commit message

Main Bug Addressed: Wildcard Search Bug

Issue: #43885

Description: In the buggy scenario, a wildcard search for metrics is provided without a timegrain. A timegrain is the Azure terminology for aggregation period. This caused metrics to be pulled with an incompatible timegrain. In the buggy scenario, we incorrectly use the last leveraged timegrain to pull metric data again.

Fix: In this fix, we first grab the smallest available timegrain from the metric availabilities from the Azure API. These timegrains appear to be ordered, ascending, so we use the first one to assign the metric to a group. We then have groups of compatible metrics associated with this timegrain to prepare for the next step. This fix applies to both

the batch metric fetch and
the non-batch metric fetch.

Minor Side Bug Addressed: Nil Pointer Dereference

Issue: #43725

Description: In line

beats/x-pack/metricbeat/module/azure/monitor_service.go

Line 322 in 67e847c

interval = *resp.Interval

we are dereferencing the resp.Interval pointer to get the interval from the api response.

Fix: Check if the interval is not nil and not empty before continuing to process this data. If it is nil or empty, reject the data as we do when the API call errors. When this happens, we can assume the data is bad. This is because we have also handled the wildcard issue in this PR, so this API error edge case in code should not be hit unless the API is returning bad data.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

None - this only fixes bugs

Author's Checklist

How to test this PR locally

Testing wildcard metrics bug

Unit tests have also been added by this PR, which are automatically run by CI. To test beyond this manually, see below:

Set up the scenario/infrastructure as described in the parent issue: #43885

- module: azure  
  metricsets:  
    - monitor  
  enabled: true  
  period: 60s  
  client_id: '${AZURE_CLIENT_ID:""}'
  client_secret: '${AZURE_CLIENT_SECRET:""}'
  tenant_id: '${AZURE_TENANT_ID:""}'
  subscription_id: '${AZURE_SUBSCRIPTION_ID:""}'
  refresh_list_interval: 600s  
  resources:  
  - resource_query: "resourceType eq 'Microsoft.DBforPostgreSQL/flexibleServers'"
    metrics:  
    - name: ["*"]
      namespace: "Microsoft.DBforPostgreSQL/flexibleServers"

To test/verify that the 400 error is gone:

Check out this branch and set a breakpoint here before running the scenario in the debugger. Observe that

the timegrains are no longer empty the first time we hit this line
the program runs without the Azure API 400 error:

To test/verify that the number of metric definitions is unaffected:

Note that this is also covered by an updated unit test of the TestMapMetric, so the below helper log is just an extra piece of verification
One can compare with main and see that the number of metric definitions is 73 in both cases. Therefore, the number of metric definitions is unaffected.
- There is a convenience debug log added in this PR to obtain this number. The log starts with unique metric definition count.

Testing minor nil pointer bug

To confirm that we handle the situation with a nil pointer, one can set up a debugger and set a breakpoint, then force the resp.Interval to nil at the first breakpoint in the screenshot. However, as noted in the comments in this code, this should not happen because we have handled the wildcard timegrain config scenario.

Related issues

This is an automatic backport of pull request #46145 done by [Mergify](https://mergify.com).

…l pointer deref (#46145) (#43725) * Grab first metric from available timegrains and group by it * Test for new metric grouping * Add debug logging for metric definition count * Log and skip if response interval is empty/nil * Add tests for configured and non-configured timegrains * Update EnableBatchApi documentation * Add to changelog * Rename metric to metricConfig * Use composite key w timegrain for metric groups * Add check for configured timegrain compatibility * Fix redundant wording in err * Add test for invalid configured timegrain * Handle mismatched request/response timegrain, update msg (cherry picked from commit 5f1ef61)

elasticmachine · 2025-09-12T18:51:26Z

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

mergify bot requested a review from a team as a code owner September 12, 2025 18:51

mergify bot added the backport label Sep 12, 2025

mergify bot assigned joecompute Sep 12, 2025

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2025

mergify bot mentioned this pull request Sep 12, 2025

[azure monitor] Address wildcard metrics names timegrain issue #46145

Merged

6 tasks

github-actions bot added the Team:obs-ds-hosted-services Label for the Observability Hosted Services team label Sep 12, 2025

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[8.18](backport #46145) [azure monitor] Address wildcard metrics names timegrain issue #46606

[8.18](backport #46145) [azure monitor] Address wildcard metrics names timegrain issue #46606

Uh oh!

mergify bot commented Sep 12, 2025

Uh oh!

elasticmachine commented Sep 12, 2025

Uh oh!

Uh oh!

[8.18](backport #46145) [azure monitor] Address wildcard metrics names timegrain issue #46606

Are you sure you want to change the base?

[8.18](backport #46145) [azure monitor] Address wildcard metrics names timegrain issue #46606

Uh oh!

Conversation

mergify bot commented Sep 12, 2025

Proposed commit message

Main Bug Addressed: Wildcard Search Bug

Minor Side Bug Addressed: Nil Pointer Dereference

Checklist

Disruptive User Impact

Author's Checklist

How to test this PR locally

Testing wildcard metrics bug

To test/verify that the 400 error is gone:

To test/verify that the number of metric definitions is unaffected:

Testing minor nil pointer bug

Related issues

Uh oh!

elasticmachine commented Sep 12, 2025

Uh oh!

Uh oh!