Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(otel): Exporter creating Monitored Resource with task_id for Cloud Run #14923

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DouglasHeriot
Copy link

@DouglasHeriot DouglasHeriot commented Jan 10, 2025

Fixes #14925

When inside a Cloud Run environment, the MonitoredResource in a CreateTimeSeriesRequest to the Cloud Monitoring API does not include the necessary fields for the generic_task resource type, and is rejected.

Also updating the GenericNode mapping to match Golang.


This change is Reviewable

@DouglasHeriot DouglasHeriot changed the title Fix OTel Exporter creating Monitored Resource for Cloud Run. fix(otel): Exporter creating Monitored Resource with task_id for Cloud Run Jan 10, 2025
Fixes googleapis#14925

When inside a Cloud Run environment, the `MonitoredResource` in a `CreateTimeSeriesRequest` to the Cloud Monitoring API does not include the necessary fields for the `generic_task` resource type, and is rejected.

Should follow the well-tested Golang implementation where the `faas.instance` OTel Resource Attribute is mapped to `MonitoredResource` `task_id`. As the `service.namespace` OTel Resource Attribute is not set by the Resource Detector from within Cloud Run, it should be mapped as an empty string, rather than being left absent.

https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/8da0f42dab085c916987891419461d583a2aa96e/internal/resourcemapping/resourcemapping.go#L153
@DouglasHeriot DouglasHeriot marked this pull request as ready for review January 10, 2025 08:13
@DouglasHeriot DouglasHeriot requested a review from a team as a code owner January 10, 2025 08:13
@scotthart
Copy link
Member

/gcbrun

Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

The code looks good, although I offered an optional refactor. Unit tests for the new code paths would be nice. Either way, we need to fix the tests that are now broken.

You can run the unit test with:

bazelisk test --test_output=all //google/cloud/opentelemetry:internal_monitored_resource_test

Comment on lines 75 to +78
} else if (kv.second.fallback) {
mr.labels[kv.first] = *kv.second.fallback;
} else {
mr.labels[kv.first] = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it seems like we always fallback. Sometimes to a value like "global", but at least to "".

optional nit: It seems slightly cleaner to make fallback a std::string:

absl::optional<std::string> fallback = absl::nullopt;

and then have this code just be:

} else {
  mr.labels[kv.first] = kv.second.fallback;
}

(If this doesn't work for whatever reason, just let me know, and I can look into cleaning it up in a future PR)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem cleaner, I’ll change the fallback to default to empty string.

@@ -174,7 +176,7 @@ MonitoredResourceProvider GenericNode() {
{"location",
{{sc::kCloudAvailabilityZone, sc::kCloudRegion}, "global"}},
{"namespace", {{sc::kServiceNamespace}}},
{"node_id", {{sc::kHostId}}},
{"node_id", {{sc::kHostId, sc::kHostName}}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks!

Can we have a unit test for this?

I would probably change TestCase to something like LocationTestCase, and then introduce a NodeIdTestCase, and loop over:

for (auto l : location_tests) {
  for (auto n : node_tests) {
    // verify l.expected_location and n.expected_node_id
  }
}

Comment on lines +167 to +168
{"job", {{sc::kServiceName, sc::kFaasName}}},
{"task_id", {{sc::kServiceInstanceId, sc::kFaasInstance}}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, good catch and thanks.

Can we have unit tests for these too? Or at least fix the broken tests.

Also, consider leaving a comment in a test case where we fallback to the empty string. Just something that says "Verify that we fallback to the empty string if no matches are found".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I’ll add some tests for these cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenTelemetry on Cloud Run does not include task_id in Monitored Resource
3 participants