
@franciscovalentecastro
Contributor

Description

Configures the file_storage extension as the storage backend for the Otel Logging receivers.

Some details:

  • Added to the systemd, files, and windows_event_log receivers.
  • The following folders will be used to store the bookmarks:
    • Linux: /var/lib/google-cloud-ops-agent/opentelemetry-collector/file_storage/
    • Windows: C:\ProgramData\Google\Cloud Operations\Ops Agent\run\file_storage
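For reviewers less familiar with the upstream wiring, here is a minimal sketch of what the generated config could look like, written as Go maps in the same style as the receiver_config excerpt further down. It assumes the upstream convention that the file_storage extension takes a `directory` setting, that each receiver references it through a `storage` key, and that the extension is enabled under `service.extensions`. The helpers `fileStorageDir` and `withFileStorage` are hypothetical, not the confgenerator's actual functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileStorageDir returns the offset-storage directory listed in this PR for
// the given OS. Hypothetical helper, for illustration only.
func fileStorageDir(goos string) string {
	if goos == "windows" {
		return `C:\ProgramData\Google\Cloud Operations\Ops Agent\run\file_storage`
	}
	return "/var/lib/google-cloud-ops-agent/opentelemetry-collector/file_storage"
}

// withFileStorage returns a copy of receiverConfig whose `storage` key points
// at the given extension ID. The input map is left unmodified.
func withFileStorage(receiverConfig map[string]any, extensionID string) map[string]any {
	out := make(map[string]any, len(receiverConfig)+1)
	for k, v := range receiverConfig {
		out[k] = v
	}
	out["storage"] = extensionID
	return out
}

func main() {
	receiver := withFileStorage(map[string]any{
		"channel":  "Application",
		"start_at": "beginning",
	}, "file_storage")

	cfg := map[string]any{
		"extensions": map[string]any{
			"file_storage": map[string]any{"directory": fileStorageDir("linux")},
		},
		"receivers": map[string]any{"windowseventlog/application": receiver},
		// The extension must also be enabled in the service section.
		"service": map[string]any{"extensions": []string{"file_storage"}},
	}

	b, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Copying the receiver map rather than mutating it keeps the sketch safe to call on a shared base config for several receivers.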

Related issue

b/469432672

How has this been tested?

Checklist:

  • Unit tests
    • Unit tests do not apply.
    • Unit tests have been added/modified and passed for this PR.
  • Integration tests
    • Integration tests do not apply.
    • Integration tests have been added/modified and passed for this PR.
  • Documentation
    • This PR introduces no user visible changes.
    • This PR introduces user visible changes and the corresponding documentation change has been made.
  • Minor version bump
    • This PR introduces no new features.
    • This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
    • This PR bumps the version.

// https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/release/v0.136.x/receiver/windowseventlogreceiver
receiver_config := map[string]any{
	"channel":  c,
	"start_at": "beginning",
	// ... (rest of the map not shown in the review excerpt)
}
Member


I'm trying to figure out how start_at interacts with storage, but the upstream docs are unclear. I assumed there would be an option for start_at that would clearly mean "pick up from where we left off according to the stored offset, rather than starting at the beginning or end", but beginning and end are the only two options.

How can I be confident that beginning is actually going to pick up from the stored file offset on collector restart rather than the beginning of the file? Can we add an integration test for this?

Contributor Author


I assumed there would be an option for start_at that would clearly mean "pick up from where we left off according to the stored offset, rather than starting at the beginning or end", but beginning and end are the only two options.

Yeah, start_at is not descriptive. I also assumed "beginning + storage" meant "start at the beginning if there is no bookmark".

One option is to open an upstream PR that clarifies this in the docs.
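To make the assumption above concrete, here is a simplified model of it: a file that already has a stored offset resumes from that offset, and start_at only decides where to begin for a file with no stored offset. This is a sketch of the assumed semantics, to be confirmed against the upstream fileconsumer code (which keys offsets by file fingerprint rather than path), not a copy of it; `resumeOffset` and its parameters are hypothetical.

```go
package main

import "fmt"

// resumeOffset models where a reader would begin on collector startup.
// stored maps a file fingerprint to the last committed byte offset; it is
// empty after a restart when offsets were only kept in memory.
func resumeOffset(stored map[string]int64, fingerprint, startAt string, fileSize int64) int64 {
	if off, ok := stored[fingerprint]; ok {
		// Known file: continue from the committed offset, ignoring start_at.
		return off
	}
	// Unknown file: start_at decides.
	if startAt == "end" {
		return fileSize
	}
	return 0 // "beginning"
}

func main() {
	stored := map[string]int64{"fp-a": 120}
	fmt.Println(resumeOffset(stored, "fp-a", "beginning", 300)) // 120
	fmt.Println(resumeOffset(stored, "fp-b", "beginning", 300)) // 0
	fmt.Println(resumeOffset(nil, "fp-a", "beginning", 300))    // 0
}
```

Under this model, `beginning` plus a storage extension only re-reads files the receiver has never seen, while `beginning` without persisted offsets re-reads every file after each restart.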

How can I be confident that beginning is actually going to pick up from the stored file offset on collector restart rather than the beginning of the file? Can we add an integration test for this?

Yeah, we could add an integration test for this, since transformation tests are not meant to test restarts.

  • How detailed do you think it should be to give us confidence?
  • Should it test only the files receiver, or also systemd and windowseventlog?

We could:

  1. Write 5 log lines to a file.
  2. Wait 2 minutes and restart the agent.
  3. Look for duplicate logs in the past 2 minutes.
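Step 3 boils down to counting how many times each message was ingested. A minimal sketch, assuming the query results have already been fetched into a slice of message strings; the `duplicates` helper is hypothetical and not part of the gce testing package.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicates returns, in sorted order, every message that appears more than
// once in ingested.
func duplicates(ingested []string) []string {
	counts := make(map[string]int, len(ingested))
	for _, m := range ingested {
		counts[m]++
	}
	var dups []string
	for m, n := range counts {
		if n > 1 {
			dups = append(dups, m)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// Hypothetical run: lines 1-5 ingested before the restart, and lines 1
	// and 2 ingested again after it.
	ingested := []string{"line #1", "line #2", "line #3", "line #4", "line #5", "line #1", "line #2"}
	fmt.Println(duplicates(ingested)) // [line #1 line #2]
}
```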

Contributor Author

@franciscovalentecastro Jan 14, 2026


I created the TestLogCursor integration test to verify that the cursor (bookmark) is preserved after a restart.

I ran this test locally to verify that it indeed fails with an Otel Logging version without the file_storage extension:

=== NAME  TestLogCursor/debian-cloud:debian-11/default
    agents.go:1010: Test logs: /tmp/401636222/TestLogCursor_debian-cloud:debian-11_default
    agents.go:1010: Instance Log: https://console.cloud.google.com/logs/viewer?resource=gce_instance%2Finstance_id%2F5331085291174606091&project=fcovalente-dev
=== NAME  TestLogCursor/debian-cloud:debian-11/otel_logging
    agents.go:1010: Instance Log: https://console.cloud.google.com/logs/viewer?resource=gce_instance%2Finstance_id%2F6330475123837351179&project=fcovalente-dev
    main_test.go:5829: AssertLogMissing(log="jsonPayload.message=\"line #2\""): <nil> failed: unexpectedly found data for log
    main_test.go:5829: AssertLogMissing(log="jsonPayload.message=\"line #1\""): <nil> failed: unexpectedly found data for log
--- FAIL: TestLogCursor (0.00s)
    --- FAIL: TestLogCursor/debian-cloud:debian-11 (0.00s)
        --- PASS: TestLogCursor/debian-cloud:debian-11/default (273.07s)
        --- FAIL: TestLogCursor/debian-cloud:debian-11/otel_logging (310.43s)

@jefferbrecht
Member

jefferbrecht commented Jan 12, 2026

Any thoughts on file_storage vs. db_storage? Do you have any concerns about unbounded retention of offsets for scenarios where someone configures a wildcard path?

@franciscovalentecastro changed the title from "[confgenerator] Add bookmarking to Otel Logging receivers." to "[confgenerator] Add file offset storage to Otel Logging receivers." Jan 14, 2026

// We should only observe the new logs written after restart.
addQueryFuncToWaitGroup(func() error {
	return gce.AssertLogMissing(ctx, logger, vm, "files_1", 2*time.Minute, `jsonPayload.message="line #1"`)
})
Member

@jefferbrecht Jan 15, 2026


This feels overly complicated. All we need to do is verify that one log line is ingested exactly once across an agent restart. Why not just:

  1. Write one log line
  2. Restart the agent
  3. Wait a reasonable amount of time to have ~100% confidence that all lines have been ingested
  4. Check that the line was only ingested once
  5. And maaaybe write and verify a second line, to rule out the agent having crashed or failed to read the file after the restart

AFAICT that would also achieve the test's goal while simplifying away the intermediate sleeps, the wait groups, etc.

Contributor Author

@franciscovalentecastro Jan 16, 2026


Yes, that's a simpler test, though I avoided an "only one log line" count assertion because of how WaitForLog is implemented: it only returns found or not found.

We would need to write a WaitForLogCount helper, or otherwise refactor WaitForLog or hasMatchingLog [1], to implement this. I'll draft the refactoring and see if it's worth it.

Footnotes

  1. https://github.com/GoogleCloudPlatform/opentelemetry-operations-collector/blob/01c257f80f58a4f1fcd9d88aa4e85736bb1c9f83/integration_test/gce-testing-internal/gce/gce_testing.go#L607-L654
