-
Notifications
You must be signed in to change notification settings - Fork 80
[confgenerator] Add file offset storage to Otel Logging receivers.
#2166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dfb9619 to
6bae68d
Compare
| // https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/release/v0.136.x/receiver/windowseventlogreceiver | ||
| receiver_config := map[string]any{ | ||
| "channel": c, | ||
| "start_at": "beginning", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out how start_at interacts with storage but the upstream docs are unclear. I assumed there would be an option for start_at that would clearly mean "pick up from where we left off according to the stored offset, rather than starting at the beginning or end", but beginning and end are the only two options.
How can I be confident that beginning is actually going to pick up from the stored file offset on collector restart rather than the beginning of the file? Can we add an integration test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed there would be an option for start_at that would clearly mean "pick up from where we left off according to the stored offset, rather than starting at the beginning or end", but beginning and end are the only two options.
Yeah, the start_at is not descriptive. I also assumed "begging + storage" meant "start at the beginning if there is no bookmark".
One option is to do a clarification doc PR in the upstream docs for this.
How can I be confident that beginning is actually going to pick up from the stored file offset on collector restart rather than the beginning of the file? Can we add an integration test for this?
Yeah, we could do an integration test for this since transformation test are not meant to test "restarts".
- How detailed do you think should it be to give us confidence ?
- Should it test only
filesreceiver or alsosystemd,windowseventlog?
We could :
- Send 5 logs from file.
- Wait 2 minutes and restarts.
- Look for duplicate logs in the past 2 mins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created the TestLogCursor integration test to verify this the cursor (bookmark) is preserved after restart.
I ran this test locally to verify that it indeed fails with an Otel Logging version without the filestorage extension
=== NAME TestLogCursor/debian-cloud:debian-11/default
agents.go:1010: Test logs: /tmp/401636222/TestLogCursor_debian-cloud:debian-11_default
agents.go:1010: Instance Log: https://console.cloud.google.com/logs/viewer?resource=gce_instance%2Finstance_id%2F5331085291174606091&project=fcovalente-dev
=== NAME TestLogCursor/debian-cloud:debian-11/otel_logging
agents.go:1010: Instance Log: https://console.cloud.google.com/logs/viewer?resource=gce_instance%2Finstance_id%2F6330475123837351179&project=fcovalente-dev
main_test.go:5829: AssertLogMissing(log="jsonPayload.message=\"line #2\""): <nil> failed: unexpectedly found data for log
main_test.go:5829: AssertLogMissing(log="jsonPayload.message=\"line #1\""): <nil> failed: unexpectedly found data for log
--- FAIL: TestLogCursor (0.00s)
--- FAIL: TestLogCursor/debian-cloud:debian-11 (0.00s)
--- PASS: TestLogCursor/debian-cloud:debian-11/default (273.07s)
--- FAIL: TestLogCursor/debian-cloud:debian-11/otel_logging (310.43s)
|
Any thoughts on |
bookmarking to Otel Logging receivers.file offset storage to Otel Logging receivers.
a9b9bcd to
3805ae5
Compare
|
|
||
| // We should only observe the new logs written after restart. | ||
| addQueryFuncToWaitGroup(func() error { | ||
| return gce.AssertLogMissing(ctx, logger, vm, "files_1", 2*time.Minute, `jsonPayload.message="line #1"`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels overly complicated. All we need to do is verify that one log line is ingested exactly once across an agent restart. Why not just:
- Write one log line
- Restart the agent
- Wait a reasonable amount of time to have ~100% confidence that all lines have been ingested
- Check that the line was only ingested once
- And maaaybe write + verify a second line to rule out that the agent didn't crash or fail to read the file after restart
AFAICT that would also achieve the test's goal while simplifying away the intermediate sleeps, the wait groups, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thats a simpler test, though i avoided doing a "Only one log line count assertion" due to how WaitForLog is implemented. It only returns found or not found.
We would need to write a WaitForLogCount or some other refactoring of WaitForLog or hasMatchingLog 1 to implement this. I'll draft the refactoring and see if its worth it.
Description
Configuring the
file_storageextension to be used asstoragein Otel Logging receivers.Some details :
systemd,filesandwindows_event_log./var/lib/google-cloud-ops-agent/opentelemetry-collector\file_storage/C:\ProgramData\Google\Cloud Operations\Ops Agent\run\file_storageRelated issue
b/469432672
How has this been tested?
Checklist: