Component(s)
pkg/stanza/fileconsumer
What happened?
Description
When using copy-truncate log rotation, the filelog receiver appears to stop reading new log lines from the active file after a rotation.
This happens intermittently and is not always reproducible. It depends on the precise timing between the rotation operation, file truncation, and the collector’s polling interval.
After rotation, debug logs show entries such as:
```
Reading lost file path=/var/log/app/app.log
```
But no new log entries from /var/log/app/app.log appear afterward.
The collector continues polling, but does not seem to reattach to the new file created by logrotate.
Steps to Reproduce
- Configure the receiver to monitor only the base log file:
```yaml
receivers:
  otlpjsonfile:
    include: [/var/log/app/app.log]
    start_at: beginning
    poll_interval: 500ms
```
- Continuously write log lines to /var/log/app/app.log
- Use a logrotate configuration file with the copytruncate option, for example (/etc/logrotate.d/app):
```
/var/log/app/app.log {
    rotate 5
    size 1M
    copytruncate
    missingok
    notifempty
    compress
    delaycompress
}
```
- Trigger a rotation manually:
```shell
logrotate -f /etc/logrotate.d/app
```
- Continue writing logs to /var/log/app/app.log after the rotation.
- Observe that in some runs the collector stops collecting new lines from the active log file, while in other runs it resumes correctly, depending on timing and poll overlap (see the reproduction sketch below).
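For a more deterministic reproduction, the logrotate step can be replaced by a small program that performs the same copy-then-truncate sequence while continuously writing. The snippet below is only an illustrative sketch: the paths, line format, and rotation cadence are assumptions made for this example, and it is not part of the collector or of logrotate.

```go
// A minimal, hypothetical stand-in for logrotate's copytruncate step, useful
// for exercising the timing race without logrotate itself.
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

const (
	active  = "/var/log/app/app.log"   // illustrative path
	rotated = "/var/log/app/app.log.1" // illustrative path
)

// copyTruncate mimics logrotate's copytruncate: copy the active file's
// contents to the rotated name, then truncate the active file in place, so
// the writer keeps the same file descriptor and inode.
func copyTruncate() error {
	src, err := os.Open(active)
	if err != nil {
		return err
	}
	defer src.Close()

	dst, err := os.Create(rotated)
	if err != nil {
		return err
	}
	defer dst.Close()

	if _, err := io.Copy(dst, src); err != nil {
		return err
	}
	return os.Truncate(active, 0)
}

func main() {
	f, err := os.OpenFile(active, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	for i := 0; ; i++ {
		fmt.Fprintf(f, "{\"msg\":\"line %d\"}\n", i)
		if i > 0 && i%500 == 0 {
			// Rotate periodically so that, sooner or later, a rotation lands
			// inside the collector's 500ms poll window.
			if err := copyTruncate(); err != nil {
				panic(err)
			}
		}
		time.Sleep(time.Millisecond)
	}
}
```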
Expected Result
The collector should detect the rotation, complete reading any remaining lines from the old file, and then immediately resume reading from the new /var/log/app/app.log.
New log lines written after the rotation should be collected seamlessly.
Actual Result
Occasionally, after a rotation with copytruncate, the collector appears to stop collecting new data.
The debug logs show:
```
Reading lost file path=/var/log/app/app.log
```
But the collector does not emit new log entries from the file until it is restarted or reconfigured.
This behavior is not consistent and appears to depend on the timing between rotation and polling.
Collector version
v0.137.0
Environment information
Environment
OS: Ubuntu 22.04
OpenTelemetry Collector configuration
```yaml
receivers:
  otlpjsonfile:
    include: [/var/log/app/app.log]
exporters:
  debug:
    verbosity: detailed
extensions: {}
service:
  extensions: []
  pipelines:
    logs:
      receivers: [otlpjsonfile]
      processors: []
      exporters: [debug]
```

Log output

```
{"level":"debug","ts":"2025-10-19T17:59:57.814Z","caller":"fileconsumer/file.go:157","msg":"Consuming files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:02.814Z","caller":"fileconsumer/file_other.go:57","msg":"Reading lost file","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","path":"/var/log/app/app.log"}
{"level":"debug","ts":"2025-10-19T18:00:04.614Z","caller":"fileconsumer/file.go:157","msg":"Consuming files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:06.014Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:06.215Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:06.414Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:07.414Z","caller":"fileconsumer/file.go:157","msg":"Consuming files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:07.614Z","caller":"fileconsumer/file.go:157","msg":"Consuming files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:07.814Z","caller":"fileconsumer/file.go:157","msg":"Consuming files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:16.013Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:16.213Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}
{"level":"debug","ts":"2025-10-19T18:00:16.413Z","caller":"fileconsumer/file.go:125","msg":"matched files","resource":{"service.instance.id":"1f208ef0-43ad-40b1-843b-0450afa19318","service.name":"otelcol-contrib","service.version":"0.137.0"},"otelcol.component.id":"otlpjsonfile/app","otelcol.component.kind":"receiver","otelcol.signal":"logs","component":"fileconsumer","paths":["/var/log/app/app.log"]}Additional context
Based on analysis and local testing, I believe this issue can occur because of how the collector manages fingerprints and file tracking during rotation events, specifically when the pre-rotation and post-rotation files share the same fingerprint, or when the new file is empty during the poll.
- Duplicate fingerprint handling in makeReaders()
opentelemetry-collector-contrib/pkg/stanza/fileconsumer/file.go
Lines 215 to 223 in 4e7fcea
```go
if r := m.tracker.GetCurrentFile(fp); r != nil {
	m.set.Logger.Debug("Skipping duplicate file", zap.String("path", file.Name()))
	// re-add the reader as Match() removes duplicates
	m.tracker.Add(r)
	if err := file.Close(); err != nil {
		m.set.Logger.Debug("problem closing file", zap.Error(err))
	}
	continue
}
```
I think that here, if both files have the same fingerprint (as can temporarily happen with copy-truncate), the new file can be skipped for that poll, leaving the collector attached to the old inode.
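To make this concrete, below is a simplified, self-contained model of prefix-based fingerprinting. It is not the actual fileconsumer code — the real fingerprint stores the first fingerprint_size bytes of a file and has its own comparison logic — so the size constant and helper functions here are illustrative assumptions only.

```go
// A simplified, hypothetical model of prefix-based file fingerprinting to
// illustrate why copytruncate can briefly make files indistinguishable.
package main

import (
	"bytes"
	"fmt"
)

const fingerprintSize = 16 // illustrative size only, not the real default

// fingerprint models a fingerprint as the first fingerprintSize bytes of a file.
func fingerprint(contents []byte) []byte {
	if len(contents) < fingerprintSize {
		return append([]byte(nil), contents...)
	}
	return append([]byte(nil), contents[:fingerprintSize]...)
}

func main() {
	preRotation := []byte("2025-10-19T18:00:00Z line 1\n2025-10-19T18:00:01Z line 2\n")

	// The tracked reader's fingerprint was computed from app.log before rotation.
	trackedFP := fingerprint(preRotation)

	// copytruncate: app.log.1 gets a byte-for-byte copy, app.log is truncated to zero.
	rotatedCopyFP := fingerprint(preRotation) // app.log.1
	truncatedFP := fingerprint(nil)           // app.log immediately after truncation

	// The rotated copy is indistinguishable from the file the collector was
	// already tracking, and the truncated active file has no fingerprint yet.
	fmt.Println("rotated copy == tracked reader:", bytes.Equal(rotatedCopyFP, trackedFP)) // true
	fmt.Println("truncated active file fingerprint length:", len(truncatedFP))            // 0
	// If a poll lands inside this window, fingerprint-based identity checks
	// cannot yet tell the new app.log apart from the old content, which is the
	// kind of ambiguity described above.
}
```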
- File matching order in readLostFiles()
opentelemetry-collector-contrib/pkg/stanza/fileconsumer/file_other.go
Lines 31 to 33 in 4e7fcea
```go
if newReader.Fingerprint.StartsWith(oldReader.Fingerprint) {
	continue OUTER
}
```
The fingerprint check happens before verifying the filename, which may prevent proper detection of the rotated file.
I think those conditions together can cause the collector to remain attached to the old (truncated) file and fail to reattach to the new active file.
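Purely to make the discussion concrete (and not as a claim about how fileconsumer should change), here is a small sketch of one way rotated files could be disambiguated even when fingerprints collide: comparing (device, inode) identity, which copytruncate preserves for the active file but not for the copy. The paths are illustrative and the approach is Linux-specific.

```go
// A self-contained sketch of (device, inode) identity comparison. This is not
// existing fileconsumer code; it only illustrates the idea for discussion.
package main

import (
	"fmt"
	"os"
	"syscall"
)

type fileID struct {
	dev uint64
	ino uint64
}

// identity returns the (device, inode) pair for a path. With copytruncate,
// the active file keeps its inode across rotations while the copy gets a new one.
func identity(path string) (fileID, error) {
	info, err := os.Stat(path)
	if err != nil {
		return fileID{}, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return fileID{}, fmt.Errorf("no raw stat available for %s", path)
	}
	return fileID{dev: uint64(st.Dev), ino: st.Ino}, nil
}

func main() {
	active, err := identity("/var/log/app/app.log")
	if err != nil {
		fmt.Println("stat active:", err)
		return
	}
	rotated, err := identity("/var/log/app/app.log.1")
	if err != nil {
		fmt.Println("stat rotated:", err)
		return
	}
	// Identical fingerprints could still be told apart by this check; whether
	// that fits fileconsumer's cross-platform design is exactly the kind of
	// feedback being asked for below.
	fmt.Printf("active: %+v, rotated: %+v, same identity: %v\n", active, rotated, active == rotated)
}
```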
Links to possibly related previous issues/PRs:
- [pkg/stanza/fileconsumer] Compare the oldReader.offset and newReader.file.size when creating newReader #19428
- [fix] [pkg/stanza/fileconsumer] do not update fingerprint size when less data has been read #22937
- Fix Rotation file data lost in specified state #19766
- make new reader during the rotation and exclude the same file each cycle #19767
I'm happy to contribute a fix, but I'd like initial thoughts first to make sure I'm heading in the right direction 🙏