[9.2](backport #47247) [Filebeat/Filestream] Fix missing last few lines of a file #47619
## Proposed commit message

## Checklist
- I have made corresponding changes to the documentation
- I have made corresponding changes to the default configuration files
- I have added an entry in `./changelog/fragments` using the changelog tool

## Disruptive User Impact

## Author's Checklist
## How to test this PR locally

### Run the tests

### Manual test
Testing this fix manually is possible, but requires you to monitor the
logs and add data to the file being ingested at a specific time.
At a very high level, the steps are:
1. Create a file with some data in it.
2. Start Filebeat ingesting the file.
3. Wait for the file to become inactive and for Filestream to start closing the harvester.
4. Append data to the file while the harvester is closing.

If you run this test without the fix from this PR, after #4 Filestream will not try to start any more harvesters for the file, effectively missing the last few lines.
The best way to manually test this PR is to have two terminals open,
one running Filebeat and another ready to append data to the file
Filebeat is ingesting.
Create a file with at least 1kb of data and write down its size:

```shell
flog -n 20 > /tmp/flog.log
wc -c /tmp/flog.log
```

Start Filebeat with the following config:
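If `flog` is not installed, a plain shell loop is a workable stand-in for creating the file (the line format here is made up; any log-like content over 1kb works):

```shell
# Stand-in for `flog -n 20`: generate 20 synthetic log lines (~1.9 KiB total)
# and record the file size, which is needed later to verify the harvester state.
for i in $(seq 1 20); do
  printf '%s INFO sample-generator this is a synthetic log line number %d for the harvester test\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$i"
done > /tmp/flog.log
wc -c /tmp/flog.log
```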
filebeat.yml
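The original config block was not preserved in this page; a minimal sketch along these lines should reproduce the setup (the input `id`, the inactivity timeout, and the output paths are assumptions, not the author's exact values):

```yaml
filebeat.inputs:
  - type: filestream
    id: flog-test  # hypothetical id
    paths:
      - /tmp/flog.log
    # Short inactivity timeout so the harvester closes quickly (assumed value)
    close.on_state_change.inactive: 10s
    prospector.scanner.check_interval: 1s

# File output so ingested events can be counted with `wc -l`
output.file:
  path: /tmp/filebeat-output  # hypothetical path
  filename: output.ndjson
```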
To make the logs easier to read, you can send the logs to stdout
and pipe them through jq:
1. Wait for the log entry: `'/tmp/flog.log' is inactive`
2. Add data to the file:
   ```shell
   flog -n 2 >> /tmp/flog.log
   ```
3. Wait for the log entry: `File /tmp/flog.log has been updated`
4. Wait for the log entry: `Harvester already running`
5. Wait for the log entry: `File is inactive. Closing. Path='/tmp/flog.log'`
6. Wait for the log entry: `Stopped harvester for file`
7. Wait for the log entry: `Updating previous state because harvester was closed. '/tmp/flog.log': xxx`, where `xxx` is the original file size.
8. Wait for the log entry: `File /tmp/flog.log has been updated`
9. Wait for the log entry: `Starting harvester for file`
10. Wait for the log entry: `End of file reached: /tmp/flog.log; Backoff now.`
11. Ensure all events have been read:
    ```shell
    wc -l output*.ndjson
    ```

## Related issues
## Use cases

## Screenshots

## Logs

## Benchmarks
### Go Benchmark
This is likely not very relevant to the final form of this PR, but I ran some benchmarks comparing the different strategies to prevent the race condition when accessing the `offset` and `lastTimeRead` fields in the harvester. Below are the results; the code is in `filebeat/input/filestream/filestream_test.go`.
### Benchbuilder
Latest release: v9.2.1
```
9.2.1  2m43.075351941s  12264.000000  175.3162
9.2.1  48.46343038s     41269.000000  183.1183
9.2.1  2m47.897040994s  11912.000000  176.5148
9.2.1  4m51.107096736s  6870.000000   178.5985
```

PR version

```
9.3.0  2m41.103916351s  12414.000000  175.3734
9.3.0  47.520195331s    42088.000000  182.5625
9.3.0  2m44.102216849s  12188.000000  175.8389
9.3.0  4m56.598482898s  6743.000000   179.3721
```