Fix flaky test TestFilestreamDelete #47828
Merged
+10
−2
The two log messages are logged from different goroutines that race:

- "Closing reader of filestream" is logged from a background goroutine spawned by ctxtool.WithFunc when streamCancel() triggers cancellation
- "Stopped harvester for file" is logged from the main harvester goroutine via a defer in harvester.go

When streamCancel() executes, it closes a channel that wakes the background goroutine to log "Closing reader", while the main goroutine continues and logs "Stopped harvester". These two goroutines race to write to the log file, making the order non-deterministic.

The original test used sequential WaitLogsContains calls which track file offset - when messages appeared in the "wrong" order, the first check would advance the offset past both messages, causing the second to fail.

Changed to WaitLogsContainsAnyOrder which checks for both messages without relying on order.

Closes elastic#47784
Contributor
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
belimawr
approved these changes
Dec 3, 2025
Contributor
@Mergifyio backport 8.19 9.1 9.2
Contributor
✅ Backports have been created
mergify bot
pushed a commit
that referenced
this pull request
Dec 3, 2025
## Proposed commit message

The two log messages are logged from different goroutines that race:

- "Closing reader of filestream" is logged from a background goroutine spawned by ctxtool.WithFunc when streamCancel() triggers cancellation
- "Stopped harvester for file" is logged from the main harvester goroutine via a defer in harvester.go

When streamCancel() executes, it closes a channel that wakes the background goroutine to log "Closing reader", while the main goroutine continues and logs "Stopped harvester". These two goroutines race to write to the log file, making the order non-deterministic.

The original test used sequential WaitLogsContains calls which track file offset - when messages appeared in the "wrong" order, the first check would advance the offset past both messages, causing the second to fail.

Changed to WaitLogsContainsAnyOrder which checks for both messages without relying on order.

Closes #47784

## How to test this PR locally

1. Start the integration test containers:

   ```bash
   cd filebeat && mage docker:composeUp
   ```

2. Build the test binary:

   ```bash
   mage buildSystemTestBinary
   ```

3. Run the specific test with FIPS mode (reproduces original failure without fix):

   ```bash
   GODEBUG=fips140=only \
   ES_HOST=localhost ES_USER=beats ES_PASS=testing \
   ES_SUPERUSER_USER=admin ES_SUPERUSER_PASS=testing \
   go test -v -failfast -tags "integration,requirefips" \
     -run "TestFilestreamDelete/Inactive_resource_not_finished_and_data_added_during_grace_period" \
     ./tests/integration/ -count=20
   ```

4. Clean up:

   ```bash
   mage docker:composeDown
   ```

(cherry picked from commit 5ce630a)

\# Conflicts:
\# filebeat/tests/integration/filestream_delete_test.go
mergify bot
pushed a commit
that referenced
this pull request
Dec 3, 2025
## Proposed commit message

The two log messages are logged from different goroutines that race:

- "Closing reader of filestream" is logged from a background goroutine spawned by ctxtool.WithFunc when streamCancel() triggers cancellation
- "Stopped harvester for file" is logged from the main harvester goroutine via a defer in harvester.go

When streamCancel() executes, it closes a channel that wakes the background goroutine to log "Closing reader", while the main goroutine continues and logs "Stopped harvester". These two goroutines race to write to the log file, making the order non-deterministic.

The original test used sequential WaitLogsContains calls which track file offset - when messages appeared in the "wrong" order, the first check would advance the offset past both messages, causing the second to fail.

Changed to WaitLogsContainsAnyOrder which checks for both messages without relying on order.

Closes #47784

## How to test this PR locally

1. Start the integration test containers:

   ```bash
   cd filebeat && mage docker:composeUp
   ```

2. Build the test binary:

   ```bash
   mage buildSystemTestBinary
   ```

3. Run the specific test with FIPS mode (reproduces original failure without fix):

   ```bash
   GODEBUG=fips140=only \
   ES_HOST=localhost ES_USER=beats ES_PASS=testing \
   ES_SUPERUSER_USER=admin ES_SUPERUSER_PASS=testing \
   go test -v -failfast -tags "integration,requirefips" \
     -run "TestFilestreamDelete/Inactive_resource_not_finished_and_data_added_during_grace_period" \
     ./tests/integration/ -count=20
   ```

4. Clean up:

   ```bash
   mage docker:composeDown
   ```

(cherry picked from commit 5ce630a)
This was referenced Dec 3, 2025
Labels
backport-active-all
Automated backport with mergify to all the active branches
bug
flaky-test
Unstable or unreliable test cases.
skip-changelog
Team:Elastic-Agent-Data-Plane
Label for the Agent Data Plane team
Proposed commit message
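The two log messages are logged from different goroutines that race:

- "Closing reader of filestream" is logged from a background goroutine spawned by ctxtool.WithFunc when streamCancel() triggers cancellation
- "Stopped harvester for file" is logged from the main harvester goroutine via a defer in harvester.go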
When streamCancel() executes, it closes a channel that wakes the background goroutine to log "Closing reader", while the main goroutine continues and logs "Stopped harvester". These two goroutines race to write to the log file, making the order non-deterministic.
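The race described above can be reproduced in isolation. The following is only a minimal sketch, not the filestream code: it assumes a plain `context.WithCancel` in place of ctxtool.WithFunc, and `recorder` and `stopHarvester` are hypothetical names standing in for the log file and the harvester shutdown path.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// recorder stands in for the shared log file: both goroutines append to it.
type recorder struct {
	mu    sync.Mutex
	lines []string
}

func (r *recorder) log(msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.lines = append(r.lines, msg)
}

// stopHarvester mimics the shutdown path: cancelling the context wakes a
// background goroutine that logs one message, while the calling goroutine
// carries on and logs the other. Nothing orders the two log calls.
func stopHarvester() []string {
	rec := &recorder{}
	ctx, cancel := context.WithCancel(context.Background())

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		<-ctx.Done() // woken by cancel()
		rec.log("Closing reader of filestream")
	}()

	cancel()
	rec.log("Stopped harvester for file")

	wg.Wait() // both messages are recorded once this returns
	return rec.lines
}

func main() {
	// Count which message lands first over many runs.
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[stopHarvester()[0]]++
	}
	fmt.Println("first message counts:", counts)
}
```

Both messages are always recorded, but which one comes first depends on goroutine scheduling, so the counts printed by `main` can differ between runs and machines.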
The original test used sequential WaitLogsContains calls, which track the file offset: when the messages appeared in the "wrong" order, the first check would advance the offset past both messages, causing the second to fail.
The test now uses WaitLogsContainsAnyOrder, which checks for both messages without relying on their order.
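To illustrate why offset tracking breaks the sequential checks, here is a simplified model. It is a sketch under assumptions: `logReader`, `contains`, and `containsAnyOrder` are hypothetical stand-ins written for this example, not the test framework's WaitLogsContains or WaitLogsContainsAnyOrder, and they check once instead of polling with a timeout.

```go
package main

import (
	"fmt"
	"strings"
)

// logReader is a hypothetical stand-in for the test framework's log helper.
// It remembers how far it has read, like the offset tracking described above.
type logReader struct {
	lines  []string
	offset int
}

// contains scans unread lines for substr. Every scanned line is consumed,
// whether it matches or not, so a later call never sees it again.
func (r *logReader) contains(substr string) bool {
	for r.offset < len(r.lines) {
		line := r.lines[r.offset]
		r.offset++
		if strings.Contains(line, substr) {
			return true
		}
	}
	return false
}

// containsAnyOrder reports whether every substring appears somewhere in the
// unread lines, regardless of the order in which they were written.
func (r *logReader) containsAnyOrder(substrs ...string) bool {
	found := make([]bool, len(substrs))
	for _, line := range r.lines[r.offset:] {
		for i, s := range substrs {
			if strings.Contains(line, s) {
				found[i] = true
			}
		}
	}
	for _, ok := range found {
		if !ok {
			return false
		}
	}
	return true
}

// sequentialCheck models the original test: two ordered checks in a row.
func sequentialCheck(lines []string) bool {
	r := &logReader{lines: lines}
	return r.contains("Closing reader") && r.contains("Stopped harvester")
}

// anyOrderCheck models the fixed test: one order-independent check.
func anyOrderCheck(lines []string) bool {
	r := &logReader{lines: lines}
	return r.containsAnyOrder("Closing reader", "Stopped harvester")
}

func main() {
	ordered := []string{"Closing reader of filestream", "Stopped harvester for file"}
	swapped := []string{"Stopped harvester for file", "Closing reader of filestream"}

	fmt.Println("sequential:", sequentialCheck(ordered), sequentialCheck(swapped))
	fmt.Println("any order: ", anyOrderCheck(ordered), anyOrderCheck(swapped))
}
```

In this model the sequential check passes for the expected order and fails for the swapped one, because the first call consumes the "Stopped harvester" line while searching for "Closing reader". The order-independent check passes for both.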
Closes #47784
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- `stresstest.sh` script to run them under stress conditions and race detector to verify their stability.
- I have added an entry in `./changelog/fragments` using the changelog tool.

How to test this PR locally
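1. Start the integration test containers: `cd filebeat && mage docker:composeUp`
2. Build the test binary: `mage buildSystemTestBinary`
3. Run the specific test with FIPS mode (reproduces original failure without fix): `GODEBUG=fips140=only ES_HOST=localhost ES_USER=beats ES_PASS=testing ES_SUPERUSER_USER=admin ES_SUPERUSER_PASS=testing go test -v -failfast -tags "integration,requirefips" -run "TestFilestreamDelete/Inactive_resource_not_finished_and_data_added_during_grace_period" ./tests/integration/ -count=20`
4. Clean up: `mage docker:composeDown`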
Related issues