Skip to content

Conversation

donoghuc
Copy link
Member

@donoghuc donoghuc commented Apr 24, 2025

Release notes

[rn:skip]

What does this PR do?

There appears to be a race condition in tests whereby log messages are not observed with a single interogation of the container logs. This commit adds a helper method to wait for log messages. This should make the tests more resilient when there is a delay in log messages being captured.

Why is it important/What is the impact to the user?

NA

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

How to test this PR locally

➜  logstash git:(more-resilient-log-assertion) ✗ ci/docker_acceptance_tests.sh oss

Logs

Example failure:
https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1776#_

2025-04-23 23:45:53 PDT | Failures:
-- | --
  | 2025-04-23 23:45:53 PDT |  
  | 2025-04-23 23:45:53 PDT | 1) A container running the oss image behaves like it applies settings correctly when setting config.string persists ${CONFIG_STRING} key in logstash.yml, resolves when running and spins up without issue
  | 2025-04-23 23:45:53 PDT | Failure/Error: DEFAULT_FAILURE_NOTIFIER = lambda { \|failure, _opts\| raise failure }
  | 2025-04-23 23:45:53 PDT |  
  | 2025-04-23 23:45:53 PDT | expected true
  | 2025-04-23 23:45:53 PDT | got false
  | 2025-04-23 23:45:53 PDT | Shared Example Group: "it applies settings correctly" called from ./docker/spec/oss/container_spec.rb:8
  | 2025-04-23 23:45:53 PDT | # ./docker/shared_examples/container_options.rb:92:in `block in <main>'
  | 2025-04-23 23:45:53 PDT |  
  | 2025-04-23 23:45:53 PDT | Finished in 8 minutes 24 seconds (files took 2.74 seconds to load)

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.
@mergify
Copy link
Contributor

mergify bot commented Apr 24, 2025

This pull request does not have a backport label. Could you fix it @donoghuc? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

@donoghuc donoghuc added the backport-active-all Automated backport with mergify to all the active branches label Apr 24, 2025
@donoghuc
Copy link
Member Author

While debugging this I was printing the logs:

�[2025-04-24T17:48:01,563][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["https://es:9200"]}
�[2025-04-24T17:48:01,592][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://kimchy:xxxxxx@es:9200/]}}
[2025-04-24T17:48:01,641][INFO ][logstash.outputs.elasticsearch][main] Failed to perform request {:message=>"es: Name or service not known", :exception=>Manticore::ResolutionFailure, :cause=>#<Java::JavaNet::UnknownHostException: es: Name or service not known>}
�[2025-04-24T17:48:01,642][WARN ][logstash.outputs.elasticsearch][main] Attempted to resurrect connection to dead ES instance, but got an error {:url=>"https://kimchy:xxxxxx@es:9200/", :exception=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :message=>"Elasticsearch Unreachable: [https://es:9200/][Manticore::ResolutionFailure] es: Name or service not known"}

As you can see there are some rogue byte sequences in there, but they appear to only be on line endings so they are probably not breaking up the string we are looking for. I looked a bit in to upserve/docker-api#290 and it seems like maybe we could switch methods if we are still having issues.

@donoghuc
Copy link
Member Author

Working locally:

Finished in 5 minutes 33 seconds (files took 0.7862 seconds to load)
44 examples, 0 failures

@donoghuc
Copy link
Member Author

donoghuc commented Apr 24, 2025

Kicked off exhaustive test pipeline on this branch: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1780

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.
@donoghuc
Copy link
Member Author

donoghuc commented Apr 24, 2025

@elastic-sonarqube
Copy link

Quality Gate passed Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

Copy link
Contributor

@kaisecheng kaisecheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your time to keep CI healthy

@donoghuc donoghuc merged commit c31fcfd into elastic:main Apr 29, 2025
9 checks passed
@github-actions
Copy link
Contributor

@Mergifyio backport 8.17 8.18 8.19 9.0

@mergify
Copy link
Contributor

mergify bot commented Apr 29, 2025

mergify bot pushed a commit that referenced this pull request Apr 29, 2025
* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)
mergify bot pushed a commit that referenced this pull request Apr 29, 2025
* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)
mergify bot pushed a commit that referenced this pull request Apr 29, 2025
* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)
mergify bot pushed a commit that referenced this pull request Apr 29, 2025
* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)
donoghuc added a commit that referenced this pull request Apr 29, 2025
…7605)

* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)

Co-authored-by: Cas Donoghue <[email protected]>
donoghuc added a commit that referenced this pull request Apr 29, 2025
…7606)

* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)

Co-authored-by: Cas Donoghue <[email protected]>
donoghuc added a commit that referenced this pull request Apr 29, 2025
…7607)

* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)

Co-authored-by: Cas Donoghue <[email protected]>
donoghuc added a commit that referenced this pull request Apr 29, 2025
…7608)

* Add helper method to wait for log message to be observed

There appears to be a race condition in tests whereby log messages are not
observed with a single interogation of the container logs. This commit adds a
helper method to wait for log messages. This should make the tests more
resilient when there is a delay in log messages being captured.

* Use local scop container ref

Due to a copy paste error, the wrong reference to container (instance var)
was being used. When test file that did not define this uses the helper
it failed. The helper should use the ref explicitly passed to the method.

(cherry picked from commit c31fcfd)

Co-authored-by: Cas Donoghue <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-active-all Automated backport with mergify to all the active branches

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants