thevilledev
Contributor

Fix flaky tests by polling Vault's /v1/sys/health endpoint. This ensures the server is ready before tests attempt to connect, preventing a race condition during Vault dev server startup.

Example of a failed run: https://github.com/hashicorp/consul-template/actions/runs/14620430375/job/41018777729?pr=2052

Breakdown of the issue in watch/watch_test.go:

  • main calls newTestVault() to start a Vault dev server using exec.Command("vault", "server", "-dev", ...) and stores the command in testVault.
  • Immediately after cmd.Start(), main creates Vault clients and calls vaultTokenSetup(clients).
  • vaultTokenSetup then attempts to communicate with the Vault API through vc.Sys().EnableAuthWithOptions(...), which fails with a connection refused error because the Vault process is not yet listening.

Seems to affect the Enterprise tests more than the others.

You can simulate the test error locally by making the test launch an intermediate script which sleeps indefinitely.

@thevilledev thevilledev requested a review from a team as a code owner April 28, 2025 11:57
@thevilledev thevilledev force-pushed the fix/watcher-vault-startup-wait branch from 7901e87 to 6c43fdb Compare April 28, 2025 12:02
@thevilledev
Contributor Author

thevilledev commented Apr 28, 2025

I was able to dig out the following from Vault (Enterprise) stdout:

Error parsing listener configuration.
Error initializing listener of type tcp: listen tcp 127.0.0.1:8200: bind: address already in use

Looks like it needs to use ephemeral ports like the other tests. Will mark this as a draft for now.

@thevilledev thevilledev marked this pull request as draft April 28, 2025 12:26
@thevilledev thevilledev force-pushed the fix/watcher-vault-startup-wait branch from e42a153 to bdf5fa4 Compare April 28, 2025 13:05
Fix flaky tests by polling Vault's /v1/sys/health endpoint.
This ensures the server is ready before tests attempt to connect,
preventing a race condition during Vault dev server startup.

Signed-off-by: Ville Vesilehto <[email protected]>
@thevilledev thevilledev force-pushed the fix/watcher-vault-startup-wait branch from e6d0114 to d323dc2 Compare April 28, 2025 17:57
@thevilledev thevilledev marked this pull request as ready for review April 28, 2025 17:59
@thevilledev
Contributor Author

The watch package tests now pass. However, all Consul Enterprise related tests still fail.

failed to start consul server: api unavailable
FAIL	github.com/hashicorp/consul-template/dependency	2.154s

There's very little to debug because the Consul output is omitted. My only hunch is that the Consul Enterprise license in the CONSUL_LICENSE environment variable (passed from a GitHub secret) may have expired.

Maybe someone from HashiCorp could verify.
