Contributor

@stephanos commented Dec 12, 2025

What changed?

Migrated TestWorkflowUpdateSuite away from testify's Suite, which enables parallel test execution.

How it works

  • a test invokes testcore.NewEnv(t) to obtain a new TestEnv
  • TestEnv calls t.Parallel() (intentionally, there is no way to opt out!)
  • TestEnv obtains a test cluster from clusterPool (or blocks while all clusters are in use)
  • env var TEMPORAL_TEST_SHARED_CLUSTERS controls the size of the shared pool
  • if a test relies on APIs like InjectHook, a dedicated cluster is used so it cannot interfere with other tests
  • env var TEMPORAL_TEST_DEDICATED_CLUSTERS controls the number of dedicated clusters
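
The flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `testingT`, `cluster`, `clusterPool`, `newEnv`, `poolSizeFromEnv`, and `fakeT` are hypothetical names, the pool is modeled as a buffered channel, and clusters are created eagerly for simplicity.

```go
package main

import (
	"os"
	"strconv"
)

// testingT is the small subset of *testing.T this sketch needs.
type testingT interface {
	Parallel()
	Cleanup(func())
}

// cluster stands in for a running test cluster (hypothetical type).
type cluster struct{ id int }

// clusterPool hands out clusters. Receiving from the buffered channel
// blocks while every cluster is in use.
type clusterPool struct{ free chan *cluster }

func newClusterPool(size int) *clusterPool {
	p := &clusterPool{free: make(chan *cluster, size)}
	for i := 0; i < size; i++ {
		p.free <- &cluster{id: i}
	}
	return p
}

func (p *clusterPool) acquire() *cluster  { return <-p.free }
func (p *clusterPool) release(c *cluster) { p.free <- c }

// poolSizeFromEnv reads a pool size from an env var; the fallback value is
// an assumption of this sketch, not a default taken from the PR.
func poolSizeFromEnv(name string, fallback int) int {
	if n, err := strconv.Atoi(os.Getenv(name)); err == nil && n > 0 {
		return n
	}
	return fallback
}

// env is what a test receives (hypothetical shape).
type env struct{ cluster *cluster }

// newEnv always marks the test as parallel, picks the shared or dedicated
// pool, blocks until a cluster is free, and returns it on test cleanup.
func newEnv(t testingT, shared, dedicated *clusterPool, needsDedicated bool) *env {
	t.Parallel()
	pool := shared
	if needsDedicated {
		pool = dedicated
	}
	c := pool.acquire()
	t.Cleanup(func() { pool.release(c) })
	return &env{cluster: c}
}

// fakeT is a stand-in for *testing.T so the sketch can be exercised
// outside of `go test`.
type fakeT struct {
	parallel bool
	cleanups []func()
}

func (f *fakeT) Parallel()         { f.parallel = true }
func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

// finish runs registered cleanups in reverse order, as *testing.T does.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}
```

Because `newEnv` calls `Parallel()` unconditionally, the pool size (not the individual test) decides how much actually runs at once.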

testify suites

Existing testify suites draw their clusters from the same dedicated cluster pool, which prevents too many clusters from being created at once.

Database connections

SQLite setup for TestEnv-based functional tests (i.e., only TestWorkflowUpdateSuite so far) now uses a file-based database instead of an in-memory one. A file-based database can use SQLite's write-ahead log (WAL), which an in-memory SQLite database does not support, and WAL allows much better concurrency.

Connection limits for other databases were also raised due to connection errors.

Planned follow-ups

  • Migrate the other testify suites; this should be fairly straightforward with the use of AI agents.
  • Reduce the need for dedicated clusters by relying more on an isolated namespace per test.
  • Eliminate all time.Sleep calls.
  • Tweak test cluster pool behavior.

Why?

  1. Local speedup: benchmarks show a ~54% reduction in run time (36.1s → 16.6s, roughly 2.2x faster) for TestWorkflowUpdateSuite.

  2. Namespace isolation: every test runs in its own namespace. This greatly reduces the risk of (accidental) collisions and also reduces the need to craft unique identifiers such as task queue names and workflow IDs.

  3. Deprecate testify suites: Long-term strategy to remove use of testify suites in functional tests (one reason being their inability to run tests within a suite in parallel).

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential Issues

  1. Logs become less useful because output from concurrently running tests is interleaved.
  2. Higher resource consumption: parallel tests require more concurrent database connections and use more memory (see 3 and 4). This could cause some short-term instability on CI. Note that other PRs have already been merged that add much better memory-usage monitoring, which will help here.
  3. Until all functional tests are converted, there is an imbalance in test cluster creation: migrated tests use the shared pool, while not-yet-migrated tests create one cluster each. In addition, some tests cannot share a test cluster because they use non-parallelizable actions such as InjectHook or dynamic config overrides. With some more effort, the number of these tests can be reduced.
  4. Test cluster setup was designed around short-lived clusters, one per suite. When clusters are reused for longer, some of those assumptions no longer hold and memory usage grows. A band-aid is in place that limits how many times a test cluster can be used before it is torn down. A long-term solution requires design changes to how test clusters are started, used, and torn down.
  5. If cross-namespace issues or bugs affect multiple tests, the root cause might now be harder to identify. However, the existing test re-runs should at least mitigate this in the short term.
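
The band-aid from item 4 (retiring a cluster after a fixed number of uses) might look something like the sketch below. All names (`cluster`, `recycler`, `maxUses`) are hypothetical, the actual limit used in this PR is not stated here, and the sketch assumes the cluster is idle at the moment it is retired.

```go
package main

import "sync"

// cluster stands in for a running test cluster (hypothetical type).
type cluster struct {
	generation int // 1 for the first cluster, 2 for its replacement, ...
	uses       int // number of tests this cluster has served so far
}

// recycler replaces its cluster once it has served maxUses tests, so that
// state accumulated inside a long-lived cluster is eventually dropped.
type recycler struct {
	mu       sync.Mutex
	maxUses  int
	current  *cluster
	started  int // clusters started so far
	tornDown int // clusters torn down so far
}

// acquire returns the current cluster, first swapping in a fresh one if
// the current cluster has reached its use limit.
func (r *recycler) acquire() *cluster {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.current != nil && r.current.uses >= r.maxUses {
		r.tornDown++ // a real implementation would stop the cluster here
		r.current = nil
	}
	if r.current == nil {
		r.started++ // a real implementation would start a cluster here
		r.current = &cluster{generation: r.started}
	}
	r.current.uses++
	return r.current
}
```

A use counter is cheap to implement but only bounds memory growth indirectly, which is why item 4 describes it as a stopgap.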

@stephanos force-pushed the faults-2 branch 7 times, most recently from 84b32e2 to b2bf3e7 (December 12, 2025 19:02)
@stephanos changed the title from "Parallel Tests [experiment]" to "Parallel Workflow Update Tests [experiment]" (Dec 12, 2025)
@stephanos force-pushed the faults-2 branch 11 times, most recently from 51af423 to a371795 (December 19, 2025 18:31)
@stephanos force-pushed the faults-2 branch 10 times, most recently from 3dc8bc5 to f4be4bc (January 12, 2026 18:42)
@stephanos force-pushed the faults-2 branch 19 times, most recently from 25056f8 to e5bb0d4 (January 19, 2026 22:10)
Comment on lines -535 to -536
TEMPORAL_TEST_OTEL_OUTPUT: ${{ github.workspace }}/.testoutput
TEMPORAL_OTEL_DEBUG: true
Contributor Author

@stephanos Jan 19, 2026

Removed OTEL tracing for now because its design increased memory usage. In detail: the OTEL exporter is scoped to the lifetime of the cluster rather than to a namespace, and namespace scoping would be more efficient here.

Comment on lines +123 to +124
"busy_timeout": "30000",
"journal_mode": "wal",
"synchronous": "normal",
Contributor Author

Recommended settings I found online.

@stephanos force-pushed the faults-2 branch 2 times, most recently from f8c3c8c to 647f077 (January 20, 2026 00:44)
testSQLiteSchemaDir = "schema/sqlite/v3" // specify if mode is not "memory"
)

// GetTestClusterOption returns test options for the given store type and driver.
Contributor Author

A little refactoring to make the persistence setup slightly more ergonomic.

"go.temporal.io/server/tests/testcore"
)

type testEnv interface {
Contributor Author

Superseded by testcore.Env.


runID := mustStartWorkflow(s, tv)
func TestWorkflowUpdateSuite(t *testing.T) {
Contributor Author

Changes here were kept to a minimum; helpers and tests were migrated to regular functions. The nested nature of testify suites was preserved, too.
