
Conversation

@omad (Member) commented Sep 10, 2025

WIP

This PR fixes the old support for spinning up a bunch of PostGIS Docker containers to test against in parallel, which broke when the new postgis 1.9 driver was integrated.

It allows running the tests on any machine with Docker by running uv run pytest, and using pytest-xdist to run them significantly faster by splitting them across multiple processes with --numprocesses auto (or replace auto with a number).

The cubedash.testutils.database module provides fixtures to (sketched below):

  • Start a new PostGIS Docker container per test session (1 by default, or more when using xdist --numprocesses).
  • Create a new test ODC database per test module.
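
A minimal sketch of that fixture layout, assuming the docker Python SDK and pytest-xdist's worker_id fixture; the fixture names postgis_server and odc_test_db are illustrative, not necessarily the real names in cubedash.testutils.database:

import docker
import pytest

POSTGIS_IMAGE = "postgis/postgis:16-3.5"


@pytest.fixture(scope="session")
def postgis_server(worker_id):
    """Start one PostGIS container per xdist worker.

    worker_id is "master" when xdist is not in use, so a plain pytest run
    gets a single container for the whole session.
    """
    client = docker.from_env()
    container = client.containers.run(
        POSTGIS_IMAGE,
        detach=True,
        environment={"POSTGRES_PASSWORD": "test"},
        ports={"5432/tcp": None},  # let Docker pick a free host port
        labels={"purpose": "explorer-tests", "xdist-worker": worker_id},
    )
    try:
        yield container
    finally:
        container.remove(force=True)


@pytest.fixture(scope="module")
def odc_test_db(postgis_server, request):
    """Create a fresh ODC database in this worker's container for each test module."""
    db_name = "odc_test_" + request.module.__name__.replace(".", "_")
    ...  # CREATE DATABASE, initialise the ODC schema, yield a connection URL, DROP DATABASE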

xdist has a couple of ways of splitting tests across workers, both of which work for me.

  • --dist loadscope splits the tests up so that all tests from the same test module run on the same worker.
  • --dist load ignores modules and splits the tests more evenly across workers.

The fixtures in cubedash.testutils.database are written so that either option works. On my 2024 MacBook Pro (12 ARM cores), both complete the full test suite in about 5 minutes. However, load causes much more database churn, since test databases for the same test module will be set up multiple times on different workers.

When running with xdist:

  • Benchmark tests are disabled.
  • The --pdb flag is disabled.
  • The fans on your machine will get noisy.

📚 Documentation preview 📚: https://datacube-explorer--860.org.readthedocs.build/en/860/

@omad force-pushed the pytest-xdist branch 2 times, most recently from ac8d96b to a5c7ffc on September 10, 2025 01:18
codecov bot commented Sep 10, 2025

Codecov Report

❌ Patch coverage is 40.98361% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.17%. Comparing base (b3302ed) to head (ce4d2d7).

Files with missing lines         Patch %   Lines
cubedash/testutils/database.py   40.98%    35 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #860      +/-   ##
===========================================
- Coverage    84.52%   84.17%   -0.35%     
===========================================
  Files           35       35              
  Lines         4245     4278      +33     
  Branches       536      537       +1     
===========================================
+ Hits          3588     3601      +13     
- Misses         462      481      +19     
- Partials       195      196       +1     

☔ View full report in Codecov by Sentry.


- name: Install Dependencies
  run: |
    uv sync

Contributor:

I think this is missing --locked --extra=test.

- name: Run Tests
  run: |
    uv run pytest -v -x integration_tests/ --numprocesses 4 --dist load --durations 20 --junit-xml pytest_junit.xml --cov=--cov-report=xml -r sx

Contributor:

GitHub's Linux runners have 2 cores; is 4 a good number for these tests?

Member Author:


Contributor:

Oh, that has changed, but a pleasant surprise!

Put $(nproc) instead of 4 then, so it will automatically adjust to the number of cores in the runners in the future.



GET_DB_FROM_ENV = "get-the-db-from-the-environment-variable"
POSTGIS_IMAGE = "postgis/postgis:16-3.5"

Contributor:

I'm a bit afraid of having many places to update for the same piece of information, because they tend to get out of sync. Can this be read from docker-compose.yaml instead, so things can't go out of sync?


Member Author:

Agreed.

I'll find somewhere better for this: an environment variable, a pytest argument, reading it from a config, or something similar.
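
One way to avoid the duplication would be to read the image tag out of docker-compose.yaml at fixture setup time. A rough sketch, assuming PyYAML is available and that the compose file defines the database under a service named postgres (the service name here is a guess):

import pathlib

import yaml


def postgis_image_from_compose(compose_path="docker-compose.yaml", service="postgres"):
    """Read the image tag for the given service out of docker-compose.yaml,
    so the test fixtures and the compose file cannot drift apart."""
    compose = yaml.safe_load(pathlib.Path(compose_path).read_text())
    return compose["services"][service]["image"]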

@pjonsson (Contributor) commented:

Having 4 CI jobs for testing was my workaround for the tests taking 30 minutes to run, and often failing intermittently with the closed-cursor error. You have fixed the intermittent failures though, and it would be much better to use one job that runs things in parallel.

Starting up a bunch of database containers seems like a heavy-handed approach that might consume quite a lot of memory. Would it be difficult to instead have a single database container and run the tests against multiple databases inside it?

"Initialising ODC Schema for testing", odc_db=odc_db, cfg_db_url=cfg_env.db_url
)
index = index_connect(cfg_env, validate_connection=False)
# With permissions causes trouble when run multiple times on the same database

Contributor:

I'm not sure what problem you're seeing, but could you be hitting opendatacube/datacube-core#1959?

with conn.cursor() as cur:
    log.info("Creating database", db_name=test_database_name)
    cur.execute(
        sql.SQL("CREATE DATABASE {db_name}").format(

Contributor:

I had a dinner break since my earlier comments and I still love the thought of faster testing, but this feels like a big chunk of dynamic stuff that will be really hard to inspect/reproduce when things go mysteriously wrong in the CI.

We can easily create $(nproc) test databases when we spin up the current database container (just add a for loop around the create-database part of the create_db.sh script in commit e6e823b). Isn't there some place in conftest.py (or similar) where we can patch the worker number into the database part of the engine connection string?
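
For illustration only, the conftest.py side of that suggestion could look roughly like the sketch below; the database names, credentials and environment variable are made up, and it assumes create_db.sh has already created one database per worker:

import os

import pytest


@pytest.fixture(scope="session")
def test_db_url(worker_id):
    """Map each xdist worker onto one of the pre-created databases
    (odc_test_0, odc_test_1, ...) in the single shared container."""
    # worker_id is "gw0", "gw1", ... under xdist, or "master" for a plain pytest run.
    worker_number = 0 if worker_id == "master" else int(worker_id.removeprefix("gw"))
    host = os.environ.get("POSTGRES_HOSTNAME", "localhost")
    return f"postgresql://odc_test:insecure@{host}:5432/odc_test_{worker_number}"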

@omad (Member Author) commented Sep 16, 2025

> Having 4 CI jobs for testing was my workaround for the tests taking 30 minutes to run, and often failing intermittently with the closed-cursor error. You have fixed the intermittent failures though, and it would be much better to use one job that runs things in parallel.

As context, my primary motivation was to run the tests faster locally, not just in GHA. It would be really nice if we could use the same approach everywhere.

> Starting up a bunch of database containers seems like a heavy-handed approach that might consume quite a lot of memory. Would it be difficult to instead have a single database container and run the tests against multiple databases inside it?

That would be nicer, and I tried, but it was unexpectedly difficult with the way pytest-xdist works. The test workers are isolated and don't communicate with each other, and the controller doesn't run the pytest collection, so it doesn't know whether or when any fixtures are required and can't be responsible for container startup/shutdown either.

I thought I could solve it by getting the first test worker, named gw0, to manage the container, but it wouldn't know when to shut it down again, which could either fail any still-running tests or leave the container dangling.
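
For reference, the pytest-xdist documentation suggests a lock-file pattern for doing one-time setup shared across workers; a sketch is below, using the filelock package, with start_postgis_container() as a hypothetical helper that is not part of this PR. It covers the startup side, but as noted above it doesn't answer which worker should eventually tear the container down, which is why this PR runs one container per worker instead.

import json

import pytest
from filelock import FileLock


@pytest.fixture(scope="session")
def shared_postgis(tmp_path_factory, worker_id):
    """One-time container startup shared by all xdist workers."""
    if worker_id == "master":
        # No xdist: just start the container for this single process.
        return start_postgis_container()  # hypothetical helper

    # All workers share the parent of their session temp dirs; coordinate there.
    root_tmp_dir = tmp_path_factory.getbasetemp().parent
    marker = root_tmp_dir / "postgis.json"
    with FileLock(str(marker) + ".lock"):
        if marker.is_file():
            connection_info = json.loads(marker.read_text())
        else:
            connection_info = start_postgis_container()  # hypothetical helper
            marker.write_text(json.dumps(connection_info))
    return connection_info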
