Multiarch images #432

BowlesCR · 2025-10-10T21:33:50Z

As discussed briefly in #216

The high-level logic here is to build each postgres/postgis/variant tuple natively (for speed) on both AMD64 and ARM64 separately, then push those images to Dockerhub by digest (i.e., untagged). Then do some magic with manifests/indexes to reference those digests and push them up to Dockerhub with the human-readable tags.

When someone pulls the image, dockerd will follow the index/manifest chain to resolve the correct image for the runtime's architecture.
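
To make the digest-then-merge flow a bit more concrete, here is a minimal shell sketch of how a merge step could turn per-architecture digests into image references. It assumes each build job records its digest as an empty file named after the hex part of the digest (which is what the `digests/*` artifact path in this PR suggests). The function name `digest_refs` and the sample digests are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one "<repo>@sha256:<hex>" reference per digest
# file found in a directory. Each file is assumed to be empty and named
# after the hex part of an image digest.
digest_refs() {
    repo=$1
    dir=$2
    for f in "$dir"/*; do
        [ -e "$f" ] || continue   # empty directory: the glob did not match
        printf '%s@sha256:%s\n' "$repo" "$(basename "$f")"
    done
}

# Example with made-up digests standing in for the amd64 and arm64 builds.
tmp=$(mktemp -d)
touch "$tmp/aaaa1111" "$tmp/bbbb2222"
digest_refs postgis/docker-postgis-test "$tmp"
rm -rf "$tmp"
```

References of this form are what `docker buildx imagetools create` would receive next to the `-t` tag arguments when the index is assembled.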

If this is merged, it probably obviates #256, #405, #406 (nonexhaustive)

ImreSamu · 2025-10-15T06:52:07Z

Hi @BowlesCR ,

Thank you for this PR.
(I am one of the maintainers, replying on my own behalf for now.)

TL;DR: This is a substantial change. I have been reviewing it for a few days, and my current view is that, while a few items need adjustment, the proposal looks practical and on a good path.

Proposal: run as a parallel test workflow first

To minimize any risk to the published images, could we run this as a separate test workflow first?

Please:

Use .github/workflows/test.yml and keep .github/workflows/main.yml unchanged.
Use a test Docker Hub repository for images:
DOCKERHUB_REPO: postgis/postgis -> DOCKERHUB_REPO: postgis/docker-postgis-test.

Optional:

Add env LATEST_VERSION=17-3.5 to define which build is tagged as latest, and implement the related tagging.
Implement <DOCKERHUB_REPO> description refresh from README.md.
Implement a CI check to ensure the official test output contains: ( 'postgis-basics' [6/6]...passed )
Adjust the schedule to avoid overlap and to test daily:
- - cron: '15 5 * * 1' -> - cron: '15 13 * * *'.
For quicker feedback, you may temporarily simplify the matrix
- comment out the *-master and pg13-pg16 entries .
If possible, please re-run and share a test run (for example, from your fork) so we can review the workflow logs and inspect the built images. ( DOCKERHUB_REPO: bowlescr/docker-postgis-test ? )

If you would prefer, let me know and I can help finish these items.

If this PR reaches a good state,
I am happy to accept it as .github/workflows/test.yml and start a trial period of a few weeks.
We can begin with a minimal matrix and later expand to the full matrix.
Based on the results, we can then promote it to production.

Checklist to promote `test.yml` to production `main.yml`

- <DOCKERHUB_REPO>:<tags> must be created for both amd64 and arm64, and users must be able to download and test them.
- <DOCKERHUB_REPO>:latest must be tagged with amd64 and arm64 (based on the environment value LATEST_VERSION=17-3.5).
- At the end of the workflow, if all steps complete successfully, the <DOCKERHUB_REPO> description must be updated with README.md (using https://github.com/peter-evans/dockerhub-description, see Makefile logic).
  (Optional: insert a short warning message at the top of the description to indicate that this is a test build.)
- The CI pipeline must verify that the test log (generated by ./official-images/test/run.sh ... | tee test-output-<some-unique-id>.log) contains the text 'postgis-basics' [6/6]...passed.
- The workflow must never push or publish any artifacts, manifests, or Docker images during a Pull Request test run. All publishing steps must be protected by conditional execution, for example:
  if: ${{ (github.repository == env.GITHUB_REPO) && (github.ref == 'refs/heads/master') && (github.event_name != 'pull_request') }}
  This ensures that only runs on the main repository and the master branch can publish images or artifacts.
- The workflow must be protected against parallel or consecutive runs causing artifact name collisions. To ensure unique artifact names across multiple commits or workflow re-runs, the artifact name must include the github.run_id value, for example:
  name: digests-${{ github.run_id }}-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }}
  (see https://docs.github.com/en/actions/reference/workflows-and-actions/contexts#github-context)

P.S.

The above reflects my own current proposal, I will likely need to coordinate with my co-maintainer before final approval.
Some of my comments might not be entirely clear; please do not hesitate to ask if you need clarification or any help.
Later, once test.yml is running reliably within this repository, I will still need to automate the matrix generation,
as it currently has to be updated manually in two places. (This is not a mandatory part for now.)
Once test.yml runs successfully here and everyone is satisfied with the DOCKERHUB_REPO: postgis/docker-postgis-test results,
it can be renamed to main.yml and configured for production use.
We may also want to add a short piece of documentation and possibly a simple local testing option later on.

Thanks again for your contribution, it is much appreciated.

ImreSamu

These are the two important items I would like to request:

Use .github/workflows/test.yml and keep .github/workflows/main.yml unchanged.
Use a test Docker Hub repository for images:
DOCKERHUB_REPO: postgis/postgis -> DOCKERHUB_REPO: postgis/docker-postgis-test.

The rest is optional, and I can handle it if you prefer.

ImreSamu · 2025-10-15T06:54:19Z

.github/workflows/main.yml

-jobs:
+env:
+  DOCKERHUB_REPO: postgis/postgis
+  GITHUB_REPO: postgis/docker-postgis


(optional) add env: LATEST_VERSION: 17-3.5, to define which build is tagged as latest, and implement the related tagging.

ImreSamu · 2025-10-15T06:54:59Z

.github/workflows/main.yml


-jobs:
+env:
+  DOCKERHUB_REPO: postgis/postgis


rename to:
DOCKERHUB_REPO: postgis/docker-postgis-test

ImreSamu · 2025-10-15T06:56:48Z

.github/workflows/main.yml

-        docker info -f '{{ .DriverStatus }}'
-
-    - name: Load binfmt platforms for QEMU
+        ./official-images/test/run.sh -c ./official-images/test/config.sh -c test/postgis-config.sh ${{ steps.build.outputs.imageid }}


(optional) Save the test log, and in the next step implement a CI check to ensure the official test output contains: ( 'postgis-basics' [6/6]...passed )

Not familiar with this output, but the [6/6] bit seems fragile -- I assume that would change with the number of test cases.
In your experience would it be reasonable to regex away the counts and rely on passed?

I think any solution would be good that detects whether the
[postgres-basics, postgres-initdb, postgis-basics] tests actually ran.

These tests are currently defined here:
https://github.com/postgis/docker-postgis/blob/master/test/postgis-config.sh
and the "postgis/postgis" name is hard-coded:

  testAlias[postgis/postgis]=postgres
  imageTests[postgis/postgis]=' postgis-basics '

If this isn't adjusted to match the image name being tested, then these tests won't run.

If I'm seeing correctly, in your earlier test log, only 3 tests ran:

  Run ./official-images/test/run.sh -c ./official-images/test/config.sh -c test/postgis-config.sh sha256:90c1d3051135610d96c6578f2c379a039c5a293da0ef66aded00529d1d8b1760
  testing sha256:90c1d3051135610d96c6578f2c379a039c5a293da0ef66aded00529d1d8b1760
      'utc' [1/3]...passed
      'no-hard-coded-passwords' [2/3]...passed
      'override-cmd' [3/3]...passed

All 6 tests should run:

  /home/runner/official-images/test/run.sh -c /home/runner/official-images/test/config.sh -c test/postgis-config.sh postgis/postgis:17-3.5
  testing postgis/postgis:17-3.5
      'utc' [1/6]...passed
      'no-hard-coded-passwords' [2/6]...passed
      'override-cmd' [3/6]...passed
      'postgres-basics' [4/6]...passed
      'postgres-initdb' [5/6]...passed
      'postgis-basics' [6/6]...passed

Several solutions are possible:

1. Since it's easy to misconfigure, and DOCKERHUB_REPO is freely configurable, we should probably modify postgis-config.sh to provide protection for any fork case. For example, configure ["postgis/postgis", "postgis-test", and if it exists, ${DOCKERHUB_REPO} as well]. However, in this case, the first docker build step would need to tag as "postgis-test" to ensure the test runs. The log check would be just an extra test to detect any future configuration issues.

2. The other solution is much simpler. For example, the first docker build for testing could be tagged with a special name to indicate it's not for publishing, while still running the tests: "postgis/postgis:__local_testing__"

Not familiar with this output, but the [6/6] bit seems fragile -- I assume that would change with the number of test cases.
In your experience would it be reasonable to regex away the counts and rely on passed?

Currently 6 tests run, and as far as I remember, they change very rarely. If any of the 6 tests fails, the process stops with an error.
However, we should verify that both the postgres and postgis tests have run. That's why I thought it would be sufficient to test for the presence of this line with grep:
'postgis-basics' [6/6]...passed
This assumes that the postgres tests have also run successfully.
But I'm open to other suggestions as well.

Of course, ideally we should check for the presence of all 6 tests.
That is, we should verify these 6 lines:

'utc' [1/6]...passed

'no-hard-coded-passwords' [2/6]...passed

'override-cmd' [3/6]...passed

'postgres-basics' [4/6]...passed

'postgres-initdb' [5/6]...passed

'postgis-basics' [6/6]...passed
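
One way to check that all six tests ran, without depending on the exact `[n/m]` counters, is sketched below. This is only an illustration: the function name `check_tests_ran` and the log file name in the usage note are hypothetical, and the test names are simply the six listed above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every expected test name appears in
# the log with a "passed" result, whatever the [n/m] counters say.
check_tests_ran() {
    log=$1
    missing=0
    for t in utc no-hard-coded-passwords override-cmd \
             postgres-basics postgres-initdb postgis-basics; do
        # Matches a line such as:  'postgis-basics' [6/6]...passed
        if ! grep -Eq "'$t' \[[0-9]+/[0-9]+\]\.\.\.passed" "$log"; then
            echo "missing or not passed: $t" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

In a workflow step this could follow the test run, for example `./official-images/test/run.sh ... | tee test-output.log` and then `check_tests_ran test-output.log`. With the earlier 3-test log it would fail and name the three missing tests.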

ImreSamu · 2025-10-15T07:02:33Z

.github/workflows/main.yml

-    - name: Checkout source
-      uses: actions/checkout@v4
+    - name: Upload digests
+      uses: actions/upload-artifact@v4


imho: The workflow must never push or publish any artifacts, manifests, or Docker images during a Pull Request test run.

so please add :
if: ${{ (github.repository == env.GITHUB_REPO) && (github.ref == 'refs/heads/master') && (github.event_name != 'pull_request') }}

Hmmm, this may be the gotcha I ran into before but forgot about. I'd have to come up to speed again but I remember something about attempting to adhere to the "no registry pushes during PRs" heuristic conflicting with "Need to do a non-multiplatform image push to the registry first in order for the multiplatform image manifest creation to be possible" 😬 (but I may be wrong)

Could certainly make PRs behave differently.

Do you care to have both architectures build on PR, or is either fine? If I recall, the current process doesn't yield anything someone can download, so either should be fine unless there's concern that something could break on only one arch.

I'm not sure if this answers your question, but ideally we'd want to have a PR trigger CI to build x86 image on an x86 runner and arm64 image on an arm64 runner (so that both builds happen natively from their own perspective) but the architecture of the runner that handles building the multiplatform manifest and potentially pushing the images/manifests to the registry is probably less of a concern.

ImreSamu · 2025-10-15T07:05:43Z

.github/workflows/main.yml

+    - name: Upload digests
+      uses: actions/upload-artifact@v4
+      with:
+        name: digests-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }}


imho: The workflow must be protected against parallel or consecutive runs causing artifact name collisions.

proposal add a ${{ github.run_id }}

name: digests-${{ github.run_id }}-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }}

Unless I'm missing your point, this should already be handled because the upload is implicitly scoped to the run in progress. Other runs (and even re-executing the same run) are isolated.

Thank you for the feedback. You're right that actions/upload-artifact@v4 is indeed isolated, and theoretically this isn't strictly necessary.

Of course, if we want to be humorously over-cautious, we could include it with a comment noting that the -${{ github.run_id }} suffix isn't theoretically needed due to v4's isolation, but serves as a placebo to help some maintainers sleep better at night 😄

A related thought:

Since the actions/upload-artifact@v4 overwrite: parameter defaults to false, in very rare cases - if a problem occurs right after upload and someone tries to re-run it - the re-run won't succeed. But I think this is still better than setting it to true.

This exceptional case should be documented, and if such a problem occurs, a completely new job run needs to be manually triggered.

ImreSamu · 2025-10-15T07:08:09Z

.github/workflows/main.yml

+      run: |
+        docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \
+          $(printf '${{ env.DOCKERHUB_REPO }}@sha256:%s ' *)
+


(optional) imho: this needs a <DOCKERHUB_REPO>:latest tag, based on the environment value LATEST_VERSION
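
A rough sketch of how the `latest` decision could be made in shell is shown below. It rests on assumptions that are not stated in this thread: that tags look like `<version>` or `<version>-<variant>`, that the plain build is identified by a variant named `default`, and that only that plain build of LATEST_VERSION should also receive `latest`. The function name `tag_args` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the "-t" arguments for one build.
# Assumption: only the plain ("default") build of the latest version
# is additionally tagged as :latest.
tag_args() {
    repo=$1 version=$2 variant=$3 latest=$4
    if [ "$variant" = default ]; then
        tag=$version
    else
        tag=$version-$variant
    fi
    printf '%s' "-t $repo:$tag"
    if [ "$variant" = default ] && [ "$version" = "$latest" ]; then
        printf '%s' " -t $repo:latest"
    fi
    printf '\n'
}

# Example: the 17-3.5 default build when LATEST_VERSION=17-3.5.
tag_args postgis/docker-postgis-test 17-3.5 default 17-3.5
```

The printed arguments could then be combined with the digest references in the `docker buildx imagetools create` call.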

BowlesCR · 2025-10-15T22:11:19Z

One slight concern I thought of:
The image pushes are non-atomic... if the run fails there will likely be dangling digest-only images in dockerhub. Probably need some process to prune those (maybe this is built into dockerhub?)

As-is all the image variants are built before any merging happens though, so unless something goes sideways in the merge step there won't be a risk of mismatched tags.

phillipross · 2025-10-16T13:22:33Z

One slight concern I thought of: The image pushes are non-atomic... if the run fails there will likely be dangling digest-only images in dockerhub. Probably need some process to prune those (maybe this is built into dockerhub?)

As-is all the image variants are built before any merging happens though, so unless something goes sideways in the merge step there won't be a risk of mismatched tags.

I understand that there might be dangling images in dockerhub, but I'm assuming they have a garbage collection process that prunes out untagged images that may exist. I know github's ghcr.io doesn't have that GC and the untagged dangling images are shown in the repo's image list which is unfortunate 😅 but the dangling images don't show up on dockerhub as far as I've ever seen so I figured it was safe to assume there's some GC policy in the background that prunes them at some point. I'm not sure it's a priority concern for the purposes of this current task of getting the multiarch images going, so we could figure out how to prune down the road if it becomes necessary.

ImreSamu · 2025-10-17T18:54:04Z

.github/workflows/main.yml

+        name: digests-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }}
+        path: ${{ runner.temp }}/digests/*
+        if-no-files-found: error
+        retention-days: 1


The retention-days: should be 10 days instead.

The current 1 day makes detailed troubleshooting and debugging very difficult, because as volunteers, we can't always respond to issues within 24 hours.

Additionally, it can sometimes be useful to examine artifacts from the previous week's run, which is why I'd suggest 10 days (as a cautious safety measure).

ImreSamu · 2025-10-18T14:08:32Z

( cc: @phillipross ; @BowlesCR )

I created a test repository to get familiar with the suggested approach.
Here is where I am now:

What surprised me most is that GitHub Actions cannot use the env.GITHUB_REPO variable
at the job level - it can only be used inside the job steps.

For example, this does not work:

  merge-manifests:
    name: Merge manifests and push to DockerHub
    needs: make-docker-images
    runs-on: ubuntu-24.04-arm  # Always on arm, because why not
    if: ${{ (github.repository == env.GITHUB_REPO) && (github.ref == 'refs/heads/master') && (github.event_name != 'pull_request') }}

Because of this limitation, I had to decide between:

hard-coding the value, or
finding a fork-friendly solution with the constant defined in only one place

I chose the second option, which resulted in the current step 0: constants: in the workflow ( https://github.com/ImreSamu/docker-postgis-test/blob/master/.github/workflows/test.yml )
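
For readers who have not seen this pattern, the following shell sketch shows what the body of such a "constants" step might look like. It is not the implementation in the linked test.yml. The function name `write_publish_flag` and the output key `publish` are hypothetical, while GITHUB_REPOSITORY, GITHUB_REF, GITHUB_EVENT_NAME and GITHUB_OUTPUT are the variables GitHub Actions provides to every step.

```shell
#!/bin/sh
# Hypothetical "constants" step body: decide once whether this run may
# publish, and expose the answer as a step output for later jobs.
# The expected repository name is the single constant, passed as $1.
write_publish_flag() {
    expected_repo=$1
    publish=false
    if [ "$GITHUB_REPOSITORY" = "$expected_repo" ] &&
       [ "$GITHUB_REF" = refs/heads/master ] &&
       [ "$GITHUB_EVENT_NAME" != pull_request ]; then
        publish=true
    fi
    # Appending "key=value" to $GITHUB_OUTPUT sets a step output.
    echo "publish=$publish" >> "$GITHUB_OUTPUT"
}
```

If the job maps this step output to a job output, a later job can test it in its job-level `if:` through the `needs` context, which is available there even though `env` is not.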

I have implemented most of the things I mentioned,
but there may be better solutions than what I created.

I am not attached to my approach - feel free to simplify or rework it
if you have better ideas.

BowlesCR · 2025-10-18T18:35:33Z

What surprised me most is that GitHub Actions cannot use the env.GITHUB_REPO variable at the job level - it can only be used inside the job steps.

That's surprising to me too, and I've been using GHA for a while. Confirmed this against the docs: "environment-level variables are only available on the runner after the job starts executing". Notably, vars (defined in the repo config) is available at this stage though.
Your method works around it nicely -- I often have setup blocks like this to compute complicated (or reusable) logic.

Might be some room to consider the pros and cons of env vs vars. Env puts it all where anyone can see it and submit PRs to change them, but requires a commit to do so, where vars are easier to change but lose the transparency and versioning. Your idea of coalescing the two together is also an interesting compromise depending on the use-case.

I don't really see anything in your approach that differs from what I would do beyond style choices. If you're happy, run with it!

More on the topic of PR-builds and dangling images: A pattern I've seen elsewhere is to keep a -test or -preview dockerhub repo indefinitely and make that the target for PR/nightly/adhoc builds with no worries that anyone (reasonable) is going to accidentally assume they're release-quality builds.
Or even further, target all builds there and when you're ready to cut a release simply "promote" (copy) to the main repo. I like regclient / regctl (action) for this as it can properly copy all the variants of a multi-arch image, and as a bonus it has efficient handling for images with common/duplicate layers (think rsync-like). It is pretty straightforward to use, but I can also supply an example from my usage if you'd like.
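
As a placeholder until a real example is shared, here is a minimal sketch of what such a promotion could look like with `regctl image copy`. The function name `promote_tags`, the DRY_RUN switch and the repository names in the usage note are hypothetical. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical promotion helper: copy each tag from a test repository to
# the release repository with "regctl image copy".
# Prints the commands unless DRY_RUN=0 is set.
promote_tags() {
    src_repo=$1
    dst_repo=$2
    shift 2
    for tag in "$@"; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            regctl image copy "$src_repo:$tag" "$dst_repo:$tag"
        else
            echo "regctl image copy $src_repo:$tag $dst_repo:$tag"
        fi
    done
}
```

For example, `promote_tags postgis/docker-postgis-test postgis/postgis 17-3.5 latest` would print two copy commands, and running it with `DRY_RUN=0` would execute them (assuming regctl is installed and logged in to the registry).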

BowlesCR added 2 commits October 10, 2025 17:11

Leave docker alone

aaa6374

Build and push by digest, merge into multiarch image

0ca7eae

BowlesCR mentioned this pull request Oct 10, 2025

ARM-64 build #216

Open

ImreSamu self-requested a review October 11, 2025 14:58

phillipross self-requested a review October 11, 2025 17:10

ImreSamu requested changes Oct 15, 2025

ImreSamu reviewed Oct 17, 2025
