-
Notifications
You must be signed in to change notification settings - Fork 489
Multiarch images #432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Multiarch images #432
Conversation
Hi @BowlesCR , Thank you for this PR. TL;DR : This is a substantial change. I have been reviewing it for a few days, and my current view is that, while a few items need adjustment, the proposal looks practical and on a good path. Proposal: run as a parallel test workflow firstTo minimize any risk to the published images, could we run this as a separate test workflow first? Please:
Optional:
If you would prefer, let me know and I can help finish these items. If this PR reaches a good state, Checklist to promote
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the two important items I would like to request:
- Use
.github/workflows/test.yml
and keep.github/workflows/main.yml
unchanged. - Use a test Docker Hub repository for images:
DOCKERHUB_REPO: postgis/postgis
->DOCKERHUB_REPO: postgis/docker-postgis-test
.
The rest is optional, and I can handle it if you prefer.
jobs: | ||
env: | ||
DOCKERHUB_REPO: postgis/postgis | ||
GITHUB_REPO: postgis/docker-postgis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) add env : LATEST_VERSION: 17-3.5
- to define which build is tagged as latest, and implement the related tagging.
|
||
jobs: | ||
env: | ||
DOCKERHUB_REPO: postgis/postgis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to:
DOCKERHUB_REPO: postgis/docker-postgis-test
docker info -f '{{ .DriverStatus }}' | ||
- name: Load binfmt platforms for QEMU | ||
./official-images/test/run.sh -c ./official-images/test/config.sh -c test/postgis-config.sh ${{ steps.build.outputs.imageid }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) backup the log ; and in the next step : Implement a CI check to ensure the official test output contains: ( 'postgis-basics' [6/6]...passed )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar with this output, but the [6/6]
bit seems fragile -- I assume that would change with the number of test cases.
In your experience would it be reasonable to regex away the counts and rely on passed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any solution would be good that detects whether the
[postgres-basics
, postgres-initdb
, postgis-basics
] tests actually ran.
These tests are currently defined here:
https://github.com/postgis/docker-postgis/blob/master/test/postgis-config.sh
and the "postgis/postgis"
name is hard-coded:
testAlias[postgis/postgis]=postgres
imageTests[postgis/postgis]='
postgis-basics
'
If this isn't adjusted to match the image name being tested, then these tests won't run.
If I'm seeing correctly, in your earlier test log, only 3 tests ran:
Run ./official-images/test/run.sh -c ./official-images/test/config.sh -c test/postgis-config.sh sha256:90c1d3051135610d96c6578f2c379a039c5a293da0ef66aded00529d1d8b1760
testing sha256:90c1d3051135610d96c6578f2c379a039c5a293da0ef66aded00529d1d8b1760
'utc' [1/3]...passed
'no-hard-coded-passwords' [2/3]...passed
'override-cmd' [3/3]...passed
All 6 tests should run:
/home/runner/official-images/test/run.sh -c /home/runner/official-images/test/config.sh -c test/postgis-config.sh postgis/postgis:17-3.5
testing postgis/postgis:17-3.5
'utc' [1/6]...passed
'no-hard-coded-passwords' [2/6]...passed
'override-cmd' [3/6]...passed
'postgres-basics' [4/6]...passed
'postgres-initdb' [5/6]...passed
'postgis-basics' [6/6]...passed
Several solutions are possible:
1. Since it's easy to misconfigure, and DOCKERHUB_REPO
is freely configurable, we should probably modify postgis-config.sh
to provide protection for any fork case. For example, configure ["postgis/postgis", "postgis-test", and if it exists, ${DOCKERHUB_REPO} as well]. However, in this case, the first docker build step would need to tag as "postgis-test" to ensure the test runs. The log check would be just an extra test to detect any future configuration issues.
2. The other solution is much simpler. For example, the first docker build for testing could be tagged with a special name to indicate it's not for publishing, while still running the tests: "postgis/postgis:__local_testing__"
Not familiar with this output, but the [6/6] bit seems fragile -- I assume that would change with the number of test cases.
In your experience would it be reasonable to regex away the counts and rely on passed?
Currently 6 tests run, and as far as I remember, they change very rarely. If any of the 6 tests fails, the process stops with an error.
However, we should verify that both the postgres and postgis tests have run. That's why I thought it would be sufficient to test for the presence of this line with grep:
'postgis-basics' [6/6]...passed
This assumes that the postgres tests have also run successfully.
But I'm open to other suggestions as well.
Of course, ideally we should check for the presence of all 6 tests.
That is, we should verify these 6 lines:
'utc' [1/6]...passed
'no-hard-coded-passwords' [2/6]...passed
'override-cmd' [3/6]...passed
'postgres-basics' [4/6]...passed
'postgres-initdb' [5/6]...passed
'postgis-basics' [6/6]...passed
- name: Checkout source | ||
uses: actions/checkout@v4 | ||
- name: Upload digests | ||
uses: actions/upload-artifact@v4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho: The workflow must never push or publish any artifacts, manifests, or Docker images during a Pull Request test run.
so please add :
if: ${{ (github.repository == env.GITHUB_REPO) && (github.ref == 'refs/heads/master') && (github.event_name != 'pull_request') }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, this may be the gotcha I ran into before but forgot about. I'd have to come up to speed again but I remember something about attempting to adhere to the "no registery pushes during PRs" heuristic conflicting with "Need to do a non multiplatform image push to the registry first in order for the multiplatform image manifest creation to be possible" 😬 (but I may be wrong)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could certainly make PRs behave differently.
Do you care to have both architectures build on PR, or is either fine? If I recall, the current process doesn't yield anything someone can download, so either should be fine unless there's concern that something could break on only one arch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you care to have both architectures build on PR, or is either fine? If I recall, the current process doesn't yield anything someone can download, so either should be fine unless there's concern that something could break on only one arch.
I'm not sure if this answers your question, but ideally we'd want to have a PR trigger CI to build x86 image on an x86 runner and arm64 image on an arm64 runner (so that both builds happen natively from their own perspective) but the architecture of the runner that handles building the multiplatform manifest and potentially pushing the images/manifests to the registry is probably less of a concern.
- name: Upload digests | ||
uses: actions/upload-artifact@v4 | ||
with: | ||
name: digests-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho: The workflow must be protected against parallel or consecutive runs causing artifact name collisions.
proposal add a ${{ github.run_id }}
name: digests-${{ github.run_id }}-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing your point, this should already be handled because the upload is implicitly scoped to the run in progress. Other runs (and even re-executing the same run) are isolated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the feedback. You're right that actions/upload-artifact@v4 is indeed isolated, and theoretically this isn't strictly necessary.
Of course, if we want to be humorously over-cautious, we could include it with a comment noting that the -${{ github.run_id }}
suffix isn't theoretically needed due to v4's isolation, but serves as a placebo to help some maintainers sleep better at night 😄
A related thought:
Since the actions/upload-artifact@v4 overwrite:
parameter defaults to false
, in very rare cases - if a problem occurs right after upload and someone tries to re-run it - the re-run won't succeed. But I think this is still better than setting it to true
.
This exceptional case should be documented, and if such a problem occurs, a completely new job run needs to be manually triggered.
run: | | ||
docker buildx imagetools create $(jq -cr '.tags | map("-t " + .) | join(" ")' <<< "$DOCKER_METADATA_OUTPUT_JSON") \ | ||
$(printf '${{ env.DOCKERHUB_REPO }}@sha256:%s ' *) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) imho: need an <DOCKERHUB_REPO>:latest
tag ; based on the the environment value LATEST_VERSION
One slight concern I thought of: As-is all the image variants are built before any merging happens though, so unless something goes sideways in the merge step there won't be a risk of mismatched tags. |
I understand that there might be dangling images in dockerhub, but I'm assuming they have have a garbage collection process that prunes out untagged images that may exist. I know github's ghcr.io doesn't have that GC and the untagged dangling images are shown in the repo's image list which is unfortunate 😅 but the dangling images don't show up on dockerhub as far as I've ever seen so I figured it was safe to assume there's some GC policy in the background that prunes them at some point. I'm not sure it's a priority concern for the purposes of this current task of getting the multiarch images going, so we could figure out how to prune down the road if it becomes necessary. |
name: digests-${{ env.VERSION }}-${{ env.VARIANT }}-${{ matrix.runner-platform }} | ||
path: ${{ runner.temp }}/digests/* | ||
if-no-files-found: error | ||
retention-days: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retention-days:
should be 10 days instead.
The current 1 day makes detailed troubleshooting and debugging very difficult, because as volunteers, we can't always respond to issues within 24 hours.
Additionally, it can sometimes be useful to examine artifacts from the previous week's run, which is why I'd suggest 10 days (as a cautious safety measure).
( cc: @phillipross ; @BowlesCR ) I created a test repository to get familiar with the suggested approach.
What surprised me most is that GitHub Actions cannot use the For example, this does not work: merge-manifests:
name: Merge manifests and push to DockerHub
needs: make-docker-images
runs-on: ubuntu-24.04-arm # Always on arm, because why not
if: ${{ (github.repository == env.GITHUB_REPO) && (github.ref == 'refs/heads/master') && (github.event_name != 'pull_request') }} Because of this limitation, I had to decide between:
I chose the second option, which resulted in the current step 0: I have implemented most of the things I mentioned, I am not attached to my approach - feel free to simplify or rework it |
That's surprising to me too, and I've been using GHA for a while. Confirmed this against the docs: "environment-level variables are only available on the runner after the job starts executing". Notably, Might be some room to consider the pros and cons of env vs vars. Env puts it all where anyone can see it and submit PRs to change them, but requires a commit to do so, where vars are easier to change but lose the transparency and versioning. Your idea of coalescing the two together is also an interesting compromise depending on the use-case. I don't really see anything in your approach that differs from what I would do beyond style choices. If you're happy, run with it! More on the topic of PR-builds and dangling images: A pattern I've seen elsewhere is to keep a -test or -preview dockerhub repo indefinitely and make that the target for PR/nightly/adhoc builds with no worries that anyone (reasonable) is going to accidentally assume they're release-quality builds. |
As discussed briefly in #216
The high-level logic here is to build each postgres/postgis/variant tuple natively (for speed) on both AMD64 and ARM64 separately, then push those images to Dockerhub by digest (ie, untagged). Do some magic with manifests/indexes to reference those digest and push them up to Dockerhub with the human-readable tags.
When someone pulls the image, dockerd will follow the index/manifest chain to resolve the correct image for the runtime's architecture.
If this is merged, it probably obviates #256, #405, #406 (nonexhaustive)