Skip to content

Add GitHub Actions workflow to build and push multi-arch Docker image #1186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

seuros
Copy link

@seuros seuros commented Apr 13, 2025

This will build image for both archs.

@seuros
Copy link
Author

seuros commented Apr 13, 2025

@microsoft-github-policy-service agree

@goulvenb
Copy link

Hey,
I just made a similar PR yesterday : #1184

I have a question about some choices you made on your workflow ; normally i work with Gitlab and not Github so it was the first workflow i made, i henceforth don't have much knowledge about them.

Could you please tell my why you use docker/setup-qemu-action@v3 ?
As the workflow is for building Docker images, i don't quite understand why you put a Qemu action...

As for docker/setup-buildx-action@v3, i just followed a tutorial that was not using it so it's my bad :)

Thanks !

@afourney
Copy link
Member

This looks very promising. Thank you. (Also with #1184)

Perhaps the two of you can discuss the differences, and we arrive at the best.

But, before that: I need to check some procedural stuff on my end before I can merge this (re: hosting images rather than just Dockerfiles). Let me spend a day or two on this, and I'll get back to you.

@goulvenb
Copy link

I just answered in #1184
I believe it would be better to communicate either here of in #1184, so i commented first on my PR, but in case you want to continue here here's the gist of it :


Hey @afourney,
As i just said in #1186 :

it was the first workflow i made, i henceforth don't have much knowledge about them

However, i would love to talk about it with @seuros.

From what i see, here's a table difference :

#1184 #1186
We create a container on tag creation We create a container when pushing on branch main ; from a PR or not
We do not set the permissions of the PAT ; it is implicitly set We set the permissions of the PAT to the minimum
Use the 6th version of docker/build-push-action Use the 5th version of docker/build-push-action

Additionally, @seuros added a docker/setup-qemu-action@v3 action that i do not know the use of, and a docker/setup-buildx-action@v3 action that allow to create, if i understand well, an image for the architecture "linux/amd64" and "linux/arm64"

From what i see, i believe we should implement the following :

  • #1184 : We create a container on tag creation
    • This way, we'll have access to ghcr.io/microsoft/markitdown:v1.0.0, ghcr.io/microsoft/markitdown:v2.0.0, ... Instead of just ghcr.io/microsoft/markitdown:main
  • #1186 : We set the permissions of the PAT to the minimum
    • Better security practice to set the permissions in hard instead of implicitly letting Github to set them
  • #1184 : Use the 6th version of docker/build-push-action
    • I don't see why v5 should be used instead of v6 ?
  • #1186 : Use docker/setup-buildx-action@v3
    • Did not know about this, but it's better to create an image for different CPU architecture

For the Qemu action, i don't really know until i've talked with @seuros.

@seuros
Copy link
Author

seuros commented Apr 13, 2025

Hey @goulvenb

Thanks for your message - and nice work on your PR (#1184)!

To answer your question about docker/setup-qemu-action@v3: we include it because the workflow builds Docker images for multiple architectures, namely amd64 and arm64. QEMU enables emulation for these platforms in the same environment.

As for docker/setup-buildx-action@v3, it's required to use Docker Buildx, which supports multi-platform builds and caching strategies. It use build-push-action and qemu under the hood.

Also, just to clarify: I had this workflow running for a while and only saw your PR after a notification from the bot requesting me to agree for the terms.

To collaborate better:
– I added you to my repo, so you can push updates like version upgrades and tagging directly if you'd like — this way the PR will be co-authored.
– Alternatively, feel free to continue working from your repo and cherry-pick this commit, but your PR will need to have just 2 commits. Then i will close this one.

Sorry for the delay in responding, I was on the road. Looking forward to syncing up on this!

Thanks again!

@goulvenb
Copy link

No worries @seuros !
And also, thanks for your answers to my questions !

As you added me to your repo, and as i'm still relatively new to collaboration using git, i believe it would be wiser to work from your fork.

I just made a branch dev/goulvenb with my future edits on it.
For now, it contain the recommendations i made above plus editing the README.md file to use the newly provided docker images.
I made a PR to merge it into our main branch.

If you see any additional modification we can make, feel free to make them !
Personally, apart from this PR, i believe we are ready to go.

Thanks !

@goulvenb
Copy link

I forgot to ask :

but your PR will need to have just 2 commits.

Why 2 commits ?

I'm asking more in the way of "why would afourney care about the number of commits" ?
Like i would personally prefer to have 10/20 commits to know "why" a particular line was made, but maybe there's another reason ?

@seuros
Copy link
Author

seuros commented Apr 14, 2025

When reviewing code in open source projects, maintaining clean, purposeful commits is important. Having just 2 commits makes perfect sense since we are pairing in this one.

Why clean commits matter:
Clean commits make the codebase history more navigable and useful. In open source especially, where many contributors interact with the code, commit history becomes a critical documentation tool. There are tools that search for clues commit by commit to know why something was decided. You don't want it to end up in commits where you were just tinkering.

When a maintainer sees 10 commits on a Monday morning, they expect significant feature additions - not just 2 features where a developer spent 8 commits figuring things out. This wastes review time and makes the project harder to understand.

While you might see 10 commits, the size impact on the repo is significant if there are too many iterations. If you add a YouTube video, a log file or a binary in one commit by mistake, then delete it in another, the maintainer will see a clean final change, but if merged and not squashed, everyone will have to download that file as part of the git history forever.

A good practice when starting out is to have 1 commit PER PR you open.

One trick to make it clean is to open a PR against your own repo and squash it. Later you'll get familiar with amending and rebasing from the console. ( i will squash your PR in my fork )

You'll also get less rejection from the OSS community to merge your work. (Speaking from experience)

@goulvenb
Copy link

Thank you for your insight !
As i said, i'm pretty new to collaboration using git so any intake is welcome 😊

I'm gonna search a few terms, like squashing vs merging, but now at least i know there's an alternative to merging.

Back to the topic, i'll let you ping afourney when you think it's ready to go.
Thanks again !

@goulvenb
Copy link

Well,
I think @seuros did not see my message...

However, as everything is up to date and as no uncommitted changes were made in a week (in our repo), i believe this is the best of our 2 PR, @afourney.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants