-
Notifications
You must be signed in to change notification settings - Fork 501
Support for ARM, updated PDFAlto, Docker multi-architecture build #1165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I'm getting
when trying to run on macos m1 |
@heijligers which image did you try? This one was correctly built with multi arch and arm support: My next step was to try to solve this issue with spawning process by looking on a way to have a conditional decision in the Dockerfile Any help is welcome for this issue 🙂 |
I am currently working on this for the default CRF Docker (Docker.crf) . This needs work :-
There seems a number of libraries involved :-
tinihttps://github.com/krallin/tini pdfaltohttps://github.com/kermitt2/pdfalto libwapitihttps://github.com/kermitt2/wapiti libcrfpphttps://github.com/taku910/crfpp/ libjephttps://github.com/ninia/jep I will update this comment as a tracking comment as I make progress. |
@lfoppiano It seems the way to deal with multi platform and multi architecture OS/ARCH builds is with build and buildx :-
https://github.com/docker/buildx?tab=readme-ov-file#building-multi-platform-images This should work across multiple docker images :- |
@AaronNGray yes, this is what I believe I did to build the image with github actions, however I did not succeed in having a running docker image for Apple M1. |
a2d0283
to
49e9166
Compare
Good going. I posted some stuff to check regarding ABI's and compilers on Grobid-core Discord, it maybe obvious but may need checking. |
9cb10e5
to
9a60975
Compare
9a60975
to
fb66529
Compare
770cc61
to
0ea1ce7
Compare
0ea1ce7
to
d95b92a
Compare
Add support for ARM on other OS
I tested the image
|
…docker images do not support that platform
There is no image for tensorflow/tensorflow for linux/arm64 at the moment, so the DL image needs to be ran either on Linux or through Rosetta (not sure and not guaranteed it will work) |
@heijligers I believe that the latest image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements multi-architecture Docker builds for amd64 and arm64, updates platform-specific scripts and configuration, and adjusts CI workflows to support the new targets.
- Add ARM64-specific pdfalto configs and wrapper script
- Enhance Dockerfiles with
TARGETARCH
and fix cleanup patterns - Update GitHub Actions to build and push multi-arch CRF images
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.
Show a summary per file
File | Description |
---|---|
grobid-home/pdfalto/lin_arm-64/xpdfrc | Add language support mappings for ARM64 |
grobid-home/pdfalto/lin_arm-64/pdfalto_server | New bash wrapper to enforce timeouts, memory limits, and signals |
grobid-home/lib/lin_arm-64/TODO | List of pending ARM64 library integrations |
grobid-core/src/main/java/org/grobid/core/utilities/Utilities.java | Extend OS detection to include Linux ARM |
grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java | Correct typo in class documentation |
doc/Grobid-docker.md | Document ARM64 usage and required --platform flag |
Dockerfile.evaluation | Capitalize “AS runtime” |
Dockerfile.delft | Pull architecture-specific Tini binary via TARGETARCH |
Dockerfile.crf | Fix .git/ copy, refine cleanup, and use TARGETARCH for Tini |
.github/workflows/ci-build-unstable.yml | Add ARM and macOS runners to matrix, bump to Java 17 Temurin |
.github/workflows/ci-build-manual-crf.yml | Switch to Buildx and multi-arch build-push for CRF image |
5bc1e55
to
ece8953
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces multi-architecture build support for amd64 and arm64 by updating various Dockerfiles, shell scripts, and CI workflows. Key changes include:
- Updated shell scripts for improved argument handling, command verification, and timeout enforcement.
- Adjustments in Dockerfiles to incorporate TARGETARCH and multi-architecture support.
- Minor code and documentation fixes, such as enhanced OS detection and spelling corrections.
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
grobid-home/pdfalto/mac_arm-64/pdfalto_server & grobid-home/pdfalto/mac-64/pdfalto_server | Improved argument handling and added command executable checks. |
grobid-home/pdfalto/lin_arm-64/pdfalto_server & grobid-home/pdfalto/lin-64/pdfalto_server | Enhanced timeout enforcement and memory control with OS-specific considerations. |
grobid-core/src/main/java/org/grobid/core/utilities/Utilities.java grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java |
Refined OS detection logic and corrected documentation spelling. |
Dockerfile.evaluation, Dockerfile.delft, Dockerfile.crf | Updated Docker build configurations for multi-architecture support via TARGETARCH and consistent build practices. |
.github/workflows/ci-build-unstable.yml .github/workflows/ci-build-manual-crf.yml |
Revised CI workflows to incorporate updated JDK setups, build matrices, and multi-platform builds. |
Comments suppressed due to low confidence (3)
Dockerfile.crf:48
- The model removal pattern has been broadened from '-BidLSTM_CRF' to '-BidLSTM_'. Please verify that this change only removes the intended model files and does not accidentally delete additional valid models.
RUN rm -rf grobid-home/models/*-BidLSTM_*
Dockerfile.delft:134
- Verify that the binary naming convention using '${TARGETARCH}' matches the available Tini releases across all supported architectures.
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-${TARGETARCH} /tini
grobid-core/src/main/java/org/grobid/core/utilities/GrobidConfig.java:6
- The documentation spelling has been corrected from 'configuation' to 'configuration'.
* This class is a bean for the YAML configuration of the GROBID instance.
6777a6a
to
933f01c
Compare
Finally I nailed it!
This PR provide a multi-architecture build for amd64/arm64. I've switched to the eclipse-adoptium-17-jdk/jre but, for the moment, I'm using ubuntu 20.04 (focal) instead of ubuntu 22. For now Is limited to the manual build for the CRF only image.
This requires some tests,
here herehere the resulting image.This PR should help to solve #1089, #928, #1014, #1119
Update: At the moment the ARM support for the Deep Learning image is not working natively because there is no
linux/arm64
docker image of the base images (even oftensorflow/tensorflow:2.17.0
)