OpenSees container example #29
Note that the links to Slurm script examples point to the "raw" file.
Unfortunately the links are specific to my repo:
https://raw.githubusercontent.com/tonykew/ccr-examples/[..]
                                  ^^^^^^^
...and will have to be changed to
https://raw.githubusercontent.com/ubccr/ccr-examples/[...]
                                  ^^^^^
Tony
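One possible way to make that link change in bulk is a small `sed` filter. This is only a sketch: the function name `rewrite_raw_links` and the sample path are hypothetical, and which files actually contain the links is not stated in this thread.

```shell
#!/bin/bash
# Sketch: rewrite fork-specific "raw" links so they point at the upstream repo.
# rewrite_raw_links is a hypothetical helper; it reads stdin and writes stdout.
rewrite_raw_links() {
    sed 's#https://raw\.githubusercontent\.com/tonykew/ccr-examples/#https://raw.githubusercontent.com/ubccr/ccr-examples/#g'
}

# Demo on a single line; "some/path/example.bash" is a placeholder, not a real file.
printf '%s\n' 'https://raw.githubusercontent.com/tonykew/ccr-examples/some/path/example.bash' \
    | rewrite_raw_links
```

To apply it to a file, the same filter could be run as `rewrite_raw_links < README.md > README.md.new`, followed by a review of the diff before replacing the original.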
Most of my review focuses on typos and minor edits. Also, for consistency, the Slurm script file extensions should be updated from .bash to .sh.
Note: The ARM64 examples do not run as expected. They fail with an architecture mismatch error: the container image is built for amd64 and cannot run on arm64. Here are the contents of the output file:
FATAL: While checking container encryption: could not open image /projects/academic/ccrhelpdesk/monish/OpenSees-x86_64.sif: the image's architecture (amd64) could not run on the host's (arm64)
FATAL: first process with --sharens has already exited, could not execute process (pid 384695)
FATAL: first process with --sharens has already exited, could not execute process (pid 384694)
FATAL: first process with --sharens has already exited, could not execute process (pid 384697)
FATAL: While checking container encryption: could not open image /projects/academic/ccrhelpdesk/monish/OpenSees-x86_64.sif: the image's architecture (amd64) could not run on the host's (arm64)
FATAL: first process with --sharens has already exited, could not execute process (pid 2654130)
FATAL: first process with --sharens has already exited, could not execute process (pid 2654129)
FATAL: first process with --sharens has already exited, could not execute process (pid 2654127)
srun: error: cpn-v14-19: tasks 4-7: Exited with exit code 255
srun: Terminating StepId=20600625.0
srun: error: cpn-v14-17: tasks 0-3: Exited with exit code 255
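A guard like the following could catch this kind of mismatch before the job step launches. It is a minimal sketch, not the approach taken in this PR: `sif_for_arch` is a hypothetical helper, `OpenSees-x86_64.sif` is the image name from the output above, and the ARM64 image name is an assumption.

```shell
#!/bin/bash
# Sketch: choose a container image that matches the architecture of the node.
# sif_for_arch is a hypothetical helper, not part of the scripts under review.
sif_for_arch() {
    case "$1" in
        x86_64)
            echo "OpenSees-x86_64.sif" ;;      # image name taken from the error output
        aarch64|arm64)
            echo "OpenSees-aarch64.sif" ;;     # assumed name for an ARM64 build
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# `uname -m` reports the machine hardware name, e.g. x86_64 or aarch64.
image=$(sif_for_arch "$(uname -m)") && echo "Would use image: ${image}"
```

Failing early with a clear message is cheaper than letting every task exit with code 255 as in the output above.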
The mpirun examples still need confirmation, as there's an open issue with loading modules in ccrsoft/2024.04.
Update: The MPI examples ran without any issues on the vortex-future login nodes. We should add a note stating that, as of this publication, these examples are intended to be run on the vortex-future nodes. Aside from that, all other examples work as expected and produce the correct output.
containers/2_ApplicationSpecific/OpenSees/slurm_ARM64_OpenSeesMP_example.bash
containers/2_ApplicationSpecific/OpenSees/slurm_ARM64_OpenSeesSP_example.bash
Co-authored-by: Monish Deshmukh <[email protected]>
This happens if the path to the .sif file in the Slurm script is not fully qualified, e.g.
which should be (in your example)
The script includes a "cd" into a downloaded example, so a .sif file given by a relative path can't be loaded. I've added a file to document building the .sif image for ARM64 and re-tested.
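One way to make a script robust against this is to resolve the image path to an absolute one before the "cd". The snippet below is only a sketch under assumptions: `absolute_path` is a hypothetical helper, the image is assumed to sit in the directory the script starts in, and `/tmp` stands in for the downloaded example directory.

```shell
#!/bin/bash
# Sketch: turn a possibly relative image path into an absolute one up front,
# so a later "cd" cannot break it. absolute_path is a hypothetical helper.
absolute_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;              # already fully qualified
        *)  printf '%s/%s\n' "$PWD" "$1" ;;    # qualify against the current directory
    esac
}

# Resolve first, while still in the starting directory (assumed image location).
sif_image=$(absolute_path "OpenSees-x86_64.sif")

(
    # /tmp stands in for the downloaded example directory.
    cd /tmp || exit 1
    echo "After cd, the image path is still: ${sif_image}"
)
```

In a real Slurm script, `${sif_image}` would then be passed to the container command in place of the bare file name.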
LGTM! ARM64 build and scripts work as expected.
Edit: We wanted to add a note for the MPI examples stating: As of this publication, these examples are intended to be run on the vortex-future nodes.
Co-authored-by: Monish Deshmukh <[email protected]>
Added a note to the README to set the ccrsoft/2024.04 software release as the git commit to follow...