
OpenSees container example #29


Open
wants to merge 9 commits into base: main


@tonykew (Contributor) commented Jul 1, 2025

Note that the links to Slurm script examples point to the "raw" file.

Unfortunately the links are specific to my repo:

  https://raw.githubusercontent.com/tonykew/ccr-examples/[..]
                                    ^^^^^^^

...and will have to be changed to

  https://raw.githubusercontent.com/ubccr/ccr-examples/[...]
                                    ^^^^^
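A minimal sketch of that link change done in bulk. It assumes GNU sed (for in-place `-i`) and that the affected files are passed on the command line; the function name `fix_raw_links` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: repoint raw.githubusercontent.com links from the
# tonykew fork to the upstream ubccr repo. Edits the given files in place
# (GNU sed syntax for -i).
fix_raw_links() {
  sed -i 's#raw\.githubusercontent\.com/tonykew/ccr-examples/#raw.githubusercontent.com/ubccr/ccr-examples/#g' "$@"
}

# Example (file names are placeholders):
#   fix_raw_links README.md */README.md
```

Only the `tonykew/ccr-examples/` prefix is rewritten, so the rest of each link (branch and file path) is left untouched.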

Tony

@Monishnd (Contributor) left a comment


Most of my review focuses on typos and minor edits. Also, for consistency, the Slurm script file extensions should be updated from .bash to .sh.
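A minimal sketch of the `.bash` to `.sh` rename, assuming the scripts sit in one directory of a git checkout (so `git mv` records the rename). The `MV` override is an assumption added so plain `mv` can be used outside a checkout:

```shell
#!/bin/bash
# Rename every *.bash Slurm script in a directory to *.sh.
# Set MV="mv" to rename without git (e.g. outside a checkout).
rename_bash_to_sh() {
  dir="${1:-.}"
  for f in "$dir"/*.bash; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal, skip it
    ${MV:-git mv} "$f" "${f%.bash}.sh"
  done
}
```

Any links that point at the old `.bash` names (such as the raw-file links in the README) would need the same change after the rename.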

Note: The ARM64 examples do not run as expected. They fail with an error indicating an architecture mismatch; specifically, the container image is built for amd64 and cannot run on arm64. Here are the contents of the output file:

FATAL:   While checking container encryption: could not open image /projects/academic/ccrhelpdesk/monish/OpenSees-x86_64.sif: the image's architecture (amd64) could not run on the host's (arm64)
FATAL:   first process with --sharens has already exited, could not execute process (pid 384695)
FATAL:   first process with --sharens has already exited, could not execute process (pid 384694)
FATAL:   first process with --sharens has already exited, could not execute process (pid 384697)
FATAL:   While checking container encryption: could not open image /projects/academic/ccrhelpdesk/monish/OpenSees-x86_64.sif: the image's architecture (amd64) could not run on the host's (arm64)
FATAL:   first process with --sharens has already exited, could not execute process (pid 2654130)
FATAL:   first process with --sharens has already exited, could not execute process (pid 2654129)
FATAL:   first process with --sharens has already exited, could not execute process (pid 2654127)
srun: error: cpn-v14-19: tasks 4-7: Exited with exit code 255
srun: Terminating StepId=20600625.0
srun: error: cpn-v14-17: tasks 0-3: Exited with exit code 255
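When reading errors like the one above, note that the log labels architectures amd64 and arm64, while `arch` (used for the `OpenSees-$(arch).sif` file names) prints x86_64 or aarch64. A small hypothetical helper to translate between the two:

```shell
#!/bin/bash
# Hypothetical helper: translate a kernel machine name (as printed by
# `uname -m` / `arch`) into the label seen in messages such as
# "the image's architecture (amd64) could not run on the host's (arm64)".
apptainer_arch() {
  m="${1:-$(uname -m)}"
  case "$m" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo "$m" ;;   # other machine names: passed through unchanged
  esac
}

# Example: on an arm64 node `apptainer_arch` prints "arm64", so an image
# named OpenSees-x86_64.sif (amd64) cannot run there.
```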

The mpirun examples still need confirmation, as there's an open issue with loading modules in ccrsoft/2024.04.

Update: The MPI examples ran without any issues on the vortex-future login nodes. We should add a note stating that, as of this publication, these examples are intended to be run on the vortex-future nodes. Aside from that, all other examples work as expected and produce the correct output.

@tonykew requested a review from @Monishnd August 6, 2025 14:37

@tonykew (Contributor, Author) commented Aug 6, 2025

[...]
Note: The ARM64 examples do not run as expected [...]


This happens if the path to the .sif file in the Slurm script is not fully qualified, e.g.

 OpenSees-$(arch).sif \

which should be (in your example)

 /projects/academic/ccrhelpdesk/monish/OpenSees-$(arch).sif \

The script includes a "cd" into a downloaded example, so a relative path to the .sif
file no longer resolves and the image can't be loaded. Unfortunately, the error message
is misleading! I have no idea why it doesn't just say that the .sif file was not found.
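A minimal sketch of a guard for the Slurm script, assuming the image sits in the submission directory (adjust as needed). It makes the path absolute before the `cd` and fails with a readable message if the file is missing. The function names are hypothetical:

```shell
#!/bin/bash
# Make a possibly relative path absolute, relative to the current
# directory at the time of the call (so call it before any `cd`).
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s\n' "$PWD/$1" ;;
  esac
}

# Stop early with a clear message instead of the confusing container error.
require_sif() {
  if [ ! -f "$1" ]; then
    echo "ERROR: container image not found: $1" >&2
    return 1
  fi
}

# Intended use in the job script (example directory is a placeholder):
#   SIF="$(abs_path "OpenSees-$(arch).sif")"
#   require_sif "$SIF" || exit 1
#   cd downloaded-example
#   srun ... "$SIF" ...
```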

I've added a file documenting how to build the .sif image for ARM64, and re-tested
the Slurm scripts: all OK.
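A per-architecture build could look like the sketch below. It assumes Apptainer, that it runs on a node of the target architecture (no cross-building), and that the definition file name is a placeholder. The `APPTAINER` override is an assumption that exists only so the command can be pointed at `singularity` or dry-run with `echo`:

```shell
#!/bin/bash
# Build an image named after the build host's architecture, matching the
# OpenSees-$(arch).sif naming in the Slurm scripts (`uname -m` prints the
# same value as `arch`). Run once on an x86_64 node and once on an aarch64 node.
build_sif() {
  def_file="$1"                          # placeholder definition file
  out="OpenSees-$(uname -m).sif"
  ${APPTAINER:-apptainer} build "$out" "$def_file"
}

# Dry run:  APPTAINER=echo build_sif OpenSees.def
```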

@Monishnd (Contributor) left a comment


LGTM! ARM64 build and scripts work as expected.
Edit: We wanted to add a note for the MPI examples stating: As of this publication, these examples are intended to be run on the vortex-future nodes.

@tonykew (Contributor, Author) commented Aug 7, 2025

We wanted to add a note for the MPI examples stating: As of this publication,
these examples are intended to be run on the vortex-future nodes.

Added a note to the README to set the ccrsoft/2024.04 software release as the
default, which (I think) is a better approach (and shouldn't need updating for
a while, I hope!).

git commit to follow...
