Running a parallel GPU ClimaOcean/Oceananigans simulation on Australia's HPC - Gadi #74
taimoorsohail started this conversation in Show and tell
-
@taimoorsohail I think the adding-packages section is missing the activation of the environment in which to add the packages. Should it be:

```
julia> ]                # press ] to enter package-manager mode
(@v1.x) pkg> activate . # create a project in the current directory
(project-name) pkg> add CUDA, MPI, ClimaOcean, Oceananigans, Dates, CFTime, Printf
```

Otherwise there is this error:

```
julia> st
ERROR: UndefVarError: `st` not defined in `Main`
Suggestion: check for spelling errors or missing imports.
```

The only reason I put all the packages on one line above (I assume this still works) is that it would be easier to copy and paste!
-
Question: what's the …
-
This discussion thread provides step-by-step instructions on running a parallel GPU simulation of the ClimaOcean model on Australia's Gadi HPC.
GPU nodes are available on the `gpuvolta` queue on Gadi (there is currently no express/normal split). These instructions specify whether each command should be run on the `login` node or on an `interactive` GPU node. The GPUs do not have internet access, so package management, for example, needs to be done on the `login` node.

On the `login` node, load the following modules (these can be added to your `.bashrc` if you want them loaded at every login). A sketch follows; the module versions are assumptions, so check `module avail` for what is installed:
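```
# load the system MPI and CUDA toolchain; openmpi/4.1.7 matches the
# GNU install referenced below
module load openmpi/4.1.7
module load cuda     # pick a specific version if the default does not suit
```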
We then want to ensure that the MPI library Julia calls is the system default. Navigate to the folder where your parallel simulation code will sit, and run in the command line:
```
julia --project -e 'using Pkg; Pkg.add("MPIPreferences"); using MPIPreferences; MPIPreferences.use_system_binary()'
```
Unfortunately, `NCDatasets.jl` tries to load the default Julia MPI, which causes issues on Gadi. To fix this, navigate to the `~/.julia/artifacts/<hash>/lib` folder (this lives under whatever you specified as your `$JULIA_DEPOT_PATH`). Here, we remove the `libmpi_mpifh.so`, `libmpi_mpifh.so.40`, and `libmpi_mpifh.so.40.40.1` files and symbolically link those names to the default files in the `openmpi/4.1.7` GNU install. A sketch, assuming the system libraries live under `/apps/openmpi/4.1.7/lib` (verify the path and the artifact hash on your system):
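```
# run inside ~/.julia/artifacts/<hash>/lib; the /apps/openmpi/4.1.7/lib
# path is an assumption -- point it at wherever the GNU openmpi install lives
SYSLIB=/apps/openmpi/4.1.7/lib/libmpi_mpifh.so
for lib in libmpi_mpifh.so libmpi_mpifh.so.40 libmpi_mpifh.so.40.40.1; do
    rm "$lib" && ln -s "$SYSLIB" "$lib"
done
```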
The above is a workaround; ideally, `NCDatasets.jl` would allow us to specify the path of our system MPI. This may happen in the future, rendering the workaround unnecessary.

Now, still on the `login` node, navigate to the folder where your parallel simulation code will sit and add the Julia packages needed to run the simulation. For a simple one-degree ClimaOcean simulation, use:
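```
# add the packages used by the example script below (one-off, on the login node)
julia --project -e 'using Pkg; Pkg.add(["CUDA", "MPI", "ClimaOcean", "Oceananigans", "Dates", "CFTime", "Printf"])'
```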
The above should load without a hitch. Next, add the relevant lines to your `.bashrc` file to ensure that you are using a single thread, among other things. A sketch of plausible settings (the exact variables you need may differ):
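```
# run Julia and BLAS single-threaded (the variable choices here are assumptions)
export JULIA_NUM_THREADS=1
export OMP_NUM_THREADS=1
# optionally relocate the Julia depot, e.g. to scratch:
# export JULIA_DEPOT_PATH=/scratch/<project>/<user>/.julia
```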
Next, save a new Julia script, say `aquaplanet.jl`, in your simulation folder; it should run a simple, one-degree aquaplanet forced by a JRA55-do RYF atmosphere. A minimal sketch follows: the constructor names track ClimaOcean's documented interface, but the grid size, time step, and stop time are placeholders, so check the ClimaOcean examples against the version you installed:
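```julia
# aquaplanet.jl -- a minimal sketch, not a verbatim ClimaOcean example;
# verify constructor names against your installed ClimaOcean version
using MPI
MPI.Init()

using CUDA
using Oceananigans
using Oceananigans.Units
using ClimaOcean
using Dates, CFTime, Printf

# one GPU per MPI rank
arch = Distributed(GPU())

# one-degree aquaplanet grid (sizes and extents are placeholders)
grid = LatitudeLongitudeGrid(arch;
                             size = (360, 160, 40),
                             longitude = (0, 360),
                             latitude = (-80, 80),
                             z = (-4000, 0),
                             halo = (7, 7, 7))

ocean = ocean_simulation(grid)

# JRA55-do repeat-year forcing and radiation
atmosphere = JRA55PrescribedAtmosphere(arch)
radiation = Radiation(arch)

coupled_model = OceanSeaIceModel(ocean; atmosphere, radiation)
simulation = Simulation(coupled_model; Δt=10minutes, stop_time=2days)

run!(simulation)
```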
Finally, start an interactive, multi-node GPU job on Gadi:
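The PBS request below is a sketch: the project code, resource sizes, and any `storage` flags are assumptions to adjust for your allocation (on `gpuvolta`, `ncpus` must be 12 times `ngpus`, so two nodes means `ngpus=8`, `ncpus=96`):

```
qsub -I -q gpuvolta -P <project> -l walltime=01:00:00,ncpus=96,ngpus=8,mem=760GB,jobfs=100GB,wd
```

Then, navigate to the folder where you want to run your simulation, and type (one MPI rank per GPU):

```
mpirun -np 8 julia --project aquaplanet.jl
```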
This code should now run a parallel aquaplanet on Gadi! Note that this is just a testing example; we expect it to NaN within a few time steps, but if it runs at all, your MPI configuration is working.