Description
Hi,
I am trying to understand the performance of this program at NERSC. It is basically the same as the example in the README.md, except that `addprocs` currently doesn't work for me, so I am managing the `MPIClusterManager` manually by running `start_main_loop` and `stop_main_loop`:
```julia
N = parse(Int64, ARGS[1])

# to import MPIManager
using MPIClusterManagers

# need to also import Distributed to use addprocs()
using Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println(
        "Hello world,"
        * " I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))"
        * " on node $(gethostname())"
    )
    println("[rank $(MPI.Comm_rank(comm))]: Importing Elemental")
    using LinearAlgebra, Elemental
    println("[rank $(MPI.Comm_rank(comm))]: Done importing Elemental")
    println("[rank $(MPI.Comm_rank(comm))]: Solving SVD for $(N)x$(N)")
end

@mpi_do manager A = Elemental.DistMatrix(Float64);
@mpi_do manager Elemental.gaussian!(A, N, N);
@mpi_do manager @time U, s, V = svd(A);
@mpi_do manager println(s[1])

# Manage MPIManager manually:
# Elemental needs to be finalized before shutting down MPIManager
@mpi_do manager begin
    println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end

# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)
```
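As a side note for the scaling runs below: I am considering labeling each data point in the job output with its run configuration, roughly as in the sketch here. This is not part of the measurements, and the `LinearAlgebra.BLAS.get_num_threads()` call assumes Julia >= 1.6.

```julia
# Sketch only: print the run configuration once (rank 0) so each
# strong-scaling data point in the job output is self-describing.
@mpi_do manager begin
    if MPI.Comm_rank(comm) == 0
        println("configuration: $(MPI.Comm_size(comm)) MPI ranks, " *
                "$(LinearAlgebra.BLAS.get_num_threads()) BLAS threads per rank, " *
                "N = $(N)")
    end
end
```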
I ran some strong scaling tests on 4 Intel Haswell nodes (https://docs.nersc.gov/systems/cori/#haswell-compute-nodes) using 4000x4000, 8000x8000, and 16000x16000 random matrices. I am measuring only the `svd(A)` time. I am attaching my measured times, and wanted to check if this is what you would expect. I am not an expert in how Elemental computes SVDs in a distributed fashion, so I would be grateful for any advice you have for optimizing this benchmark's performance. In particular, I am interested in understanding what the optimal number of ranks is as a function of problem size (I am hoping this is such an obvious question that you can point me to some existing documentation).
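On the timing itself: to make sure the numbers are not dominated by JIT compilation or by ranks starting out of sync, I am considering replacing the `@time U, s, V = svd(A)` / `println(s[1])` lines above with a slightly more controlled measurement, sketched below (one warm-up SVD, a fresh matrix, a barrier, and the maximum wall time over ranks via `MPI.Wtime` and `MPI.Allreduce`). The attached times were still taken with plain `@time`.

```julia
# Sketch of a more controlled timing (not what produced the attached numbers):
# warm up once, regenerate A, synchronize, then report the max time over ranks.
@mpi_do manager begin
    Elemental.gaussian!(A, N, N)
    svd(A)                               # warm-up: JIT + library setup
    Elemental.gaussian!(A, N, N)         # fresh matrix for the timed run
    MPI.Barrier(comm)
    t0 = MPI.Wtime()
    U, s, V = svd(A)
    t = MPI.Wtime() - t0
    tmax = MPI.Allreduce(t, MPI.MAX, comm)
    if MPI.Comm_rank(comm) == 0
        println("svd($(N)x$(N)) time, max over ranks: $(tmax) s")
    end
end
```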
Cheers!