
Understanding Elemental's Performance #72

@JBlaschke

Description


Hi,

I am trying to understand the performance of this program at NERSC. It is basically the same as the example in the README.md, except that addprocs currently doesn't work for me, so I am managing the MPIClusterManager manually via start_main_loop and stop_main_loop (a sketch of the addprocs-style README setup is included after the script for reference):

# Problem size, parsed on every rank: this runs before start_main_loop, so
# N is defined on the non-root ranks as well
N = parse(Int64, ARGS[1])

# MPIClusterManagers provides MPIManager and @mpi_do
using MPIClusterManagers

# Distributed is needed for the cluster-manager plumbing, even though
# addprocs() is not used in this manual setup
using Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println(
            "Hello world,"
            * " I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))"
            * " on node $(gethostname())"
           )

    println("[rank $(MPI.Comm_rank(comm))]: Importing Elemental")
    using LinearAlgebra, Elemental
    println("[rank $(MPI.Comm_rank(comm))]: Done importing Elemental")

    println("[rank $(MPI.Comm_rank(comm))]: Solving SVD for $(N)x$(N)")
end

# Allocate an N x N Gaussian DistMatrix; only the svd(A) call is timed
@mpi_do manager A = Elemental.DistMatrix(Float64);
@mpi_do manager Elemental.gaussian!(A, N, N);
@mpi_do manager @time U, s, V = svd(A);
@mpi_do manager println(s[1])  # print the first singular value (from every rank)

# Manage MPIManager manually:
# Elemental needs to be finalized before shutting down MPIManager
@mpi_do manager begin
    println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end
# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)
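
For reference, this is roughly the addprocs-based setup from the README (a sketch from memory, untested on Cori; np = 4 and the 1000x800 size are placeholders). It is the addprocs(man) step that currently fails for me:

using MPIClusterManagers, Distributed

man = MPIManager(np = 4)     # placeholder worker count
addprocs(man)                # this is the step that does not work for me on Cori

# load the packages on the workers as well
@everywhere using DistributedArrays, Elemental

A = drandn(1000, 800)        # distributed Gaussian test matrix
Elemental.svdvals(A)[1:5]    # first few singular values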

I ran some strong scaling tests on 4 Intel Haswell nodes (https://docs.nersc.gov/systems/cori/#haswell-compute-nodes) using 4000x4000, 8000x8000, and 16000x16000 random matrices.

[chart: measured svd(A) times from the strong-scaling runs]

I am measuring only the svd(A) time. I am attaching my measured times, and wanted to check whether this is what you would expect. I am not an expert in how Elemental computes SVDs in a distributed fashion, so I would be grateful for any advice you have on optimizing this benchmark's performance. In particular, I am interested in understanding what the optimal number of ranks is as a function of problem size (I am hoping this is such an obvious question that you can point me to some existing documentation).
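
For what it is worth, here is how I could tighten the measurement so that only the SVD is timed and all ranks are synchronized first (a sketch reusing the manager, comm, A, and N from the script above; MPI.Barrier and MPI.Wtime are from MPI.jl):

@mpi_do manager begin
    MPI.Barrier(comm)        # line up all ranks before starting the clock
    t0 = MPI.Wtime()
    U, s, V = svd(A)
    MPI.Barrier(comm)        # wait until every rank has finished
    t1 = MPI.Wtime()
    if MPI.Comm_rank(comm) == 0
        println("svd($(N)x$(N)) took $(t1 - t0) s on $(MPI.Comm_size(comm)) ranks")
    end
end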

Cheers!
