
Understanding Elemental's Performance #72

@JBlaschke

Description


Hi,

I am trying to understand the performance of this program at NERSC. It is basically the same as the example in the README.md, except that addprocs currently doesn't work for me, so I am managing the MPIClusterManager manually via start_main_loop and stop_main_loop (a sketch of the addprocs-style README setup is included after the script for reference):

# Problem size, parsed on every rank: this runs before start_main_loop, so
# N is defined on the non-root ranks as well
N = parse(Int64, ARGS[1])

# MPIClusterManagers provides MPIManager and @mpi_do
using MPIClusterManagers

# Distributed is needed for the cluster-manager plumbing, even though
# addprocs() is not used in this manual setup
using Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println(
            "Hello world,"
            * " I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))"
            * " on node $(gethostname())"
           )

    println("[rank $(MPI.Comm_rank(comm))]: Importing Elemental")
    using LinearAlgebra, Elemental
    println("[rank $(MPI.Comm_rank(comm))]: Done importing Elemental")

    println("[rank $(MPI.Comm_rank(comm))]: Solving SVD for $(N)x$(N)")
end

# Allocate an N x N Gaussian DistMatrix; only the svd(A) call is timed
@mpi_do manager A = Elemental.DistMatrix(Float64);
@mpi_do manager Elemental.gaussian!(A, N, N);
@mpi_do manager @time U, s, V = svd(A);
@mpi_do manager println(s[1])  # print the first singular value (from every rank)

# Manage MPIManager manually:
# Elemental needs to be finalized before shutting down MPIManager
@mpi_do manager begin
    println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end
# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)
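
For reference, this is roughly the addprocs-based setup from the README (a sketch from memory, untested on Cori; np = 4 and the 1000x800 size are placeholders). It is the addprocs(man) step that currently fails for me:

using MPIClusterManagers, Distributed

man = MPIManager(np = 4)     # placeholder worker count
addprocs(man)                # this is the step that does not work for me on Cori

# load the packages on the workers as well
@everywhere using DistributedArrays, Elemental

A = drandn(1000, 800)        # distributed Gaussian test matrix
Elemental.svdvals(A)[1:5]    # first few singular values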

I ran some strong scaling tests on 4 Intel Haswell nodes (https://docs.nersc.gov/systems/cori/#haswell-compute-nodes) using 4000x4000, 8000x8000, and 16000x16000 random matrices.

[chart: measured svd(A) times from the strong-scaling runs]

I am measuring only the svd(A) time. I am attaching my measured times, and wanted to check whether this is what you would expect. I am not an expert in how Elemental computes SVDs in a distributed fashion, so I would be grateful for any advice you have on optimizing this benchmark's performance. In particular, I am interested in understanding what the optimal number of ranks is as a function of problem size (I am hoping this is such an obvious question that you can point me to some existing documentation).
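
For what it is worth, here is how I could tighten the measurement so that only the SVD is timed and all ranks are synchronized first (a sketch reusing the manager, comm, A, and N from the script above; MPI.Barrier and MPI.Wtime are from MPI.jl):

@mpi_do manager begin
    MPI.Barrier(comm)        # line up all ranks before starting the clock
    t0 = MPI.Wtime()
    U, s, V = svd(A)
    MPI.Barrier(comm)        # wait until every rank has finished
    t1 = MPI.Wtime()
    if MPI.Comm_rank(comm) == 0
        println("svd($(N)x$(N)) took $(t1 - t0) s on $(MPI.Comm_size(comm)) ranks")
    end
end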

Cheers!
