
P3 lookup text file is being read by all MPI ranks -- can cause issues with filesystems #6654

Open

Description

@ndkeen

For cases that use P3 (note: I assumed there were such cases in E3SM, but I'm not currently finding any in the set I've been testing...), we read a small text file in a poor parallel manner: every MPI rank opens and reads the same file.
I was surprised to find we are doing this; surely it was a mistake, as it is never a good idea.
While the file is small, reading it this way still causes issues with the filesystems, and NERSC admins are noticing.
It could also cause a slowdown (or even a stall/hang).

I have been testing a quick fix in which rank 0 reads the file and broadcasts the data to the other ranks. This appears to be BFB, but it will need more work to implement properly.

NERSC has even suggested we move our inputdata from CFS (which uses DVS) to scratch (Lustre).
They have said we can have scratch space that is not purged for this purpose.
In general, I've been testing read performance from scratch and it seems about the same; but if the sole reason for moving is to avoid complications such as this, hopefully we can just fix the read pattern.

I also made an issue in scream (will link), since the same problem exists there, though the implementation of the fix may differ slightly.

Metadata

Assignees

No one assigned

    Labels

    EAMxx (PRs focused on capabilities for EAMxx), Machine Files, eam, help wanted, input file, inputdata (Changes affecting inputdata collection on blues), pm-cpu (Perlmutter at NERSC (CPU-only nodes))
