Can't run multiple queue jobs on one node, error: mpirun noticed .... exited on signal 15 (Terminated) #748

Open
@aijcode

Description

Please submit help issues to:
https://matsci.org/atomate

VASP jobs run fine under my SGE queue system. A single job also runs fine with atomate, but running multiple queue jobs on one node fails. The jobs are submitted successfully, but then hit an mpirun error.
The vasp.out file shows: "mpirun noticed that process rank 3 with PID 57743 on node node3 exited on signal 15 (Terminated)"
This error never appears when SGE runs "mpirun -np n vasp" directly.

I think this may be a bug in atomate or custodian.
So far I have only figured out that the VASP process is started by the call "self.pid = _posixsubprocess.fork_exec()", which lives in Python's subprocess module and is reached when custodian launches the job.
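
For anyone trying to narrow this down: "exited on signal 15" means that something sent SIGTERM to the process; it did not crash on its own. Below is a minimal sketch of how that shows up for a child started through `subprocess.Popen`. The `sleep` child is only a stand-in for the real "mpirun -np n vasp" command, and `run_and_terminate` is a hypothetical helper name, not anything from atomate or custodian. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def run_and_terminate():
    """Start a long-running child, send it SIGTERM, and return its exit code."""
    # Stand-in for the real "mpirun -np n vasp" command (assumption for illustration).
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import time; print('ready', flush=True); time.sleep(60)"],
        stdout=subprocess.PIPE,
    )
    child.stdout.readline()   # block until the child is actually running
    child.terminate()         # on POSIX this sends SIGTERM (signal 15)
    code = child.wait(timeout=10)
    child.stdout.close()
    return code


if __name__ == "__main__":
    code = run_and_terminate()
    # On POSIX, a return code of -N means the child was killed by signal N.
    print(code, code == -signal.SIGTERM)
```

If the same pattern applies here, the open question is which process sends the SIGTERM to the VASP ranks when two jobs share a node.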
