
How to delete a manager? #2

Open
@einzigsue

Description

Hi All,

I just installed MPI.jl and found that I cannot shut down an MPIManager cleanly. Is there a way to gracefully shut down an MPIManager in MPI_ON_WORKERS mode?

When I do

julia> using MPI
julia> manager = MPIManager(np=4)
julia> workers = addprocs(manager)
julia> @parallel (+) for i in 1:4 rand(Bool) end
julia> exit()

I receive the following error:

WARNING: Forcibly interrupting busy workers
INFO: INFO: INFO: pid=6516 id=3 op=interrupt
pid=6516 id=4 op=interrupt
pid=6516 id=5 op=interrupt
CompositeException(Any[CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)])])

I then tried calling rmprocs(workers, waitfor=60.0) before exit() instead, but on Julia v0.6.0 that call itself errors.
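In the REPL (with the same workers array as above):

julia> rmprocs(workers, waitfor=60.0)
ERROR: UndefVarError: set_worker_state not defined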

If I instead call MPI.Finalize() on the master process before exit(), roughly like this:
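julia> MPI.Finalize()   # on the master process; MPI.Init() was never called here
julia> exit()

then Julia terminates and MPI aborts with: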

*** The MPI_Finalize() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.

I then tried calling MPI.Finalize() on each worker before exit():

@everywhere using MPI
for w in workers
    @spawnat w MPI.Finalize()
end

This produces errors in Julia like the following:

From worker 2: [(null):21884] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 3: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 3: *** This is disallowed by the MPI standard.
From worker 3: *** Your MPI job will now abort.
From worker 3: [(null):21886] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 5: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 5: *** This is disallowed by the MPI standard.
From worker 5: *** Your MPI job will now abort.
From worker 5: [(null):21891] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 4: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 4: *** This is disallowed by the MPI standard.
From worker 4: *** Your MPI job will now abort.
From worker 4: [(null):21888] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
INFO: pid=21626 id=2 op=deregister
INFO: INFO: INFO: pid=21626 id=3 op=deregister
pid=21626 id=4 op=deregister
pid=21626 id=5 op=deregister
Worker 3 terminated.
Worker 4 terminated.ERROR (unhandled task failure): EOFError: read end of file

Worker 5 terminated.ERROR (unhandled task failure): EOFError: read end of file

ERROR (unhandled task failure): EOFError: read end of file

How is an MPIManager in MPI_ON_WORKERS mode expected to be shut down?
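To summarize, the full sequence I am attempting is below; the teardown at the end is only my guess at the intended API, since neither call works for me:

using MPI

manager = MPIManager(np=4)
workers = addprocs(manager)      # spawn 4 MPI worker processes

@parallel (+) for i in 1:4
    rand(Bool)
end

# Guessed graceful teardown -- neither line behaves on v0.6.0:
rmprocs(workers, waitfor=60.0)   # UndefVarError: set_worker_state
exit()                           # forcibly interrupts busy workers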

Cheers
Yue
