Description
Hi All,
I just installed MPI.jl and found that I cannot shut down the MPIManager cleanly. Is there a way to gracefully shut down an MPIManager in MPI_ON_WORKERS mode?
When I do

```julia
julia> using MPI
julia> manager = MPIManager(np=4)
julia> workers = addprocs(manager)
julia> @parallel (+) for i in 1:4 rand(Bool) end
julia> exit()
```
I receive the following error message:

```
WARNING: Forcibly interrupting busy workers
INFO: INFO: INFO: pid=6516 id=3 op=interrupt
pid=6516 id=4 op=interrupt
pid=6516 id=5 op=interrupt
CompositeException(Any[CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)]), CapturedException(AssertionError("false"), Any[(manage(::MPI.MPIManager, ::Int64, ::WorkerConfig, ::Symbol) at cman.jl:246, 1), (interrupt(::Int64) at cluster.jl:932, 1), ((::Base.Distributed.##85#86)() at task.jl:335, 1)])])
```
I then tried calling `rmprocs(workers, waitfor=60.0)` before `exit()`, but in Julia v0.6.0 it fails with `ERROR: UndefVarError: set_worker_state not defined`.
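For reference, the shutdown sequence I was aiming for looks like this; the `try`/`catch` is only there so the session can still exit when `rmprocs` itself throws, as it does above:

```julia
# Attempted graceful shutdown: remove the MPI workers first, then exit.
try
    rmprocs(workers, waitfor=60.0)  # ask the manager to tear down the workers
catch err
    warn("rmprocs failed: $err")    # on v0.6.0 this hits the set_worker_state error
end
exit()
```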
If I call `MPI.Finalize()` before `exit()` on the head process, Julia terminates with this error message:

```
*** The MPI_Finalize() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
```
I then tried calling `MPI.Finalize()` on each worker before `exit()`:

```julia
@everywhere using MPI
for w in workers
    @spawnat w MPI.Finalize()
end
```

This produces the following error message in Julia:
```
From worker 2: [(null):21884] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 3: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 3: *** This is disallowed by the MPI standard.
From worker 3: *** Your MPI job will now abort.
From worker 3: [(null):21886] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 5: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 5: *** This is disallowed by the MPI standard.
From worker 5: *** Your MPI job will now abort.
From worker 5: [(null):21891] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
From worker 4: *** The MPI_Finalize() function was called after MPI_FINALIZE was invoked.
From worker 4: *** This is disallowed by the MPI standard.
From worker 4: *** Your MPI job will now abort.
From worker 4: [(null):21888] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
INFO: pid=21626 id=2 op=deregister
INFO: INFO: INFO: pid=21626 id=3 op=deregister
pid=21626 id=4 op=deregister
pid=21626 id=5 op=deregister
Worker 3 terminated.
Worker 4 terminated.ERROR (unhandled task failure): EOFError: read end of fileWorker 5 terminated.ERROR (unhandled task failure): EOFError: read end of file
ERROR (unhandled task failure): EOFError: read end of file
```
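A variant I have not been able to verify: guarding the per-worker call so each rank calls `MPI.Finalize()` at most once. This assumes MPI.jl exposes `MPI.Initialized()` and `MPI.Finalized()` (wrappers for the standard `MPI_Initialized`/`MPI_Finalized` calls); if so, something like this should at least avoid the double-finalize abort shown above:

```julia
@everywhere using MPI
for w in workers
    # Only finalize a rank that is initialized and not yet finalized,
    # to avoid the "MPI_Finalize() ... after MPI_FINALIZE" abort.
    @spawnat w begin
        if MPI.Initialized() && !MPI.Finalized()
            MPI.Finalize()
        end
    end
end
```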
How is an MPIManager in MPI_ON_WORKERS mode expected to be shut down?
Cheers
Yue