
zombie clusters #914

Open
@chiaral

Description

I have noticed for a while now that when my computations fail (e.g. a worker dies), I often end up with zombie clusters on my machine, regardless of my best efforts to kill them.

For example:

  1. I do cluster.close()
  2. I restart the kernel
  3. I run in another notebook:
from dask_gateway import Gateway
g = Gateway()
g.list_clusters()

and the cluster is still there.
Usually I try to scale it down to 0 workers, so at least I am not using any resources, but the cluster stays there.

Today I kept having issues with my clusters (probably due to the nature of my workload) and had 4 zombie clusters, which I managed to scale down to 0 by connecting to each of them via cluster = g.connect(g.list_clusters()[i].name). So I decided to restart my server entirely.
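For reference, the cleanup loop looks roughly like this (a minimal sketch; it assumes the default gateway address and authentication are picked up from the environment):

from dask_gateway import Gateway

gateway = Gateway()

# Connect to each cluster the gateway still reports and scale it to
# zero workers. This frees the workers, but the cluster record itself
# keeps lingering in the gateway.
for report in gateway.list_clusters():
    cluster = gateway.connect(report.name)
    cluster.scale(0)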

I went to my home page, pressed stop server, and restarted it.
Even on the new server I could still list the zombie clusters with
g.list_clusters()
They all have 0 workers and cores, but they are still there, and I suspect they still consume memory just by existing.

After a while (I guess after whatever timeout limit is in place) they disappear.
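As a possible workaround, asking the gateway to stop each cluster by name (rather than just scaling it to zero) might clear the records sooner. This is a sketch using the client's stop_cluster call; whether it actually removes a zombie in this state is an assumption worth testing:

from dask_gateway import Gateway

gateway = Gateway()

# Request a full shutdown of each lingering cluster by name.
# stop_cluster is the client-side call for shutting a cluster down;
# whether it clears a zombie record in this state is untested.
for report in gateway.list_clusters():
    gateway.stop_cluster(report.name)

The eventual disappearance is presumably the gateway-side idle timeout (the idle_timeout option in the server's cluster config, if I'm reading the docs right).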
