Description
On a CI/CD system that uses local mode, balena push hangs a few times a week
because of the following problem in the supervisor/balenaEngine:
Device state apply error Error: Failed to apply state transition steps. (HTTP code 403) unexpected - error while removing network: network <NAME> id <ID> has active endpoints Steps:["removeNetwork","removeNetwork"]
This is an instance of moby/moby#42119.
The problem is in Docker's libnetwork. Its internal state gets out of sync, possibly because of a race condition or an unclean exit, and as a result Docker refuses to delete the network in question.
The only workaround that worked is restarting the Docker daemon. Less intrusive operations did not help. These included docker network prune --force, docker system prune --force, and attaching the network to a minimal container and then detaching it to see whether the reference count would be cleared.
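Because restarting the daemon is the only known workaround, a CI job might want to detect this specific error before resorting to a restart. Below is a minimal Python sketch. It assumes the supervisor keeps emitting the log line in the format quoted above, although the exact wording may differ between supervisor/balenaEngine versions. The names `find_stuck_network` and `StuckNetwork`, as well as the sample network name and ID, are hypothetical. The sketch only recognizes the error and does not restart anything.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern derived from the supervisor log line quoted in this issue.
# Assumption: the wording is stable across supervisor/balenaEngine versions.
_ACTIVE_ENDPOINTS_RE = re.compile(
    r"error while removing network: network (?P<name>\S+) id (?P<id>\S+) "
    r"has active endpoints"
)


@dataclass
class StuckNetwork:
    """Hypothetical record of a network that Docker refuses to remove."""
    name: str
    network_id: str


def find_stuck_network(log_line: str) -> Optional[StuckNetwork]:
    """Return the network named in an 'active endpoints' error, or None."""
    match = _ACTIVE_ENDPOINTS_RE.search(log_line)
    if match is None:
        return None
    return StuckNetwork(name=match.group("name"), network_id=match.group("id"))


if __name__ == "__main__":
    # Hypothetical sample line; the name and ID are made up for illustration.
    sample = (
        "Device state apply error Error: Failed to apply state transition steps. "
        "(HTTP code 403) unexpected - error while removing network: "
        "network 1_default id 0123abcd has active endpoints "
        'Steps:["removeNetwork","removeNetwork"]'
    )
    print(find_stuck_network(sample))
```

A CI wrapper could scan the supervisor output with this function and restart the daemon only when it returns a match, so that a restart is not triggered for unrelated failures.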
I searched extensively for other solutions or workarounds but found none. The real fix needs to be in libnetwork, but the moby issue is stale.