Description of bug
I’m running buildkitd in rootless mode on a GKE cluster. Most of the time everything works fine, but occasionally (after ~1–2 weeks of uptime), builds start failing with the following error:
runc run failed: unable to start container process: error during container init:
unable to join session keyring: unable to create session key: disk quota exceeded
Restarting the buildkitd container resolves the problem temporarily, but it eventually reappears.
Deployment setup:
I deploy buildkitd as a service in Kubernetes with the following configuration:
containers:
  - name: buildkitd
    image: moby/buildkit:master-rootless
    args:
      - --addr
      - unix:///run/user/1000/buildkit/buildkitd.sock
      - --addr
      - tcp://0.0.0.0:1234
      - --oci-worker-no-process-sandbox
      - --oci-worker-gc
      - --oci-worker-rootless
      - --oci-worker-gc-keepstorage=2000,15000,70000
      - --oci-max-parallelism=10
    env:
      - name: "BUILDKIT_SESSION_KEYRING"
        value: "0"
    securityContext:
      seccompProfile:
        type: Unconfined
      appArmorProfile:
        type: Unconfined
      runAsUser: 1000
      runAsGroup: 1000
The GitLab runner (also in Kubernetes) runs build jobs using this container image:
moby-buildkit:v0.23.2-v0.20.3-28.3.2
Job definition:
containerize:
  image: moby-buildkit:v0.23.2-v0.20.3-28.3.2
  variables:
    BUILDKITD_FLAGS: --oci-worker-no-process-sandbox
    BUILDKIT_HOST: tcp://buildkitd.gitlab-runner.svc.cluster.local:1234
    BUILD_ARGS: ""
    CACHE_ARGS: "--export-cache type=registry,ref=${CI_REGISTRY_IMAGE}/${IMAGE_NAME}:cache-${CI_COMMIT_REF_SLUG},mode=max,image-manifest=true --import-cache type=registry,ref=${CI_REGISTRY_IMAGE}/${IMAGE_NAME}:cache-${CI_COMMIT_REF_SLUG}"
  script: ...
Setting BUILDKIT_SESSION_KEYRING=0 didn’t prevent the issue.
Restarting the container temporarily fixes it, which suggests a resource leak.
Node metrics show that the node's disks are not full and CPU/memory quotas are not exceeded when the error occurs.
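As far as I can tell, the "disk quota exceeded" (EDQUOT) in this error refers to the kernel's per-UID key quota rather than filesystem space, which would explain why node disk metrics look normal. The next time it happens I plan to check that quota from inside the pod; the fourth column of /proc/key-users is qnkeys/maxkeys per UID (the deployment name below is just from my setup):

# Per-UID kernel key usage; if this is the cause, qnkeys should have reached
# maxkeys for whichever UID the rootless builds are charged to.
kubectl exec deploy/buildkitd -- cat /proc/key-users
# Per-UID key limit for non-root users (200 by default).
kubectl exec deploy/buildkitd -- cat /proc/sys/kernel/keys/maxkeys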
Could this be related to session keyring exhaustion in rootless mode?
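If it is, a possible stopgap (not a fix for the underlying leak) might be to raise the key quotas on the GKE nodes, e.g. via a privileged DaemonSet or a node startup script; the values below are arbitrary examples:

# Increase the per-UID kernel key quotas on the node (defaults: 200 keys, 20000 bytes).
sysctl -w kernel.keys.maxkeys=20000
sysctl -w kernel.keys.maxbytes=400000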