Calico eBPF fails to init on Talos Linux #7892

@monoxane

Description

I'm provisioning a cluster with the Talos Linux Kubernetes distribution and finding that the calico-node mount-ebpffs init container fails to mount the cgroup2 filesystem. The mount is attempted from calico/node/pkg/nodeinit/calico-init_linux.go.

W0731 07:12:21.604403       1 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
2023-07-31 07:12:21.607 [INFO][1] init/startup.go 432: Early log level set to info
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 57: Checking if BPF filesystem is mounted.
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 69: BPF filesystem is mounted.
2023-07-31 07:12:21.607 [INFO][1] init/calico-init_linux.go 92: Checking if cgroup2 filesystem is mounted.
2023-07-31 07:12:21.609 [INFO][1] init/calico-init_linux.go 120: Cgroup2 filesystem is not mounted. Trying to mount it...
2023-07-31 07:12:21.609 [INFO][1] init/calico-init_linux.go 126: Mount point /run/calico/cgroup is ready for mounting root cgroup2 fs.
2023-07-31 07:12:21.613 [ERROR][1] init/calico-init_linux.go 48: Failed to mount cgroup2 filesystem. error=failed to mount cgroup2 filesystem: exit status 1

Expected Behavior

Calico with eBPF dataplane works on Talos

Current Behavior

Calico with the eBPF dataplane does not work on Talos because the cgroup2 filesystem mount fails in the eBPF mount init container.

Possible Solution

My current suspicion is that this happens because bpfdefs.CgroupV2Path is /run/calico/cgroup, which appears to be a non-writable directory under Talos (most of the rootfs is read-only, except for specific files and all of /var). However, mounting an emptyDir at that location in both the init container and the main container does not help.

I can't change the bpfdefs const and rebuild Calico myself because of environmental constraints (I can't install Docker, which the makefiles require), but if needed I can go through the process of getting an environment set up in my work gcloud tenancy and build it there. I'm also happy to run any dev builds produced by the Calico team.

Steps to Reproduce (for bugs)

1. Install a Talos cluster.
2. Install Calico with the operator and this Installation CR:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
    name: default
spec:
    calicoNetwork:
        bgp: Enabled
        linuxDataplane: BPF
        ipPools:
        -   blockSize: 26
            cidr: 10.244.0.0/16
            disableBGPExport: false
            encapsulation: None
            natOutgoing: Enabled

Context

We need the eBPF dataplane for some shenanigans that don't work with the iptables one (mostly source-IP related), so we can't just use the non-eBPF mode. Calico is the only competent CNI with BGP + eBPF support that meets our needs.

Cilium, while not useful to us due to BGP issues, is supported on Talos, and its eBPF dataplane works when installed by following this Talos guide. Something in there might help in working this out: https://www.talos.dev/v1.4/kubernetes-guides/network/deploying-cilium/#without-kube-proxy

Your Environment

Calico:
quay.io/tigera/operator:v1.30.4
docker.io/calico/node:v3.26.1

Other:
Talos (v1.4.6) kernel 6.1.35-talos
Containerd 1.6.21
Kubelet v1.27.3
