fix: BPF programs fail due to Linux LSM lockdown #8535

Open
wants to merge 1 commit into
base: main

chifu1234

Pull Request

What? (description)

When SecureBoot images are used, Talos sets the kernel lockdown mode to confidentiality by default (lockdown=confidentiality).
This prevents BPF-based programs from running properly. BPF is used by many common Kubernetes CNIs, such as Cilium.
Many other widely used Linux distributions are moving from confidentiality mode to integrity mode.

More discussion here: iovisor/bcc#2565.

Why? (reasoning)

Commonly used Kubernetes CNIs do not work properly.

Acceptance

Please use the following checklist:

  • you linked an issue (if applicable)
  • you included tests (if applicable)
  • you ran conformance (make conformance)
  • you formatted your code (make fmt)
  • you linted your code (make lint)
  • you generated documentation (make docs)
  • you ran unit-tests (make unit-tests)

See make help for a description of the available targets.

I still have to test the change in my environment. I'm not sure whether a change is also needed in the factory code.

Commonly used Kubernetes CNIs rely on multiple features
that are blocked by this parameter

Signed-off-by: Kevin Klopfenstein <[email protected]>
@frezbo
Member

frezbo commented Apr 3, 2024

There is no need to change the defaults. When generating an image, the kernel args -lockdown lockdown=integrity can be set to override the defaults. It's documented here: https://www.talos.dev/v1.6/talos-guides/install/boot-assets/#imager (this is how existing users change the lockdown mode).

@chifu1234
Author

@frezbo I'm not sure you want your users to have to build and manage their own images just to get a working Kubernetes cluster with the Cilium CNI and SecureBoot. At the very least, I think the documentation should be updated to cover this.
What do you think about changing the code so that users can override kernel command-line arguments via customization.extraKernelArgs?
Currently they are appended, not overwritten:
Command line: talos.platform=metal console=ttyS0 console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 lockdown=confidentiality lockdown=integrity

@frezbo
Member

frezbo commented Apr 3, 2024

I meant adding it in Image Factory, or when using imager. With customization.extraKernelArgs, an upgrade is needed for the change to take effect. There is no need to build your own image if you use Image Factory.

Not every user runs Cilium with SecureBoot, and making Talos less secure for one CNI is not ideal.

@chifu1234
Author

Currently, when I create an installer image with extraKernelArgs, it appends the args as expected. This results in Linux using the confidentiality setting, perhaps because it is the first argument or the most restrictive one.
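
One possible explanation, sketched here under the assumption that the kernel's lockdown LSM only ever raises the lockdown level and rejects any attempt to lower it: with both lockdown=confidentiality and lockdown=integrity on the command line, the stricter value wins regardless of order. The function name and the level table are illustrative and are not kernel or Talos APIs.

```python
# Illustrative model only: assumes the lockdown level can be raised but never lowered.
LEVELS = {"none": 0, "integrity": 1, "confidentiality": 2}


def effective_lockdown(cmdline: str) -> str:
    """Return the lockdown level that results from processing cmdline left to right."""
    level = "none"
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key != "lockdown" or not sep:
            continue  # not a lockdown=<value> argument
        # Unknown values are ignored; known values only take effect if stricter.
        if value in LEVELS and LEVELS[value] > LEVELS[level]:
            level = value
    return level


if __name__ == "__main__":
    cmdline = "ima_hash=sha512 lockdown=confidentiality lockdown=integrity"
    print(effective_lockdown(cmdline))
```

Under this model the default entry has to be removed from the command line, not just followed by a weaker one, for integrity mode to take effect.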

@frezbo
Member

frezbo commented Apr 3, 2024

Could you post the steps you used? If it's a bug, we can fix it.

@chifu1234
Author

Sure, I'm using the following URL:
https://factory.talos.dev/?version=1.6.7&ext-siderolabs%2Fbnx2-bnx2x=&ext-siderolabs%2Fintel-ucode=&ext-siderolabs%2Fiscsi-tools=&extra-args=lockdown%3Dintegrity
Then I upgrade my cluster and check the kernel log:
$ talosctl --talosconfig ./talosconfig upgrade --nodes x.x.x.x --image factory.talos.dev/installer-secureboot/9dc6a04fb9666f1efadef7b941ec8bc4235ec76e12debf7fe2c739569de255d5:v1.6.7 --preserve
$ talosctl --talosconfig ./talosconfig dmesg | less

x.x.x.x: kern:  notice: [2024-04-03T09:23:16.304993454Z]: Linux version 6.1.82-talos (@buildkitsandbox) (gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.41) #1 SMP Tue Mar 19 17:48:01 UTC 2024
x.x.x.x: kern:    info: [2024-04-03T09:23:16.304993454Z]: Command line: talos.platform=metal console=ttyS0 console=tty0 init_on_alloc=1 slab_nomerge pti=on consoleblank=0 nvme_core.io_timeout=4294967295 printk.devkmsg=on ima_template=ima-ng ima_appraise=fix ima_hash=sha512 lockdown=confidentiality lockdown=integrity
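
Besides the command line, the mode the kernel actually settled on can be read from /sys/kernel/security/lockdown, which lists the modes and puts square brackets around the active one. A small sketch for parsing that format follows; how the file is fetched from a node is left out, and the sample string is illustrative rather than captured from this cluster.

```python
def active_lockdown_mode(contents: str) -> str:
    """Return the bracketed (active) mode from the lockdown securityfs file."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active lockdown mode found in: %r" % contents)


if __name__ == "__main__":
    # Illustrative sample of the file format, not real output from the node above.
    sample = "none integrity [confidentiality]\n"
    print(active_lockdown_mode(sample))
```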

@frezbo
Member

frezbo commented Apr 3, 2024

Could you try this schematic ID: 7486cdb58bff0b7bf65f6f62d376542dff4c9bd97acb39ef59a76fcfacd76776

It seems -lockdown was missing from the link you posted.
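
The difference can be seen by decoding the query string of the link posted above. This is a small sketch using only the Python standard library; splitting the extra-args value on whitespace is my assumption about how several args would be encoded, not documented Image Factory behavior.

```python
from urllib.parse import parse_qs, urlsplit

FACTORY_URL = (
    "https://factory.talos.dev/?version=1.6.7"
    "&ext-siderolabs%2Fbnx2-bnx2x=&ext-siderolabs%2Fintel-ucode="
    "&ext-siderolabs%2Fiscsi-tools=&extra-args=lockdown%3Dintegrity"
)


def extra_args_from_url(url: str) -> list:
    """Return the kernel args carried in the extra-args query parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    args = []
    for value in query.get("extra-args", []):
        # Assumption: several args, if present, are separated by whitespace.
        args.extend(value.split())
    return args


if __name__ == "__main__":
    args = extra_args_from_url(FACTORY_URL)
    print(args)
    print("-lockdown" in args)
```

The decoded list contains lockdown=integrity but no -lockdown entry, which matches the doubled lockdown= values in the kernel command line.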

@frezbo
Member

frezbo commented Apr 3, 2024

This is the input:

customization:
    extraKernelArgs:
        - -lockdown
        - lockdown=integrity
    systemExtensions:
        officialExtensions:
            - siderolabs/bnx2-bnx2x
            - siderolabs/intel-ucode
            - siderolabs/iscsi-tools
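
A rough model of why the leading -lockdown entry matters, assuming that an extra arg of the form -key drops every existing key or key=value entry before later args are appended. That is my reading of how such entries are treated; the helper is hypothetical and is not Talos code.

```python
def apply_extra_kernel_args(defaults, extra):
    """Apply extra args to the defaults: '-key' removes a key, anything else is appended."""
    result = list(defaults)
    for arg in extra:
        if arg.startswith("-"):
            key = arg[1:]
            # Drop both bare 'key' and 'key=value' entries.
            result = [a for a in result if a.partition("=")[0] != key]
        else:
            result.append(arg)
    return result


if __name__ == "__main__":
    # Shortened default command line taken from the log earlier in this thread.
    defaults = ["ima_hash=sha512", "lockdown=confidentiality"]
    print(apply_extra_kernel_args(defaults, ["lockdown=integrity"]))
    print(apply_extra_kernel_args(defaults, ["-lockdown", "lockdown=integrity"]))
```

Without the removal entry both lockdown= values stay on the command line; with it only lockdown=integrity remains.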

@smira
Member

smira commented Apr 3, 2024

I don't think there's a nice way to change the lockdown flag via Image Factory, as the installer will always try to write the default, so you need to use the machine config for that. We might consider changing the default for 1.8.

@chifu1234
Author

@smira & @frezbo I was able to fix this issue by setting the args in Image Factory:

customization:
    extraKernelArgs:
        - -lockdown
        - lockdown=integrity

but not via the machine config:

machine:
    install:
        extraKernelArgs:
            - -lockdown
            - lockdown=integrity

OK, then should I update my PR to change the Cilium CNI documentation (https://www.talos.dev/v1.6/kubernetes-guides/network/deploying-cilium/) instead?

@nberlee
Contributor

nberlee commented Apr 5, 2024

Just to share our experience, we've been operating Talos with secureboot and utilizing Cilium for Kubernetes networking, enabling a wide range of BPF functionalities, for quite some time now—months, in fact—without encountering any significant issues tied to BPF's lockdown mode set to confidentiality.

Here are the specifics of our KubeProxyReplacement configuration:

KubeProxyReplacement Details:
  Status:                 True
  Socket LB:              Enabled
  Socket LB Tracing:      Enabled
  Socket LB Coverage:     Hostns-only
  Devices:                bond0 <snip> (Direct Routing)
  Mode:                   DSR
  Backend Selection:      Maglev (Table Size: 16381)
  Session Affinity:       Enabled
  Graceful Termination:   Enabled
  NAT46/64 Support:       Disabled
  XDP Acceleration:       Native
  Services:
  - ClusterIP:      Enabled
  - NodePort:       Enabled (Range: 30000-32767)
  - LoadBalancer:   Enabled
  - externalIPs:    Enabled
  - HostPort:       Enabled

The only minor issue we've noted involves occasional log messages upon Cilium startup, indicating:

lockdown_is_locked_down: 55 callbacks suppressed

and:

Lockdown: bpftool: use of bpf to read kernel RAM is restricted; see man kernel_lockdown.7

However, from our perspective, these don't present a significant concern.
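
For anyone who wants to see which operations are being refused, here is a minimal sketch that tallies lockdown denials from kernel log lines. The regular expression is derived only from the message quoted above, so other message shapes may not match, and because the kernel rate-limits these messages (the "callbacks suppressed" line) the counts are a lower bound.

```python
import re
from collections import Counter

# Pattern based on the single log line quoted in this thread.
LOCKDOWN_RE = re.compile(r"Lockdown: (?P<comm>[^:]+): (?P<what>.+?) is restricted")


def summarize_lockdown_denials(lines):
    """Count denials per (process name, restricted operation)."""
    counts = Counter()
    for line in lines:
        match = LOCKDOWN_RE.search(line)
        if match:
            counts[(match.group("comm"), match.group("what"))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "lockdown_is_locked_down: 55 callbacks suppressed",
        "Lockdown: bpftool: use of bpf to read kernel RAM is restricted; see man kernel_lockdown.7",
    ]
    for (comm, what), count in summarize_lockdown_denials(log).items():
        print(comm, "|", what, "|", count)
```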

github-actions bot commented Jun 5, 2024

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions bot added the Stale label Jun 5, 2024
@maxpain
Contributor

maxpain commented Sep 24, 2024

Just to share our experience, we've been operating Talos with secureboot and utilizing Cilium for Kubernetes networking, enabling a wide range of BPF functionalities, for quite some time now—months, in fact—without encountering any significant issues tied to BPF's lockdown mode set to confidentiality.

Any problems so far?

@maxpain
Contributor

maxpain commented Oct 15, 2024

Just to share our experience, we've been operating Talos with secureboot and utilizing Cilium for Kubernetes networking, enabling a wide range of BPF functionalities, for quite some time now—months, in fact—without encountering any significant issues tied to BPF's lockdown mode set to confidentiality.

Maybe Cilium falls back to less optimized calls? Did you check that?
That could affect performance.
I don't think ignoring Cilium syscall errors is a good idea.

@maxpain
Contributor

maxpain commented Oct 15, 2024

@smira @frezbo what do you think?

@smira
Member

smira commented Oct 15, 2024

We don't have any reports of that. You should probably ask your Cilium support representative about operating in lockdown mode.

5 participants