/sys/devices/virtual/powercap accessible by default to containers
Intel's RAPL (Running Average Power Limit) feature, introduced by the Sandy Bridge microarchitecture, provides software insights into hardware energy consumption. To facilitate this, Intel introduced the powercap framework in Linux kernel 3.13, which reads values via relevant MSRs (model specific registers) and provides unprivileged userspace access via sysfs
. As RAPL is an interface to access a hardware feature, it is only available when running on bare metal with the module compiled into the kernel.
By 2019, it was realized that in some cases unprivileged access to RAPL readings could be exploited as a power-based side-channel against security features including AES-NI (potentially inside a SGX enclave) and KASLR (kernel address space layout randomization). Also known as the PLATYPUS attack, Intel assigned CVE-2020-8694 and CVE-2020-8695, and AMD assigned CVE-2020-12912.
Several mitigations were applied; Intel reduced the sampling resolution via a microcode update, and the Linux kernel prevents access by non-root users since 5.10. However, this kernel-based mitigation does not apply to many container-based scenarios:
- Unless using user namespaces, root inside a container has the same level of privilege as root outside the container, but with a slightly more narrow view of the system
sysfs
is mounted inside containers read-only; however only read access is needed to carry out this attack on an unpatched CPU
While this is not a direct vulnerability in container runtimes, defense in depth and safe defaults are valuable and preferred, especially as this poses a risk to multi-tenant container environments. This is provided by masking /sys/devices/virtual/powercap
in the default mount configuration, and adding an additional set of rules to deny it in the default AppArmor profile.
While sysfs
is not the only way to read from the RAPL subsystem, other ways of accessing it require additional capabilities such as CAP_SYS_RAWIO
which is not available to containers by default, or perf
paranoia level less than 1, which is a non-default kernel tunable.
References
References
/sys/devices/virtual/powercap accessible by default to containers
Intel's RAPL (Running Average Power Limit) feature, introduced by the Sandy Bridge microarchitecture, provides software insights into hardware energy consumption. To facilitate this, Intel introduced the powercap framework in Linux kernel 3.13, which reads values via relevant MSRs (model specific registers) and provides unprivileged userspace access via
sysfs
. As RAPL is an interface to access a hardware feature, it is only available when running on bare metal with the module compiled into the kernel.By 2019, it was realized that in some cases unprivileged access to RAPL readings could be exploited as a power-based side-channel against security features including AES-NI (potentially inside a SGX enclave) and KASLR (kernel address space layout randomization). Also known as the PLATYPUS attack, Intel assigned CVE-2020-8694 and CVE-2020-8695, and AMD assigned CVE-2020-12912.
Several mitigations were applied; Intel reduced the sampling resolution via a microcode update, and the Linux kernel prevents access by non-root users since 5.10. However, this kernel-based mitigation does not apply to many container-based scenarios:
sysfs
is mounted inside containers read-only; however only read access is needed to carry out this attack on an unpatched CPUWhile this is not a direct vulnerability in container runtimes, defense in depth and safe defaults are valuable and preferred, especially as this poses a risk to multi-tenant container environments. This is provided by masking
/sys/devices/virtual/powercap
in the default mount configuration, and adding an additional set of rules to deny it in the default AppArmor profile.While
sysfs
is not the only way to read from the RAPL subsystem, other ways of accessing it require additional capabilities such asCAP_SYS_RAWIO
which is not available to containers by default, orperf
paranoia level less than 1, which is a non-default kernel tunable.References
References