
content: Adding Build Environment Threats section #1287

Draft · wants to merge 3 commits into main

Conversation

paveliak commented:

This PR integrates #1267 into the Build Environment draft spec

netlify bot commented Feb 12, 2025

Deploy Preview for slsa ready!

🔨 Latest commit: d66f5bd
🔍 Latest deploy log: https://app.netlify.com/sites/slsa/deploys/67b0028a84265c00085ebfcf
😎 Deploy Preview: https://deploy-preview-1287--slsa.netlify.app

@paveliak changed the title from "Adding Build Environment Threats section" to "content: Adding Build Environment Threats section" on Feb 12, 2025

@paveliak (Author) commented:

@marcelamelara Mermaid didn't work 😁 I need to convert the diagram to an acceptable format. The text part is ready for review.

@marcelamelara (Contributor) left a comment:

Thank you for this first draft, @paveliak! I've got a few comments.

Comment on lines +82 to +84
_Mitigation_: [Control Plane] verifies build image provenance upon creating build environment. Needs [BuildEnv L1] level

_Example_: Malicious actor gained access to the build image supply chain and was ultimately able to configure the wrong image in the Build platform.

marcelamelara (Contributor):

For this and the other threats, I think we should flip the Mitigation and Example so the example follows the threat description.
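
To make the quoted [BuildEnv L1] mitigation concrete, here is a minimal sketch of the check a control plane could run before instantiating an environment. The provenance field layout follows SLSA v1 conventions, but the `TRUSTED_BUILDERS` list and function names are illustrative assumptions, not part of the draft spec.

```python
import hashlib
import json

# Illustrative allow-list of builder identities trusted to produce
# build images (an assumption for this sketch, not spec content).
TRUSTED_BUILDERS = {"https://builder.example.com/slsa/v1"}

def verify_build_image_provenance(image_bytes: bytes, provenance_json: str) -> None:
    """Check a build image against its SLSA provenance before booting it.

    Raises ValueError if the image digest is not a subject of the
    provenance or the claimed builder is not on the allow-list.
    """
    statement = json.loads(provenance_json)  # simplified in-toto Statement
    digest = hashlib.sha256(image_bytes).hexdigest()

    # 1. The image about to be booted must be a subject of the provenance.
    subjects = {s["digest"]["sha256"] for s in statement["subject"]}
    if digest not in subjects:
        raise ValueError("image digest not covered by provenance")

    # 2. The provenance must name a builder we trust (SLSA v1 layout).
    builder_id = statement["predicate"]["runDetails"]["builder"]["id"]
    if builder_id not in TRUSTED_BUILDERS:
        raise ValueError(f"untrusted builder: {builder_id}")

    # A real control plane would first verify the DSSE envelope signature;
    # none of the fields above can be trusted without it.
```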


_Example_: Due to a bug in the build platform, the environment was used for running two or more jobs and effectively losing “ephemeral” property. Malicious actors could use this vulnerability to poison the build environments they should not have access to.

#### Continuous integrity of the build environment

marcelamelara (Contributor):

Two things: 1) "Continuous integrity" isn't really a threat, so we probably want to replace this with something like "runtime host system compromise".
2) I think we need to define the scope of "integrity of the build environment" at L3 because hardware TPMs + Linux IMA can also be used to provide continuous integrity checking, but of the file system only, whereas TEEs can extend this to the memory.


Build environment is bootstrapped from a [build image], which is expected to be an artifact of a SLSA build pipeline. Build platform verifies image provenance upon starting up the environment and provides evidence to the tenant.

Bootrapping the build environment is a complex process, especially at higher SLSA levels. [Build L3] usually requires significant changes to existing build platforms to maintain ephemeral build environments. It is not uncommon for the build platforms to rely on public cloud providers for managing compute resources that power build environments. This in turn might significantly increase attack surface because added build platform dependencies effectively become part of the [TCB].

Typo, L55, beginning: "Bootrapping" → "bootstrapping".
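
On the substance of the quoted passage: the "provides evidence to the tenant" step could take many shapes. As one purely hypothetical format — none of these field names come from the draft, and a real platform would sign with an asymmetric key rather than an HMAC secret — the platform might hand the tenant a signed record of the verification it performed:

```python
import hashlib
import hmac
import json
import time

# Hypothetical platform signing key. A real platform would use an
# asymmetric key with a published verification key, not a shared secret.
PLATFORM_KEY = b"demo-only-secret"

def make_verification_evidence(image_digest: str, builder_id: str) -> dict:
    """Produce a signed record stating that image provenance was
    verified before the environment was started (invented format)."""
    payload = {
        "image_sha256": image_digest,
        "builder_id": builder_id,
        "verified_at": int(time.time()),
    }
    body = json.dumps(payload, sort_keys=True).encode()
    sig = hmac.new(PLATFORM_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}
```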


_Mitigation_: Unique build identifier is included into the Build environment provenance (and [TPM] measurement) allowing [Control plane] to detect environment reuse. Needs [BuildEnv L2] level

_Example_: Due to a bug in the build platform, the environment was used for running two or more jobs and effectively losing “ephemeral” property. Malicious actors could use this vulnerability to poison the build environments they should not have access to.

This scenario appears to introduce a requirement that a control plane component (presumably in a higher privileged level than the build itself) is expected to compute a build identifier and measure it into a PCR. It also seems to assume that such a component will have exclusive access to the PCR, and that it can't be mutated during the life of the build environment. Presumably, the build would continue, but that value would be persisted in the build provenance, and the control plane would disavow all but the first build.

While possible, this doesn't sound like the only (or even the most feasible) way to prevent environment reuse. For example, if the TEE provides an instance identifier, you could use that. And ultimately, this sounds like a runtime integrity problem. Because it's perfectly fine to have multiple builds reuse the same build "environment" as long as the "environment" is pristine, that is, the runtime measurements match the boot time measurements.

IOW, it's not immediately evident to me that the L2 example, as given, addresses reusability.

marcelamelara (Contributor) replied:

> This scenario appears to introduce a requirement that a control plane component (presumably in a higher privileged level than the build itself) is expected to compute a build identifier and measure it into a PCR... For example, if the TEE provides an instance identifier, you could use that.

Yes, this needs to be clarified in the text. The BuildEnv L2 requirements, as currently defined, do specify that it's the environment running with a TPM/in a TEE that is supposed to measure the instance ID into the PCR (or similar), not the control plane.

> While possible, this doesn't sound like the only (or even the most feasible) way to prevent environment reuse.

I think we actually need to clarify what we mean by "environment reuse." What we really mean is environment instance reuse, in line with the non-interference requirements of Build L3, which seek to ensure that the same VM instance isn't reused by another tenant as a means to leak info about or influence another build. Perhaps we even need to make this distinction clearer in the Build track.

> it's perfectly fine to have multiple builds reuse the same build "environment" as long as the "environment" is pristine, that is, the runtime measurements match the boot time measurements.

This assumption is actually already built into the BuildEnv track. It's ok (and in fact, expected) to reuse the same "base" environment (i.e., base VM image, host machine). But what's maybe missing here is that we distinguish between the pre-build (running) environment and the actual "build at runtime". In the former, the CI platform has not handed off control of the build environment to the tenant; in the latter, the tenant is executing their build and can essentially do whatever they want in the running VM. Because the instance ID is assigned before the tenant gets the environment, we treat it as a pre-build property. Does this make sense?
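
A toy model of the reuse check under discussion may help. Everything below is a software simulation — `FakePCR` is not a real TPM interface, and the disavow-on-reuse policy is one possible design, not the spec's wording:

```python
import hashlib

class FakePCR:
    """Toy model of a TPM PCR: extend-only, never directly writable."""

    def __init__(self) -> None:
        self.value = b"\x00" * 32

    def extend(self, data: bytes) -> None:
        # PCR semantics: new = H(old || H(data)); measurements can only
        # be folded in, never reset to an attacker-chosen value.
        self.value = hashlib.sha256(
            self.value + hashlib.sha256(data).digest()
        ).digest()

seen_instances = set()

def admit_build(instance_id: str, pcr: FakePCR) -> bool:
    """Control-plane side: accept a build only if the environment's PCR
    carries the claimed instance ID and that ID has not been seen."""
    expected = FakePCR()
    expected.extend(instance_id.encode())
    if pcr.value != expected.value:
        return False  # environment does not carry the claimed ID
    if instance_id in seen_instances:
        return False  # instance reuse detected -> disavow this build
    seen_instances.add(instance_id)
    return True

pcr = FakePCR()
pcr.extend(b"vm-instance-1234")  # done by the environment, pre-build
assert admit_build("vm-instance-1234", pcr)      # first build: accepted
assert not admit_build("vm-instance-1234", pcr)  # second build: disavowed
```

The extend-only property is what makes this a pre-build guarantee: once the tenant has control, they can extend the PCR further but cannot rewind it to impersonate a fresh instance.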


_Mitigation_: Trusted execution environment provided by hardware-assisted mechanisms like [AMD SEV] and [Intel TDX] secures access to the build environment state even in the event of a fully compromised compute provider. Requires [BuildEnv L3] level.

_Example_: Malicious actor (potentially a rogue administrator) was able to retrieve encryption secrets from the build environment memory and modify contents of the root or temporary file system (used to store transient data including build artifacts).

bureado:

It seems to me there are two scenarios in here.

The first is a scenario where keys (which are critical to supply chain integrity) are securely released to the build environment (sealing in a secure key release protocol).

The encryption-in-use confidentiality properties of HW-based TEEs and the ability to seal to an expected, non-spoofable register value, both in BuildEnv L3, address this and prevent a rogue administrator from getting control of those keys.

The second scenario is that even if a rogue administrator manages to convince the control plane to schedule a build in a host and a hypervisor they control, in BuildEnv L3 we guarantee they can't compromise the integrity of the build environment.

The first is a specialized case of the second. In the first, the goal is, e.g., to overwrite a release. In the second, the goal is to compromise the integrity of the build environment.

You could say the scenarios above are "integrity at runtime", but when I think of "runtime integrity" I'm thinking more along the lines of what @marcelamelara said above re. IMA. With TPM and HW TEEs, I have good boot measurements, but I don't have good visibility into what happens after boot.

When the build environment is minimal and it does a small number of things, this problem can typically be reasoned about with TCG logs, IMA, etc. When the environment does a lot of arbitrary things by definition (like... a build), then it gets trickier. For example, how do I know the DNS resolver hasn't changed 3 minutes into the build? Or the APT repositories? Or a process was SIGHUP'd and now it's using an http_proxy variable? Or that python3 is no longer executing from a file descriptor in the measured disk but from a memfd?

I'm of course interested in this discussion, but I also recognize that at this point it's no longer a "malicious operator" problem, but a "your BuildEnv image behaves non-deterministically, has an exploitable weakness, or has been compromised before its build provenance is generated, and for all BuildEnv levels this fact is opaque" problem.
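
One way to picture the gap @bureado describes: boot-time measurements attest a baseline, but nothing re-checks mutable state mid-build. A toy drift check over a few illustrative config files — paths chosen to match the examples above, not prescribed anywhere — might look like this; note it would still miss the memfd case, which leaves no file to hash:

```python
import hashlib
from pathlib import Path

# Illustrative mutable files whose mid-build drift is invisible to
# boot-time measurement alone.
WATCHED = ["/etc/resolv.conf", "/etc/apt/sources.list", "/etc/environment"]

def snapshot(paths):
    """Hash each watched file; absent files map to a sentinel value."""
    out = {}
    for p in paths:
        f = Path(p)
        out[p] = hashlib.sha256(f.read_bytes()).hexdigest() if f.is_file() else "<absent>"
    return out

baseline = snapshot(WATCHED)  # taken right after boot-time measurement
# ... build steps execute here ...
drift = {p for p, h in snapshot(WATCHED).items() if baseline[p] != h}
if drift:
    print(f"runtime state changed during build: {sorted(drift)}")
```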

marcelamelara (Contributor) replied:

@bureado The example scenarios you provide are spot-on; we should add them to this section. Thank you!

I definitely agree, it's tricky to actually rely on IMA-like mechanisms to check "runtime integrity" for complex build environments (which most of them are). Even though there do exist HW TEEs that could support IMA (TDX comes to mind, there may be others), the non-determinism of the system still poses a challenge. This is the main reason why we've so far excluded runtime integrity from the BuildEnv track, and I still think we should. Now, if the build environment's FS, for example, was partitioned into read-only and RW segments or something like this, we could start considering an additional "runtime integrity" level in this track.

The other reason I think HW TEEs could be valuable for "integrity at runtime" is also their ability to detect modifications to CPU states and the TEE memory in use, in addition to their confidentiality properties. But at L3, I still think this particular property is nice-to-have.
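
To ground the "secure key release" scenario from this thread: conceptually, a key broker releases a secret only to an environment whose attested measurement matches policy. A toy, non-TEE model — the constant, key handling, and function name are all invented for illustration; a real flow verifies a signed TPM quote or TEE attestation report, not a bare string:

```python
import hashlib
import secrets

# Policy: the single measurement to which the release key is "sealed".
# In reality this would be the expected PCR digest or TEE report value.
EXPECTED_MEASUREMENT = hashlib.sha256(b"known-good build image").hexdigest()

SIGNING_KEY = secrets.token_hex(32)  # stand-in for the release-signing secret

def release_key(attested_measurement: str) -> str:
    """Key-broker side of a sealed release: hand out the signing key only
    if the environment's attested measurement matches policy. A rogue
    administrator on an unmeasured host can never satisfy this check."""
    if not secrets.compare_digest(attested_measurement, EXPECTED_MEASUREMENT):
        raise PermissionError("measurement mismatch: key release refused")
    return SIGNING_KEY
```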
