Example

Motivating example

Consider the example of using curl through its official Docker image. What threats are we exposed to in the software supply chain? (We choose curl simply because it is a popular open-source package, not to single it out.)

The first problem is figuring out the actual supply chain. This requires significant manual effort, guesswork, and blind trust. Working backwards:

  • The "latest" tag in Docker Hub points to 7.72.0.
  • It claims to have come from a Dockerfile in the curl/curl-docker GitHub repository.
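  • That Dockerfile reads the following artifacts, assuming there are no further fetches during build time:
    • Docker Hub image: alpine:3.11.5
    • Alpine packages, including curl-dev
    • File at URL: https://curl.haxx.se/ca/cacert.pem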
  • Each of the dependencies has its own supply chain, but let's look at curl-dev, which contains the actual "curl" source code.
  • The package, like all Alpine packages, has its build script defined in an APKBUILD in the Alpine git repo. There are several build dependencies:
    • File at URL: https://curl.haxx.se/download/curl-7.72.0.tar.xz.
      • The APKBUILD includes a sha256 hash of this file. It is not clear where that hash came from.
    • Alpine packages: openssl-dev nghttp2-dev zlib-dev brotli-dev autoconf automake groff libtool perl
  • The source tarball was presumably built from the actual upstream GitHub repository curl/curl@curl-7_72_0, by running the commands ./buildconf && ./configure && make && ./maketgz 7.72.0. Those commands have their own dependencies, but these are not well documented.
  • Finally, there are the systems that actually ran the builds above. We have no indication about their software, configuration, or runtime state whatsoever.
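The hash check that the APKBUILD performs on the source tarball can be sketched as follows. This is a minimal illustration, not Alpine's actual tooling: the function names are hypothetical and the tarball is replaced by stand-in bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def matches_pinned_hash(data: bytes, pinned_hex: str) -> bool:
    """Compare downloaded bytes against a hash pinned in a build script."""
    return sha256_hex(data) == pinned_hex.lower()


# Stand-in bytes; a real check would read curl-7.72.0.tar.xz from disk.
tarball = b"pretend this is the source tarball"

# Hypothetical: whoever wrote the build script computed this from the file they saw.
pinned = sha256_hex(tarball)

print(matches_pinned_hash(tarball, pinned))          # True
print(matches_pinned_hash(tarball + b"!", pinned))   # False
```

The check only shows that the file matches what the author of the hash saw. It says nothing about whether that file was trustworthy in the first place, which is why the origin of the hash matters.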

Suppose some developer's machine is compromised. What attacks could potentially be performed unilaterally with only that developer's credentials? (None of these are confirmed.)

  • Directly upload a malicious image to Docker Hub.
  • Point the CI/CD system to build from an unofficial Dockerfile.
  • Upload a malicious Dockerfile (or other file) in the curl/curl-docker git repo.
  • Upload a malicious https://curl.haxx.se/ca/cacert.pem.
  • Upload a malicious APKBUILD in Alpine's git repo.
  • Upload a malicious curl-dev Alpine package to the Alpine repository. (Not sure if this is possible.)
  • Upload a malicious https://curl.haxx.se/download/curl-7.72.0.tar.xz. (Won't be detected by APKBUILD's hash if the upload happens before the hash is computed.)
  • Upload a malicious change to the curl/curl git repo.
  • Attack any of the systems involved in the supply chain, as in the SolarWinds attack.

SLSA intends to cover all of these threats. When all artifacts in the supply chain have a sufficient SLSA level, consumers can gain confidence that most of these attacks are mitigated, first via self-certification and eventually through automated verification.

Finally, note that all of this is just for curl's own first-party supply chain steps. The dependencies, namely the Alpine base image and packages, have their own similar threats. And they too have dependencies, which have other dependencies, and so on. Each dependency has its own SLSA level and the composition of SLSA levels describes the entire supply chain's security.

For another look at Docker supply chain security, see Who's at the Helm? For a much broader look at open source security, including these issues and many more, see Threats, Risks, and Mitigations in the Open Source Ecosystem.

Vision: Case Study

Let's consider how we might secure curlimages/curl from the motivating example using the SLSA framework.

Incrementally reaching SLSA 4

Let's start by incrementally applying the SLSA principles to the final Docker image.

SLSA 0: Initial state

[Diagram: slsa0]

Initially the Docker image is SLSA 0. There is no provenance. It is difficult to determine who built the artifact and what sources and dependencies were used.

The diagram shows that the (mutable) locator curlimages/curl:7.72.0 points to (immutable) artifact sha256:3c3ff….
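The difference between a mutable locator and an immutable artifact can be sketched with a toy in-memory registry. The `push` function and the two dictionaries are hypothetical and do not reflect how Docker Hub is implemented; the image contents are stand-in bytes.

```python
import hashlib

tags = {}   # mutable: locator -> digest
blobs = {}  # immutable by construction: digest -> content


def digest(content: bytes) -> str:
    """Content-derived identifier; changing the content changes the digest."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def push(tag: str, content: bytes) -> str:
    """Store content under its digest and point the tag at it."""
    d = digest(content)
    blobs[d] = content
    tags[tag] = d
    return d


first = push("curlimages/curl:7.72.0", b"original image bytes")
second = push("curlimages/curl:7.72.0", b"replaced image bytes")

# The same tag now resolves to different content...
print(tags["curlimages/curl:7.72.0"] == second)   # True
# ...while the old digest still names exactly the old bytes.
print(blobs[first] == b"original image bytes")    # True
```

This is why provenance refers to the digest rather than the tag: anyone able to push can repoint the tag, but a digest can only ever name one sequence of bytes.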

SLSA 1: Provenance

[Diagram: slsa1]

We can reach SLSA 1 by scripting the build and generating provenance. The build script was already automated via make, so we use simple tooling to generate the provenance on every release. Provenance records the output artifact hash, the builder (in this case, our local machine), and the top-level source containing the build script.

In the updated diagram, the provenance attestation says that the artifact sha256:3c3ff… was built from curl/curl-docker@d6525….

At SLSA 1, the provenance does not protect against tampering or forging but may be useful for vulnerability management.
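A provenance record of the kind described above might be generated like this. The field names are illustrative assumptions and not the SLSA provenance schema; the artifact is stand-in bytes, and the source reference is left truncated as in the text.

```python
import hashlib
import json


def make_provenance(artifact: bytes, builder: str, source: str) -> dict:
    """Build a minimal provenance record (hypothetical field names)."""
    return {
        "subject": {"sha256": hashlib.sha256(artifact).hexdigest()},
        "builder": builder,
        "source": source,
    }


prov = make_provenance(
    artifact=b"stand-in for the built image",
    builder="local-machine",            # at SLSA 1 the builder is our own machine
    source="curl/curl-docker@d6525…",   # truncated in the original text
)
print(json.dumps(prov, indent=2, ensure_ascii=False))
```

Nothing stops whoever runs this script from writing any values they like, which is the sense in which SLSA 1 provenance does not protect against forging.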

SLSA 2 and 3: Build service

[Diagram: slsa2]

To reach SLSA 2 (and later SLSA 3), we must switch to a hosted build service that generates provenance for us. This updated provenance should also include dependencies on a best-effort basis. SLSA 3 additionally requires the source and build platforms to implement additional security controls, which might need to be enabled.

In the updated diagram, the provenance now lists some dependencies, such as the base image (alpine:3.11.5) and apk packages (e.g. curl-dev).

At SLSA 3, the provenance is significantly more trustworthy than before. Only highly skilled adversaries are likely to be able to forge it.
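One way a consumer might use service-generated provenance is to accept an artifact only if the provenance names a builder they trust and describes exactly the bytes they received. The sketch below assumes the provenance has already been authenticated by some other means; the builder name, field names, and `accept` function are all hypothetical.

```python
import hashlib

# Hypothetical: the set of build services this consumer has decided to trust.
TRUSTED_BUILDERS = {"hosted-build-service"}


def accept(artifact: bytes, provenance: dict) -> bool:
    """Accept only if the digest matches and the builder is trusted."""
    digest_ok = provenance.get("subject_sha256") == hashlib.sha256(artifact).hexdigest()
    builder_ok = provenance.get("builder") in TRUSTED_BUILDERS
    return digest_ok and builder_ok


artifact = b"stand-in for the built image"
good = {
    "subject_sha256": hashlib.sha256(artifact).hexdigest(),
    "builder": "hosted-build-service",
    "dependencies": ["alpine:3.11.5", "curl-dev"],  # best-effort list
}
local = dict(good, builder="local-machine")

print(accept(artifact, good))                 # True
print(accept(artifact, local))                # False: builder not trusted
print(accept(artifact + b"tampered", good))   # False: digest mismatch
```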

SLSA 4: Hermeticity and two-person review

[Diagram: slsa4]

SLSA 4 requires two-party source control and hermetic builds. Hermeticity in particular guarantees that the dependencies are complete. Once these controls are enabled, the Docker image will be SLSA 4.

In the updated diagram, the provenance now attests to its hermeticity and includes the cacert.pem dependency, which was absent before.

At SLSA 4, we have high confidence that the provenance is complete and trustworthy and that no single person can unilaterally change the top-level source.
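The completeness property can be pictured as a set comparison between what the provenance declares and what the build actually read. How the observed inputs are collected is an assumption here, and a hermetic build would be expected to prevent undeclared fetches rather than merely report them; the function name is hypothetical.

```python
def undeclared_inputs(declared: set, observed: set) -> set:
    """Return inputs the build used that the provenance does not list."""
    return observed - declared


# Dependencies listed in the earlier, best-effort provenance.
declared = {"alpine:3.11.5", "curl-dev"}

# Hypothetical record of everything the build actually fetched.
observed = {
    "alpine:3.11.5",
    "curl-dev",
    "https://curl.haxx.se/ca/cacert.pem",
}

print(sorted(undeclared_inputs(declared, observed)))
# ['https://curl.haxx.se/ca/cacert.pem']

# Once cacert.pem is declared, nothing is left over.
declared.add("https://curl.haxx.se/ca/cacert.pem")
print(undeclared_inputs(declared, observed))   # set()
```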

Full graph

[Diagram: full-graph]

We can recursively apply the same steps above to lock down dependencies. Each non-source dependency gets its own provenance, which in turn lists more dependencies, and so on.

The final diagram shows a subset of the graph, highlighting the path to the upstream source repository (curl/curl) and the certificate file (cacert.pem).

In reality, the graph is intractably large due to the fan-out of dependencies. There will need to be some way to trim the graph to focus on the most important components. While this can reasonably be done by hand, we do not yet have a solid vision for how best to do this in a scalable, generic, automated way. One idea is to use ecosystem-specific heuristics. For example, Debian packages are built and organized in a very uniform way, which may allow Debian-specific heuristics.
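Walking the graph can be sketched as a breadth-first traversal over each artifact's listed dependencies. The graph below is a heavily simplified, hypothetical subset of the curl example, and the depth limit is only the crudest possible way to trim; it is not a proposal for the heuristics discussed above.

```python
from collections import deque

# Hypothetical, simplified subset: artifact -> direct dependencies.
GRAPH = {
    "curlimages/curl": ["curl/curl-docker", "alpine:3.11.5", "curl-dev", "cacert.pem"],
    "curl-dev": ["APKBUILD", "curl-7.72.0.tar.xz"],
    "curl-7.72.0.tar.xz": ["curl/curl"],
}


def transitive_deps(root, graph, max_depth=None):
    """List dependencies reachable from root, nearest first.

    max_depth=None walks everything; max_depth=1 returns direct deps only.
    """
    order = []
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # trimmed: do not expand this node's dependencies
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append((dep, depth + 1))
    return order


print(transitive_deps("curlimages/curl", GRAPH, max_depth=1))
# ['curl/curl-docker', 'alpine:3.11.5', 'curl-dev', 'cacert.pem']
print(transitive_deps("curlimages/curl", GRAPH)[-1])
# curl/curl
```

Breadth-first order is used so that a depth limit keeps the nearest dependencies regardless of the order in which they are listed.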

Composition of SLSA levels

An artifact's SLSA level is not transitive: each node in our graph has its own, independent SLSA level, and an artifact being at level N implies nothing about its dependencies' levels. Some aggregate measure of security risk across the whole supply chain is therefore necessary.

In our example, suppose that the final curlimages/curl Docker image were SLSA 4 but its curl-dev dependency were SLSA 0. Then this would imply a significant security risk: an adversary could potentially introduce malicious behavior into the final image by modifying the source code found in the curl-dev package. That said, even being able to identify that it has a SLSA 0 dependency has tremendous value because it can help focus efforts.
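Identifying weak dependencies can be as simple as filtering nodes by level. In this sketch the levels for curlimages/curl and curl-dev follow the scenario above, while the level for alpine:3.11.5 and the `below_level` helper are made up for illustration. It deliberately does not compute an aggregate score.

```python
# SLSA level per node. The alpine level is a hypothetical value.
LEVELS = {
    "curlimages/curl": 4,
    "alpine:3.11.5": 3,
    "curl-dev": 0,
}


def below_level(levels: dict, minimum: int) -> list:
    """Return the names of nodes whose level is below minimum, sorted."""
    return sorted(name for name, level in levels.items() if level < minimum)


print(below_level(LEVELS, 4))   # ['alpine:3.11.5', 'curl-dev']
print(below_level(LEVELS, 1))   # ['curl-dev']
```

A list like this is enough to decide where to focus hardening effort, even without a single number summarizing the whole graph.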

Formation of this aggregate risk measure is left for future work. It is perhaps too early to develop such a measure without real-world data. Once SLSA becomes more widely adopted, we expect patterns to emerge and the task to get a bit easier.

Accreditation and delegation

Accreditation and delegation will play a large role in the SLSA framework. It is not practical for every software consumer to fully vet every platform and fully walk the entire graph of every artifact. Auditors and/or accreditation bodies can verify and assert that a platform or vendor meets the SLSA requirements when configured in a certain way. Similarly, there may be some way to "trust" an artifact without analyzing its dependencies. This may be particularly valuable for closed source software.