# RFC 012: Application Level Verification

## Background

Contrast's `verify` subcommand is currently implemented as an RPC method served over aTLS.
This requires users to be able to establish a direct TCP connection to the Coordinator, which comes with the following drawbacks:

1. Contrast Coordinator needs to be exposed directly to the user, likely over the public internet.
- Coordinator does not benefit from protocol-level mitigations and access controls.
2. The aTLS connection can't be terminated by a TLS-intercepting middlebox.
3. The unconventional use of ALPN and certificate extensions might trip systems that analyze TLS handshakes.

## Requirements

1. Users must be able to verify a Coordinator's state using standard HTTP(S) APIs.
2. Users must be able to obtain the manifest history up to and including the current manifest, and the CA certs for the current manifest.
3. The verification must be able to uphold the same guarantees that aTLS gives.
Given an expected manifest $M_n$, successful verification implies that:
1. A Coordinator that's allowed by $M_n$ responded to the request.
2. That Coordinator responded to _this specific_ verification request.
3. The Coordinator is currently enforcing manifest $M_n$.
4. The manifest history $M_1..M_n$ returned with the verification response reflects the history seen by the Coordinator.
5. The root CA cert belongs to this Coordinator, and the mesh CA cert is the one for $M_n$.
4. Verification should not require a direct connection to the Coordinator.

## Design

We add a new HTTP server on port 1314.
This port can be exposed by a standard ingress controller, and wrapped with TLS as needed.

The server handles a single path, `/verify`, that must be called with the `POST` method and a JSON request body.
For now, the body consists of a JSON object with a single field `nonce`, which must be a 32-byte random value generated by the client.
The nonce field must be base64-encoded, so that it deserializes correctly to a Go struct like this:

```go
type VerifyRequest struct {
Nonce []byte `json:"nonce"`
}
```

The response, if successful, is a JSON serialization of the following struct (lower-case JSON encoding hints omitted):

```go
type VerifyResponse struct {
RawAttestationDoc []byte // serialized TDX or SNP report (the same format as currently provided to validators)

// The following fields match the fields of userapi.GetManifestsResponse

Manifests [][]byte // Manifest history, the current manifest being last.
Policies map[manifest.HexString][]byte // Policies referred to by manifests.
RootCA []byte // PEM-encoded certificate
MeshCA []byte // PEM-encoded certificate
}
```

### Binding response fields to attestation

In order to satisfy requirement (3), the request and response fields need to be reflected in the attestation report.
We accomplish this by building a checksum over the fields that are not content-addressed and using that as `REPORTDATA`.

In [RFC 004](004-recovery.md#state-transitions) we defined the concept of _transitions_, which identify a history of manifest updates.
We make use of these transitions to define the checksum.

```abnf
reportdata = sha256(nonce || sha256(transition) || sha256(root-ca) || sha256(mesh-ca))
```
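A sketch of this computation in Go, treating the transition's binary encoding, the root CA cert, and the mesh CA cert as opaque byte slices (the function name `reportData` is illustrative):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// reportData computes the REPORTDATA binding from the ABNF above:
// sha256(nonce || sha256(transition) || sha256(root-ca) || sha256(mesh-ca)).
// The transition's binary encoding is treated as an opaque byte slice here.
func reportData(nonce, transition, rootCA, meshCA []byte) [32]byte {
	h := sha256.New()
	h.Write(nonce)
	for _, field := range [][]byte{transition, rootCA, meshCA} {
		sum := sha256.Sum256(field)
		h.Write(sum[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	rd := reportData([]byte("nonce"), []byte("transition"), []byte("root"), []byte("mesh"))
	fmt.Println(hex.EncodeToString(rd[:]))
}
```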

### Verification by the client

After fetching the `VerifyResponse` object, the client first reconstructs a transition chain from the `Manifests` field.
Since transitions have a unique binary encoding, this is deterministic.
Together with the nonce and the CA certificates, the client can construct the expected `REPORTDATA` as described above.

Then the client creates a list of validators from the expected manifest and validates the attestation report against the expected `REPORTDATA`.
If validation passes, the client knows:

- The attestation report was created for them, due to the nonce being included in `REPORTDATA` (requirement 3.2).
- The Coordinator runs the expected software in a TEE, because it passed validation according to the manifest (requirement 3.1).

After that, the client checks that the last manifest in the response is exactly equal to the expected manifest.
Since we already know that the Coordinator is running the correct software, the client can implicitly assume that

- the Coordinator sent the manifest history up to and including the currently enforced manifest (requirements 3.3, 3.4), and
- the mesh CA cert matches the current manifest and the root CA cert belongs to this Contrast deployment (requirement 3.5).
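The final comparison is a plain byte-for-byte equality check. A minimal sketch (the helper `checkCurrentManifest` is illustrative, not the SDK's actual API):

```go
package main

import (
	"bytes"
	"fmt"
)

// checkCurrentManifest verifies that the last entry of the returned
// history is byte-for-byte equal to the expected manifest.
// (Illustrative helper, not the SDK's actual API.)
func checkCurrentManifest(manifests [][]byte, expected []byte) error {
	if len(manifests) == 0 {
		return fmt.Errorf("empty manifest history")
	}
	if !bytes.Equal(manifests[len(manifests)-1], expected) {
		return fmt.Errorf("current manifest does not match expected manifest")
	}
	return nil
}

func main() {
	history := [][]byte{[]byte("m1"), []byte("m2")}
	fmt.Println(checkCurrentManifest(history, []byte("m2"))) // prints <nil>
}
```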

The entire verification should be implemented in the `sdk` package, and should not assume a connection to the Coordinator.

```go
func ValidateState(expectedManifest []byte, nonce []byte, resp *VerifyResponse) error
```

## Alternatives considered

### Expected history instead of expected manifest

Some clients may be interested in validating the entire history instead of just the current manifest and _some_ history.
However, some clients may also not care about the history, or it may be hard to communicate the entire history out of band.
This could be a future _addition_ to the SDK, but is mostly unrelated to the proposal made here.

As a workaround for now, the history check can be implemented outside the SDK (since it's just a byte-for-byte comparison of slice items).
This check can be made even simpler if the Contrast deployment is only ever expected to have one manifest in the history (Privatemode.ai, for example).
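The workaround amounts to a byte-for-byte comparison of the returned history against an expected history supplied out of band, as in this sketch (the helper `historiesEqual` is illustrative, not an SDK API):

```go
package main

import (
	"bytes"
	"fmt"
)

// historiesEqual compares two manifest histories item by item.
// (Illustrative helper for the out-of-SDK workaround, not an SDK API.)
func historiesEqual(got, want [][]byte) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		if !bytes.Equal(got[i], want[i]) {
			return false
		}
	}
	return true
}

func main() {
	got := [][]byte{[]byte("m1"), []byte("m2")}
	fmt.Println(historiesEqual(got, [][]byte{[]byte("m1"), []byte("m2")})) // prints true
}
```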

### Other means of including the manifest history in `REPORTDATA`

We could have chosen another method, such as hashing the manifests one after another.
The argument for using the transitions is that we're already using that server-side to ensure history integrity.
Eventually, we want to surface this to users such that the history can be recovered from the output of `verify` (or `set`) alone.

### Reuse the existing HTTP server at 9102

That one serves health probes and metrics, which we usually don't want to expose outside Kubernetes.
A dedicated port is the cleaner solution.