
What are Veracruz computations?

Dominic Mulligan edited this page Nov 9, 2020 · 5 revisions

At a high level of abstraction, every Veracruz computation involves some number of data owners trying to collaborate with a program owner. Call the data sets of the respective data owners Di for 1 <= i <= N, and the program of the program owner, P. Collectively, the group wishes to compute the value P(D1, ..., DN), perhaps making use of the machine of a third party, or delegate, to power the computation.

Importantly, the data and program owners do not want to reveal anything about their respective data sets, or program, unless they explicitly choose to declassify (i.e. reveal to a third-party) that information themselves, or agree to a potential declassification when entering a collaborative Veracruz computation.

Moreover, the delegate does not want their machine to be damaged by a program that they cannot see or monitor. The trusted Veracruz runtime therefore provides strong sandboxing guarantees to the delegate, ensuring a level of protection no matter what program, P, is running on their hardware.

Is Veracruz always privacy-preserving?

Absolutely not! Veracruz always provides a base level of security: strong containerization protects against snooping by the delegate, and TLS-encrypted links stop third parties from intercepting traffic between the program and data owners and the delegate's container. However, these measures fall far short of guaranteeing a privacy-preserving computation per se.

What's missing is a consideration of where the results of a Veracruz computation are able to flow to, and what these results might say about the secret inputs to the computation.

To plug this gap, every Veracruz computation is configurable with a global policy that describes the identities of the participants to the computation, and their roles, amongst other things. Participants to a computation are expected to vet this global policy before collaborating in a computation, and understand the consequences of collaborating.

In certain circumstances, participants may find it useful to intentionally declassify information about their respective secrets out-of-band — that is, externally to the Veracruz computation, for example before the computation starts — before everybody enrolls in a computation, as a way of furthering a collaboration. Some situations where this makes sense will be discussed in What are some use cases for Veracruz? This may seem like a strange choice, but it is intentional: Veracruz is capable of operating around existing trust relationships, and likewise protecting against the lack of a trust relationship, as the situation requires. Not every collaborative computation requires heavy-weight privacy protections, with all the inconvenience (and often, inefficiencies) that this entails. Yet, some do. Veracruz allows you to pick and choose, as the situation demands.

This aspect of Veracruz is important to understand, so we'll reiterate: Veracruz as a platform is not necessarily privacy-preserving. Rather, individual computations built on top of Veracruz may be privacy-preserving, whilst others may not. Computations have to be reviewed on a case-by-case basis, by the principals party to those computations, taking into account existing trust relationships, to understand what protections are being provided.

The next few subsections walk through the various steps that a group of collaborators follow, when working with Veracruz.

What is in the global policy?

Henceforth, we'll call the collaborators in a Veracruz computation principals. We need a global policy to describe the roles and identities of every principal, to ensure everybody is aware of who will supply inputs, who will receive outputs, and the identities of all the other principals in the computation.

One role of the global policy is to describe the "topology" of a Veracruz computation: whom data will flow from, who will process it, and who will receive an output. It's therefore important that everybody understands the global policy, and has a chance to vet it before collaborating, as it plays an outsized role in a given principal understanding whether a Veracruz computation is sufficiently "safe" for them to take part in. As a result, we assume that the global policy is public, and vettable by each participant before the computation starts.

The global policy assigns to each principal party to a prospective computation a mixture of roles, as mentioned previously:

  • Data Provider: provides an input data set to the computation. Whilst Veracruz supports computations with no Data Providers, in a typical Veracruz computation there may be N separate principals with this role, each providing a private input to the computation, Di for 1 <= i <= N. Moreover, one principal may provide multiple input data sets to the computation (i.e. effectively, one principal has this role twice).
  • Program Provider: contributes an input program P. In the current design of Veracruz, there is exactly one principal with this role.
  • Computation Host, or Delegate: provides the computational power necessary to compute the result of the computation, P(D1, ..., DN). In the current version of Veracruz there is exactly one principal with this role.
  • Result Receiver: may retrieve the result of the computation, as computed by the Delegate, once the computation has finished. Like the case with the Data Provider role, in a Veracruz computation there may be many principals with this particular role.

Veracruz supports principals taking on more than one role. For example, a particular Veracruz computation may be configured so that a principal takes on the role of Program Provider and Data Provider, whilst another computation may be configured so that all of the principals with the Data Provider role also have the Result Receiver role, too.
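As an illustrative sketch of how roles might be assigned and checked — the field names below are hypothetical, not the actual Veracruz policy schema — a global policy could be modelled like this:

```python
# Hypothetical model of a global policy assigning roles to principals.
# Field names are illustrative; this is not the real Veracruz policy format.

policy = {
    "principals": [
        {"id": "alice", "roles": ["DataProvider"]},
        {"id": "bob",   "roles": ["DataProvider", "ResultReceiver"]},
        {"id": "carol", "roles": ["ProgramProvider"]},
        {"id": "dave",  "roles": ["Delegate"]},
    ],
}

def principals_with_role(policy, role):
    """Return the ids of every principal holding the given role."""
    return [p["id"] for p in policy["principals"] if role in p["roles"]]

def validate(policy):
    """The current Veracruz design allows exactly one Program Provider
    and exactly one Delegate; Data Providers and Result Receivers may
    number zero or more."""
    assert len(principals_with_role(policy, "ProgramProvider")) == 1
    assert len(principals_with_role(policy, "Delegate")) == 1

validate(policy)
```

Note how "bob" holds two roles at once, matching the combined-role configurations described above.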

How do you vet the global policy?

As mentioned above, Veracruz can support privacy-preserving collaborative computations. As a result, each principal has a set of concerns when enrolling in a computation, which is a function of the role, or combination of roles, that they take on:

  • Unless they actively choose to declassify their data sets, Di, each principal with the Data Provider role wishes to keep their data sets secret from the other principals in the computation.
  • Unless they actively choose to declassify their program, P, the principal with the Program Provider role wishes to keep their program a secret from the other principals in the computation.
  • The Delegate wishes to prevent their machine from being damaged, subject to malware attack, or similar, by the program running on their machine. Note that this desire is in tension with the Data and Program Providers' desire to keep their inputs to the computation a secret: this desire implies that the Delegate cannot audit the program before, or monitor it during, the execution of the program on the input data sets, P(D1, ..., DN).
  • All principals party to the computation wish to ensure that the result of the computation is only revealed to principals with the Result Receiver role.

In short, in the most general case, principals party to a Veracruz computation are all mutually distrusting, and further each individual principal may assume that an arbitrary collection of other principals party to the computation are conspiring against them to try to steal their secrets, or damage their machine, depending on the principal. For example, a principal with the Data Provider role can assume that the Delegate and Program Provider are working in cahoots, trying to subvert the Veracruz computation and steal the input data set.

The above is a description of the most general case: in various use-cases for Veracruz, we can imagine there being existing trust relationships between various principals, and therefore this level of paranoia may be unwarranted, or even inconvenient. In other cases, there really is mutual distrust amongst all parties, in which case the data owners may demand to see the program that will be applied to their input data sets, for additional vetting, before enrolling in a computation. Again, some of these will be discussed in What are some use cases for Veracruz?

Principals in a prospective Veracruz computation therefore need to carefully read the global policy file before committing to enrolling in a computation. In particular, they must properly understand who the results of a computation are shared with, and the consequences of this sharing need to be evaluated with respect to any existing trust relationships that they may have with the Result Receiver. If they do not trust the Result Receiver then they may need to seek additional assurances from other principals in the computation. A data provider may, for example, demand access to the source code of the program to be run on their data set, to make sure that the program does not declassify secrets as a side-effect of the computation.

We will discuss the Veracruz threat model in more detail, in What is the Veracruz threat model?

The policy is acceptable. Now what?

As a first step, the Program Provider uses a remote attestation protocol to ensure that the delegate has loaded the trusted Veracruz runtime correctly. The delegate's machine is assumed to be capable of spawning some instance of a strong containerization technology, which Veracruz uses as a "venue" within which a computation takes place. At the time of writing this document, Veracruz supports three different containerisation mechanisms, covering different points on a continuum of paranoia. These are:

  • Arm TrustZone Trusted Applications, deployed under the open-source OPTEE trusted execution environment.
  • Intel's Software Guard Extensions (SGX) Secure Enclaves.
  • Hypervisor-based containerisation, using seL4, Data61's high-assurance, formally-verified capability-based microkernel, running in the EL2 exception level (i.e. a privileged mode within which a hypervisor executes) on Arm AArch64.

The trusted Veracruz runtime's job is to receive, load, manage, and provide services to the program owner's program, P, as it executes on the data sets provided by the data set owners. As a result, this trusted runtime also needs to be vetted, or implicitly trusted, by everybody party to the computation, as it again plays an outsized role in the security of any collaborative computation. We therefore assume that the code of the trusted Veracruz runtime is publicly available, and auditable by everybody prior to enrolling in a computation. The trusted Veracruz runtime is not a secret.

The remote attestation protocol, mentioned above, therefore serves to convince the program owner that their audit of the trusted Veracruz runtime took place on the same code that is now loaded into the delegate's container, and that no backdoor has been surreptitiously inserted into the runtime by the delegate. Moreover, the attestation protocol will also convince the program owner that the trusted Veracruz runtime is executing within a genuine container, e.g. a genuine Intel SGX enclave, and not being emulated in some way by a clever but malicious delegate.

Once the program owner has verified the authenticity of the runtime, using remote attestation, they provision their program into the container on the delegate's machine. To do this, they use an encrypted TLS link that terminates directly inside the container. The delegate — untrusted, in the general case, by the program owner — simply sees opaque, encrypted bytes being transferred into the container, and cannot interpret these without breaking TLS encryption.

Is that use of remote attestation really sufficient to trust the delegate's container?

No, the above description of Veracruz's remote attestation protocol was simplified quite substantially. When we designed the various protocols used in Veracruz we needed to be mindful of two problems:

  1. Identity: principals should never be confused about which container they are communicating with. We therefore need some means of identifying containers.
  2. Freshness: a malicious container should not be able to "recycle" attestation tokens from previous rounds, fooling a principal into thinking that they are unmodified Veracruz runtimes, executing inside an authentic container, when they are not.

The second problem is the easier to solve, and requires that principals generate fresh random numbers, or nonces, and bundle these with any communication with the container. These nonces are sent back to the principal in any response that the container makes. A principal receiving a message with an unrecognized nonce knows that something has either gone awry, or that a malicious container is trying to recycle messages from a previous round of communication with another principal. In either case, the computation should be aborted. Note that this use of nonces is standard practice in cryptographic engineering for ensuring freshness.
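The nonce scheme can be sketched as follows; the message structure and function names are illustrative, not the actual Veracruz wire format:

```python
import secrets

# Sketch of nonce-based freshness checking from a principal's point of
# view. Message structure is illustrative, not the Veracruz protocol.

outstanding = set()  # nonces we have issued but not yet seen echoed back

def make_request(payload):
    """Attach a fresh random nonce to an outgoing message."""
    nonce = secrets.token_hex(16)
    outstanding.add(nonce)
    return {"payload": payload, "nonce": nonce}

def check_response(response):
    """Accept a response only if it echoes a nonce we actually issued."""
    nonce = response.get("nonce")
    if nonce not in outstanding:
        # Either something has gone awry, or a malicious container is
        # replaying a message from a previous round: abort.
        return False
    outstanding.discard(nonce)  # each nonce is single-use
    return True
```

Because each nonce is discarded after one use, a replayed response fails the check even though it was once valid.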

For identity, we have a more significant problem: the usual mechanism for identifying an agent does not work, or at the very least is not very convenient when working with containers!

The issue with identity stems from the fact that Veracruz containers are ephemeral, in the sense that they can come and go rapidly. Usually, to identify an agent online, we'd rely on some wider infrastructure based on cryptographic certificates issued by a certificate authority. This wider infrastructure only really makes sense when the certificates issued by the authority refer to the public keys of long-lasting entities, like a server hosting a website, for instance.

This is a real problem, as without some reliable way of identifying a container a principal cannot really be sure that they are establishing a TLS connection with the same container that they have authenticated using remote attestation. To solve this problem, Veracruz containers self-generate an asymmetric key-pair and then self-sign a certificate using the private key of this key-pair. The public key and self-signed certificate are then "published", or made available to every principal in the prospective computation. The hash of this self-signed certificate is included in any attestation token issued by the container. Now, attestation tokens are explicitly tied to a particular container, with a known identity and known public key, which can be used by principals establishing a TLS connection to ensure that they are communicating with the right container.
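The binding between attestation token and container identity can be sketched as follows, with illustrative structures standing in for the real attestation machinery:

```python
import hashlib

# Sketch of tying an attestation token to a container identity via the
# hash of its self-signed certificate. Token structure and function
# names are illustrative, not the actual Veracruz attestation format.

def issue_token(runtime_measurement, self_signed_cert):
    """The container embeds the hash of its self-signed certificate in
    the attestation token it issues."""
    return {
        "runtime": runtime_measurement,
        "cert_hash": hashlib.sha256(self_signed_cert).hexdigest(),
    }

def verify(token, expected_runtime, cert_seen_on_tls):
    """A principal checks both that the runtime measurement matches the
    audited code, and that the certificate presented on the TLS link is
    the one named in the attestation token."""
    return (token["runtime"] == expected_runtime and
            token["cert_hash"] ==
            hashlib.sha256(cert_seen_on_tls).hexdigest())
```

If the TLS peer presents any certificate other than the one hashed into the token, verification fails, so the principal cannot be tricked into talking to a different container than the one they attested.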

Can't you just look inside the container?

This depends on the particular container, and the method of looking! For Arm's TrustZone and Intel's SGX, the answer in general is "no", at least not without using sophisticated side-channel attacks against the container, or subverting the container's security guarantees, somehow. At present, we defend against neither of these — see What is the Veracruz threat model? for more details on the precise Veracruz threat model.

One of our supported containerization technologies, Intel SGX, provides additional protection against a range of physical attacks against the container, which architecturally Arm's TrustZone does not do, though implementations may choose to do so. Hypervisors, such as seL4, are also incapable of defending against physical attacks without additional hardware support.

Moreover, strong containerization technology usually provides an integrity guarantee, in addition to privacy guarantees, which means that third parties cannot influence the course of the computation inside the container, from outside. In many Veracruz use-cases, this integrity guarantee is as interesting to users as the guarantee of privacy as it means that, via remote attestation, they're able to know precisely what program is loaded into a container, and know that the delegate cannot influence the behaviour of that code when it executes.

Principals are expected to evaluate the container technology in use by the delegate before enrolling in a computation. For some collaborative computations, especially where an existing trust relationship exists, hypervisor-based approaches may be perfectly adequate. In other circumstances, the additional protections offered by hardware-based approaches, like SGX and TrustZone, may be necessary.

Why does the delegate trust P?

They don't, as they never see it, and cannot monitor its runtime behaviour due to the strong containerization technology in use on their machine!

However, they do trust the Veracruz runtime, which is auditable by them. A key property of the trusted Veracruz runtime is that it correctly sandboxes the program, P, which is a WebAssembly (or WASM, henceforth) program compiled specifically for the Veracruz platform. The WASM program cannot escape its sandbox, at least not without exploiting some bug in the execution engine that the Veracruz runtime uses.

Aside from sandboxing the program, we also tightly control its capabilities. Specifically, the only capabilities that the WASM program can exploit are those explicitly granted to it, by the Veracruz runtime, and the program is given very few by Veracruz. More will be said on this later in this section, and in What is the Veracruz programming model?

Why is P provisioned first?

Suppose, as a precondition to enrolling in a computation, that the data owners demand that the program owner declassifies their program, P, before the computation starts. What then stops the program owner from showing the data owners one program, and then provisioning another program into the trusted Veracruz runtime?

To prevent this, we require that the program owner provisions their secret first. Then, the trusted Veracruz runtime measures the provisioned program by computing a hash of its binary. This hash can then be requested, by the data owners, before they choose to continue to provision their secrets into the container, and compared against a reference hash derived from the program, P.
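The measurement check can be sketched as follows, assuming SHA-256 as the hash (the runtime's actual choice of hash function may differ):

```python
import hashlib

# Sketch of program measurement: the runtime hashes the provisioned
# binary, and a data owner compares that measurement against a reference
# hash of the program they vetted. Names here are illustrative.

def measure(program_binary):
    """Compute the measurement (hash) of a provisioned program."""
    return hashlib.sha256(program_binary).hexdigest()

# Stand-in bytes for the WASM binary the data owners vetted out-of-band.
vetted_program = b"\x00asm...example bytes standing in for a WASM binary"
reference_hash = measure(vetted_program)

def safe_to_provision(loaded_measurement):
    """A data owner only provisions their secrets if the runtime's
    measurement matches the reference hash of the vetted program."""
    return loaded_measurement == reference_hash
```

A program owner who shows the data owners one program but provisions another produces a mismatched measurement, and the data owners simply decline to provision their secrets.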

Note that even in cases where the program, P, is not declassified by the program owner, this measurement of the loaded program may be useful. For example, the data owners may have an interest in ensuring that a known-good program, even if it cannot be audited directly, has been loaded into the container, perhaps as part of a series of collaborative computations on different data sets. The data owners (and result receivers) have a natural interest in ensuring that the same program is used to transform input data, even if they do not know what it is.

Isn't running a secret program on data strange?

No, not at all — it's very common, in fact. Consider just one example: ancestry testing services which process cheek-swab data and extract genetic information, deriving ancestry data. There are many of these companies, and at heart they merely extract data and run a proprietary program on it, which the customer never sees. What does the program do? Does it have bugs? Is it anything other than a random ancestry generator? Customers in general do not know as they cannot audit the code. Yet they still use it, on particularly sensitive data too!

More generally, any closed-source code is — for all practical purposes — completely inscrutable to its users.

After the program, the data is provisioned, right?

Yes. The data providers use the same remote attestation steps that the program owner took, to check the authenticity of the trusted Veracruz runtime. Moreover, this time they can also make reference to the measurement, as described above, of the provisioned program, P, before they choose to commit to fully enrolling in the computation.

Note that, per the global policy file, the trusted Veracruz runtime knows exactly how many input data sets are due to be provisioned, and from whom. Once all of the data has been loaded, the container's lifecycle state switches, and the program becomes ready to execute.

Why should the data owners trust the program, P?

In the general case, they should not! As a result, the data owners need to carefully vet the global policy file to ensure that they understand who gains access to the result of the computation once it completes, especially when the program, P, is not declassified. For example, a malicious program owner, conspiring with a result receiver, may load an identity program which produces as output one of the secret inputs passed to it (to properly understand how you program against Veracruz, see What is the Veracruz programming model?), thereby granting the result receiver access to the input secret. Moreover, even if the program, P, isn't so brazenly insecure as this, the output of many computations still gives away a lot of information about that computation's inputs. In many cases, especially when the program, P, is not declassified, the only sensible option may be for the data owners to also be the result receivers.

Moreover, what stops the program, P, from simply dumping all of its secret inputs to stdout on the delegate's machine? This is a legitimate question, given the fact that — generally — a data owner is allowed to assume that the program owner and the delegate may be working in cahoots to steal a secret! To stop this, Veracruz puts strict limits on the expressivity of the program that is loaded into the container. To a first approximation, a Veracruz program may only compute a pure function of its inputs, modulo sampling from a source of random data. As a result, the only side-effect of a program compiled for the Veracruz platform is its result: it cannot open files on the delegate's machine; it cannot directly control hardware on the delegate's machine; and it cannot sample from any data source other than the inputs passed to it by the data owners and a source of randomness. Note that, aside from protecting the data owners, this limitation is also necessary to protect the delegate's machine.
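A minimal sketch of the "shape" of such a program — a pure function of its provisioned inputs — is shown below; the function name and calling convention are illustrative, not the actual Veracruz programming interface:

```python
# Sketch of the shape of a Veracruz-style program: a pure function of the
# provisioned data sets. The name and signature are illustrative, not the
# real Veracruz ABI.

def program(inputs):
    """Compute a result from the data sets D1, ..., DN alone.

    There is no file system, network, or hardware access available to the
    program: the returned value is its only side-effect."""
    # Example computation: total size, in bytes, of all inputs.
    total = sum(len(d) for d in inputs)
    return total.to_bytes(8, "big")
```

Because the only observable effect is the return value, the delegate and other principals learn nothing beyond what the result itself reveals.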

In future versions of Veracruz, the trusted runtime may supply additional services, aside from a random number source, to the program, P. For instance, the runtime providing a cryptographic API to the program may make some use-cases easier to develop. See What's next for Veracruz? for more details of our aspirations for future Veracruz features.

Why does the Veracruz runtime have a lifecycle state?

The trusted Veracruz runtime is stateful, in the sense that there's an explicit lifecycle state, with a defined set of state transitions between lifecycle states that trigger once e.g. the program is loaded. The runtime's current lifecycle state can be queried by anybody party to the computation, at any point.

There are a few reasons why the trusted Veracruz runtime is stateful. First, as discussed above, it's important for a Veracruz computation that the program, P, is loaded before the data sets are loaded. An explicit state machine enforces this property, and the defined state transitions ensure that once a program is loaded, and data therefore becomes loadable into the container, the program owner cannot load another program, thereby tricking the data owners into thinking the computation will use one program when it in fact uses another. Similarly, the Veracruz runtime also needs to make sure that the program, P, only becomes executable when all expected data sets have been loaded by the data owners. In short: the defined lifecycle states in the trusted Veracruz runtime allow us to maintain a number of important system invariants that help ensure the correctness of Veracruz.
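A minimal sketch of such a state machine, with illustrative state and event names (the real runtime's lifecycle states differ), might look like this:

```python
# Sketch of a lifecycle state machine enforcing that the program is loaded
# before data, that it cannot be swapped afterwards, and that execution
# only becomes possible once all expected data sets have arrived.
# State and event names are illustrative.

ALLOWED = {
    ("AwaitingProgram", "load_program"):   "AwaitingData",
    ("AwaitingData",    "load_data"):      "AwaitingData",
    ("AwaitingData",    "all_data_loaded"): "ReadyToExecute",
    ("ReadyToExecute",  "execute"):        "Finished",
}

class Runtime:
    def __init__(self, expected_data_sets):
        self.state = "AwaitingProgram"
        self.expected = expected_data_sets
        self.loaded = 0

    def step(self, event):
        key = (self.state, event)
        if key not in ALLOWED:
            raise RuntimeError(f"illegal transition: {event} in {self.state}")
        self.state = ALLOWED[key]

    def load_program(self):
        # A second call raises: the program cannot be swapped once loaded.
        self.step("load_program")

    def load_data(self):
        self.step("load_data")
        self.loaded += 1
        if self.loaded == self.expected:
            self.step("all_data_loaded")
```

Any attempt to load a second program, or to execute before all data sets are provisioned, is an illegal transition and is rejected outright.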

In addition to this, the runtime's state machine prevents some probing attacks, wherein a malicious data set owner repeatedly modifies their data set and runs a computation over-and-over again, observing differences in the output, to deduce information about the program, P, or similar. To stop this, we adopt an important invariant in Veracruz: a computation is only ever executed when the owners of the program and the input data sets explicitly consent to this. Here, we take a program or data owner's provisioning of their secret as their consent to the computation. As a result, every computation, even ones using the same program or data sets as a previous one on a given platform, requires an explicit provisioning act by the program and data owners. No computation can be endlessly re-run, over and over, perhaps with minor modification of data or program, without the owners being made aware of this and actively consenting. Under this scheme, probing attacks are still potentially possible, but they require the victim to be actively complicit in the attack.

Lastly, the lifecycle state is sometimes useful when designing collaborative computations, as we can use details of the state transition diagram to definitively know that a party to a computation has committed to some decision, safe in the knowledge that Veracruz will prevent that commitment from being reneged on. An example will be discussed in What are some use cases for Veracruz?

Everything's provisioned. Now what?

The trusted Veracruz runtime is now in a "ready to execute" lifecycle state. As a result, one of the principals with the Result Receiver role can go ahead and ask the runtime to execute the program, and in response the trusted Veracruz runtime starts executing the WASM program, P.

The program, P, will either diverge, or finish executing and thereby produce a result, R = P(D1, ..., DN). Once this happens, all of the principals who are authorised to receive R, per the global policy file, are permitted to request it, and the runtime will forward it to them.

At the time of writing, Veracruz offers two execution strategies for the WASM program, P: interpretation and just-in-time compilation (or JITing, henceforth). Naturally — at least in the limit — JITing exhibits a large performance boost when compared to interpretation, so why bother supporting both execution strategies? There are two reasons:

  1. First, for computations on very small datasets, interpretation may actually be faster than JITing, due to the overhead of the JIT compiler walking the program's AST and translating it into machine code before execution can begin.
  2. JIT compilers perform all sorts of tricks with memory, notably writing machine code into memory pages and thereafter making them executable. As a result, JITs could make an attractive target for malicious users to try to exploit. Depending on the participant's level of paranoia, they may therefore opt to interpret the program, P, rather than JIT it, for security reasons.

As a result of supporting multiple execution strategies, the particular strategy that Veracruz will use is specified in the global policy file, and therefore everybody party to a computation is aware, before enrolling in the computation, of how the program will be executed. At the moment, not every containerisation technology supports both execution strategies: only the seL4 hypervisor is capable of supporting both interpretation and JITing, but we are actively working on extending this support to our other platforms. Note that, with seL4, we are able to somewhat ameliorate the potential security problems with JITs discussed in point (2) above by isolating the JIT from the rest of the trusted Veracruz runtime, using the process-level isolation offered by seL4. Now, an attacker trying to exploit a security problem in a JIT not only has to break the JIT, but also has to break seL4's isolation mechanisms.

Is that the computation finished?

Not quite: we need to consider teardown of the container. Veracruz prevents a second computation reusing a container that a previous computation used as its "venue". Rather, the container, an SGX Secure Enclave for instance, must be explicitly destroyed: this is another reason why the trusted Veracruz runtime is stateful, and has an explicit state transition system.

The reason for this is to try to mask microarchitectural effects, such as a computation's effect on the cache or various microarchitectural buffers, left over from a previous computation that could be observed by a subsequent one.

We assume that the delegate destroys the container once the computation is finished.

In future versions of Veracruz, this restriction may be relaxed, in which case this fact will be reflected in the global policy file.
