Description
In CVE-2026-24116, we saw that a load-sinking instruction lowering on x86-64 could result in a 64-bit load (the `load.f64` Cranelift operator) becoming a memory operand on a 128-bit-wide SIMD instruction (`vandpd`/`vandnpd`). This resulted in a too-wide access to memory, hence a miscompilation and a CVE. (The issue applies equally to 32-bit loads for `f32.copysign`, but we'll describe only the 64-bit case here for brevity.)
The chain of logic that leads to the load-sinking is:
- The `fcopysign` lowering takes `Value`s and feeds them to the helper constructors for `vandpd` and `vandnpd` [here]. These instructions perform operations on 128-bit XMM registers, and we rely on Cranelift's usual convention of storing narrower values (here an `f64`) in wider machine registers and ignoring the upper bits.
- These constructors take `XmmMem` arguments, which can represent a register or memory; when memory, the instructions perform a 128-bit load.
- `XmmMem` can be implicitly constructed from a `Value` by `put_in_xmm_mem`, which checks for a sinkable load [here].
- A load is sinkable if [conditions] are met that relate to code-motion, but the width of the to-be-sunk load operator is not explicitly checked against anything, because we do not yet know who will be consuming (incorporating) the sunk load.
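As a simplified Rust sketch (hypothetical types and helper names, not Wasmtime's real API -- the real `XmmMem`, `put_in_xmm_mem`, and `Amode` live in the x64 backend), the chain above boils down to a conversion that drops the value's width on the floor:

```rust
/// A CLIF-level SSA value; in this sketch we track only the width of its
/// type and whether it is a sinkable load.
#[derive(Clone, Copy)]
struct Value {
    bits: u8, // 32 for f32, 64 for f64, ...
    is_sinkable_load: bool,
}

/// Register-or-memory operand for XMM instructions. Note that the `Mem`
/// form carries no record of how wide the original load was.
enum XmmMem {
    Reg,
    Mem,
}

/// Analogue of `put_in_xmm_mem`: sink the load if we can. Nothing here
/// checks the width of `v` against the consuming instruction.
fn put_in_xmm_mem(v: Value) -> XmmMem {
    if v.is_sinkable_load {
        XmmMem::Mem
    } else {
        XmmMem::Reg
    }
}

/// How many bits a 128-bit instruction like `vandpd` actually touches for
/// each operand form.
fn access_bits(op: &XmmMem) -> u8 {
    match op {
        XmmMem::Reg => 128, // register read; upper bits ignored, harmless
        XmmMem::Mem => 128, // memory read; 128 bits accessed -- too wide!
    }
}
```

The bug is then visible as a width mismatch: a 64-bit `Value` flows, via the implicit conversion, into an operand position where 128 bits of memory get read.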
In other words, we use a `Value` corresponding to an `f64` with an instruction that operates on XMM registers, and we sink loads into XMM operands. Both of these patterns occur in many places: e.g., an `fadd` also lowers to an instruction that operates on XMM registers, and it can sink a load as well. The thing that makes `fcopysign` special is that its lowering uses instructions that operate on the full 128 bits while dealing with `f64`s; by contrast, `fadd.f64` uses `addsd` (add scalar double-precision) which, if using a memory operand, will load only 64 bits.
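To make the hazard concrete, here is a plain-Rust analogy (no unsafe, invented helper names): a 128-bit load at the address of a 64-bit value also observes the adjacent 8 bytes, which belong to something else entirely (or, in a Wasm sandbox, may lie outside the region the program should be able to access):

```rust
/// The intended access: `addsd`'s memory form reads exactly 8 bytes.
fn load_64(heap: &[u8], addr: usize) -> &[u8] {
    &heap[addr..addr + 8]
}

/// The buggy access: `vandpd`'s memory form reads 16 bytes at the same
/// address -- the 8 bytes the program asked for, plus 8 adjacent ones.
fn load_128(heap: &[u8], addr: usize) -> &[u8] {
    &heap[addr..addr + 16]
}
```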
The tricky bit here is that nothing about the lowering rule itself looks amiss -- it is essentially (simplified)

```
(rule (lower (has_type $F64 (fcopysign a b)))
      (simd_op
        (simd_op a)
        (simd_op b)))
```

and only knowledge that (i) `a` and `b` are `f64`s and (ii) the `simd_op`s do 128 bits of work leads to seeing the bug.
There are a few ways we could "break the chain" above and cause this mistake to be harder to make:
- We could remove the implicit conversion from `Value` to `XmmMem`, i.e., force all sites where we want to support load-sinking to opt into that behavior.

  This is tempting from the point of view of explicitness and clarity when reading/auditing the code. It should be noted, however, that the ergonomics/implicitness of autoconversions go hand-in-hand with completeness, i.e., supporting load-sinking everywhere it is possible. In essence, this option admits that the types don't fully capture the invariants, so we can't trust the type system and have to manually describe allowed behavior. As we know from experience with Rust, a type system with accurate guardrails is freeing in many ways and allows for more optimizations.

  With @avanhatt and @fitzgen, we did a quick experiment to see how many sites would need explicit opt-in or opt-out if we remove the `Value`-to-`XmmMem`-and-family conversions; 255 type errors resulted. That's a lot, but within the realm of a big refactor.
- We could keep the implicit conversions but somehow carry the width of the original value dynamically in the `XmmMem` (or, more likely, the underlying `Amode`), and assert that the instruction consuming the `Amode` will do a load of the expected width.

  This is also tempting, albeit with some dynamic cost (even if the assert itself is only a debug-assert, the extra field rides along in the `Amode`), and it also relies on fuzzing rather than the type system to find errors.
- We could add another dimension to the newtype universe around registers and have `XmmMem32`, `XmmMem64`, and `XmmMem128`; update instruction constructors to take only the appropriate width; and create autoconversions from `Value` to each of these that fail dynamically if the `Value` has a CLIF-level type of the wrong width.

  This is better than option 2 above because, though still dynamic, it puts more in the type system; and this is desirable for verification.
- (Idea from @avanhatt) We could mark the left-hand-side bindings where sinking is acceptable syntactically, and provide autoconversions only for these; e.g., match on `(fcopysign (maybe_sink a) (maybe_sink b))`, where `maybe_sink` gives us a `ValueMaybeSunk` and we have an autoconverter from `ValueMaybeSunk` to `XmmMem` (and then from `Value` only to `Xmm`, i.e., only the register form).

  This is nice because it addresses one difference between load-sinking cases and all other instruction-selection rules: this is the only case where we don't write out, on the left-hand side, the full pattern of CLIF operators that we are matching on and lowering. In a sense, the current status quo composes the usual LHS patterns with the loads-become-memory-operands sinking behavior, which allows for more compact and complete rulesets (we'd otherwise have to duplicate many of the backend rules in x64, and we might miss some) but hides this behavior.
I am starting to come to the opinion that option 4 is good regardless -- the discrepancy between load-sinking and all other kinds of multi-operator matching on left-hand sides is odd -- but we should also try to do something like option 3, i.e., distinguish operands of different widths at the type level, because option 4 alone (denoting sinking sites) still relies on the rule author reasoning about operand width and possibly getting it wrong (or a later refactor getting it wrong). To make it concrete with details from this CVE: something would have needed to know that `vandpd` has a 128-bit memory-operand width and that this does not match the `Value`; that something is either the author reading the Intel manual, deciding yes or no, and adding `maybe_sink`, or types added in a principled way on both ends.