Skip to content

Cranelift: make load-sinking less implicit in the face of type widening #12477

@cfallin

Description

@cfallin

In CVE-2026-24116, we saw that a load-sinking instruction lowering on x86-64 could result in a 64-bit load (load.f64 Cranelift operator) becoming a memory operand on a 128-bit-wide SIMD instruction (vandpd / vandnpd). This resulted in a too-wide access to memory, hence a miscompilation and a CVE. (The issue also applies to 32-bit loads for f32.copysign equally, but we'll describe the 64-bit case only here for brevity.)

The chain of logic that leads to the load-sinking is:

  1. The fcopysign lowering takes Values and feeds them to the helper constructors for vandpd and vandnpd [here]. These instructions perform operations on 128-bit XMM registers, and we rely on Cranelift's usual convention of storing narrower values (here an f64) in wider machine registers and ignoring the upper bits.

  2. These constructors take XmmMem arguments, which can represent a register or memory; when memory, the instructions perform a 128-bit load

  3. XmmMem can be implicitly constructed from a Value by put_in_xmm_mem, which checks for a sinkable load [here]

  4. A load is sinkable if [conditions] are met that relate to code-motion, but the width of the to-be-sunk load operator is not explicitly checked against anything, because we do not know yet who will be consuming (incorporating) the sunk load.

In other words, we use a Value corresponding to an f64 with an instruction that operates on XMM registers, and we sink loads into XMM operands. Both of these patterns occur in many places: e.g., an fadd lowers to an instruction that operates on XMM registers, and can sink a load, as well. The thing that makes fcopysign special is that its lowering uses instructions that operate on the full 128 bits while dealing with f64s; e.g., fadd.f64 uses addsd (add single double) which, if using a memory operand, will load only 64 bits.

The tricky bit here is that nothing about the lowering rule itself looks amiss -- it is essentially (simplified)

(rule (lower (has_type $F64 (fcopysign a b)))
      (simd_op
        (simd_op a)
        (simd_op b)))

and only knowledge that (i) a and b are f64s and (ii) the simd_ops do 128 bits of work leads to seeing the bug.

There are a few ways we could "break the chain" above and cause this mistake to be harder to make:

  1. We could remove the implicit conversion from Value to XmmMem, i.e., force all sites where we want to support load-sinking to opt into that behavior.

    This is tempting from a point-of-view of explicitness and clarity when reading/auditing the code. It should be noted, however, that the ergonomics/implicitness of autoconversions go hand-in-hand with completeness, i.e., supporting load-sinking everywhere it is possible. In essence, this is admitting that the types don't fully capture the invariants, so we can't trust the type system, and we have to manually describe allowed behavior. As we know from experience with Rust, a type system that has accurate guardrails is freeing in many ways and allows for more optimizations.

    With @avanhatt and @fitzgen we did a quick experiment to see how many sites would need explicit opt-in or opt-out if we remove the Value to XmmMem-and-family conversions; 255 type errors resulted. That's a lot, but within the realm of a big refactor.

  2. We could keep the implicit conversions but somehow carry the width of the original value dynamically in the XmmMem (or more likely the underlying Amode), and assert that the instruction consuming the Amode will do a load of the expected width.

    This is also tempting, albeit with some dynamic cost (even if the assert itself is only a debug-assert, the extra field rides along in the Amode), and also relies on fuzzing rather than the type system to find errors.

  3. We could add another dimension to the newtype universe around registers, and have XmmMem32, XmmMem64, and XmmMem128; update instruction constructors to take only the appropriate width; and create autoconversions from Value to each of these that dynamically fails if the Value has a CLIF-level type with the wrong width.

    This is better than option 2 above because, though still dynamic, it puts more in the type system; and this is desirable for verification.

  4. (Idea from @avanhatt) We could mark the left-hand side bindings where sinking is acceptable syntactically, and then only for these, provide autoconversions; e.g., match on (fcopysign (maybe_sink a) (maybe_sink b)) where maybe_sink gives us a ValueMaybeSunk and we have an autoconverter from ValueMaybeSunk to XmmMem (and then from Value only to Xmm, i.e. only the register form).

    This is nice because it addresses one difference between load-sinking cases and all other instruction selection rules: this is the only case where we don't write out the full pattern of CLIF operators that we are matching on and lowering on the left-hand side. In a sense, the current status quote is composing the usual LHS patterns with the loads-become-memory-operands sinking behavior, which allows for more compact and complete rulesets (we'd otherwise have to duplicate many of the backend rules in x64, and we might miss some) but hides this behavior.

I am starting to come to the opinion that option 4 is good regardless -- the discrepancy between load-sinking and all other kinds of multi-operator matching on left hand sides is odd -- but we should also try to do something like option 3, i.e. distinguish operands of different widths at the type level, because option 4 alone (denoting sinking sites) still relies on the rule author reasoning about operand width, and possibly getting it wrong (or a later refactor getting it wrong). To make it concrete with details from this CVE, something would have needed to know that vandpd has a 128-bit operand width, and that does not match the Value; that something is either the author reading the Intel manual, deciding yes or no, and adding maybe_sink, or types added in a principled way on both ends.

cc @avanhatt @fitzgen for thoughts!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions