Description
In CVE-2026-24116, we saw that a load-sinking instruction lowering on x86-64 could result in a 64-bit load (the `load.f64` Cranelift operator) becoming a memory operand on a 128-bit-wide SIMD instruction (`vandpd`/`vandnpd`). This resulted in a too-wide access to memory, hence a miscompilation and a CVE. (The issue applies equally to 32-bit loads for `f32.copysign`, but we'll describe only the 64-bit case here for brevity.)
The chain of logic that leads to the load-sinking is:
- The `fcopysign` lowering takes `Value`s and feeds them to the helper constructors for `vandpd` and `vandnpd` [here]. These instructions perform operations on 128-bit XMM registers, and we rely on Cranelift's usual convention of storing narrower values (here an `f64`) in wider machine registers and ignoring the upper bits.
- These constructors take `XmmMem` arguments, which can represent a register or memory; when memory, the instructions perform a 128-bit load.
- `XmmMem` can be implicitly constructed from a `Value` by `put_in_xmm_mem`, which checks for a sinkable load [here].
- A load is sinkable if [conditions] are met that relate to code-motion, but the width of the to-be-sunk load operator is not explicitly checked against anything, because we do not yet know who will be consuming (incorporating) the sunk load.
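As a simplified Rust sketch (hypothetical types and helper names, not Wasmtime's real API -- the real `XmmMem`, `put_in_xmm_mem`, and `Amode` live in the x64 backend), the chain above boils down to a conversion that drops the value's width on the floor:

```rust
/// A CLIF-level SSA value; in this sketch we track only the width of its
/// type and whether it is a sinkable load.
#[derive(Clone, Copy)]
struct Value {
    bits: u8, // 32 for f32, 64 for f64, ...
    is_sinkable_load: bool,
}

/// Register-or-memory operand for XMM instructions. Note that the `Mem`
/// form carries no record of how wide the original load was.
enum XmmMem {
    Reg,
    Mem,
}

/// Analogue of `put_in_xmm_mem`: sink the load if we can. Nothing here
/// checks the width of `v` against the consuming instruction.
fn put_in_xmm_mem(v: Value) -> XmmMem {
    if v.is_sinkable_load {
        XmmMem::Mem
    } else {
        XmmMem::Reg
    }
}

/// How many bits a 128-bit instruction like `vandpd` actually touches for
/// each operand form.
fn access_bits(op: &XmmMem) -> u8 {
    match op {
        XmmMem::Reg => 128, // register read; upper bits ignored, harmless
        XmmMem::Mem => 128, // memory read; 128 bits accessed -- too wide!
    }
}
```

The bug is then visible as a width mismatch: a 64-bit `Value` flows, via the implicit conversion, into an operand position where 128 bits of memory get read.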
In other words, we use a `Value` corresponding to an `f64` with an instruction that operates on XMM registers, and we sink loads into XMM operands. Both of these patterns occur in many places: e.g., an `fadd` also lowers to an instruction that operates on XMM registers, and it can sink a load as well. The thing that makes `fcopysign` special is that its lowering uses instructions that operate on the full 128 bits while dealing with `f64`s; by contrast, `fadd.f64` uses `addsd` (add scalar double-precision) which, if using a memory operand, will load only 64 bits.
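To make the hazard concrete, here is a plain-Rust analogy (no unsafe, invented helper names): a 128-bit load at the address of a 64-bit value also observes the adjacent 8 bytes, which belong to something else entirely (or, in a Wasm sandbox, may lie outside the region the program should be able to access):

```rust
/// The intended access: `addsd`'s memory form reads exactly 8 bytes.
fn load_64(heap: &[u8], addr: usize) -> &[u8] {
    &heap[addr..addr + 8]
}

/// The buggy access: `vandpd`'s memory form reads 16 bytes at the same
/// address -- the 8 bytes the program asked for, plus 8 adjacent ones.
fn load_128(heap: &[u8], addr: usize) -> &[u8] {
    &heap[addr..addr + 16]
}
```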
The tricky bit here is that nothing about the lowering rule itself looks amiss -- it is essentially (simplified)

```
(rule (lower (has_type $F64 (fcopysign a b)))
      (simd_op
        (simd_op a)
        (simd_op b)))
```

and only knowledge that (i) `a` and `b` are `f64`s and (ii) the `simd_op`s do 128 bits of work leads to seeing the bug.
There are a few ways we could "break the chain" above and cause this mistake to be harder to make:
- We could remove the implicit conversion from `Value` to `XmmMem`, i.e., force all sites where we want to support load-sinking to opt into that behavior.

  This is tempting from the point of view of explicitness and clarity when reading/auditing the code. It should be noted, however, that the ergonomics/implicitness of autoconversions go hand-in-hand with completeness, i.e., supporting load-sinking everywhere it is possible. In essence, this option admits that the types don't fully capture the invariants, so we can't trust the type system and have to manually describe allowed behavior. As we know from experience with Rust, a type system with accurate guardrails is freeing in many ways and allows for more optimizations.

  With @avanhatt and @fitzgen, we did a quick experiment to see how many sites would need explicit opt-in or opt-out if we remove the `Value`-to-`XmmMem`-and-family conversions; 255 type errors resulted. That's a lot, but within the realm of a big refactor.
- We could keep the implicit conversions but somehow carry the width of the original value dynamically in the `XmmMem` (or, more likely, the underlying `Amode`), and assert that the instruction consuming the `Amode` will do a load of the expected width.

  This is also tempting, albeit with some dynamic cost (even if the assert itself is only a debug-assert, the extra field rides along in the `Amode`), and it also relies on fuzzing rather than the type system to find errors.
- We could add another dimension to the newtype universe around registers and have `XmmMem32`, `XmmMem64`, and `XmmMem128`; update instruction constructors to take only the appropriate width; and create autoconversions from `Value` to each of these that fail dynamically if the `Value` has a CLIF-level type of the wrong width.

  This is better than option 2 above because, though still dynamic, it puts more in the type system; and this is desirable for verification.
- (Idea from @avanhatt) We could mark the left-hand-side bindings where sinking is acceptable syntactically, and provide autoconversions only for these; e.g., match on `(fcopysign (maybe_sink a) (maybe_sink b))`, where `maybe_sink` gives us a `ValueMaybeSunk` and we have an autoconverter from `ValueMaybeSunk` to `XmmMem` (and then from `Value` only to `Xmm`, i.e., only the register form).

  This is nice because it addresses one difference between load-sinking cases and all other instruction-selection rules: this is the only case where we don't write out, on the left-hand side, the full pattern of CLIF operators that we are matching on and lowering. In a sense, the current status quo composes the usual LHS patterns with the loads-become-memory-operands sinking behavior, which allows for more compact and complete rulesets (we'd otherwise have to duplicate many of the backend rules in x64, and we might miss some) but hides this behavior.
I am starting to come to the opinion that option 4 is good regardless -- the discrepancy between load-sinking and all other kinds of multi-operator matching on left-hand sides is odd -- but we should also try to do something like option 3, i.e., distinguish operands of different widths at the type level, because option 4 alone (denoting sinking sites) still relies on the rule author reasoning about operand width and possibly getting it wrong (or a later refactor getting it wrong). To make it concrete with details from this CVE: something would have needed to know that `vandpd` has a 128-bit memory-operand width and that this does not match the `Value`; that something is either the author reading the Intel manual, deciding yes or no, and adding `maybe_sink`, or types added in a principled way on both ends.