Description
I was reading https://zeux.io/2025/05/03/load-store-conflicts/ today and it's a good read. To quote:
these happening as a result of the code that explicitly tries to load or store mismatched element sizes. For example, this problem is prevalent, and requires a lot of care, when unions are used to operate on tagged values: something like this may often hit a case where the structure is filled using individual field writes, but is copied with a 128-bit wide load/store, which may present challenges in certain high-performance interpreters that would often expect the same value to be written and read in quick succession
Regular structure copies may sometimes hit this problem as well, although these are often less latency sensitive. The index decompression code discussed here is an interesting scenario where all individual accesses in the source code are matched precisely, but the compiler may be eager to combine multiple loads and stores together - and unless it combines the stores, the combined loads may suffer.
The problem is clear, but I find it hard to come up with a proper example that demonstrates it, and the case in the article is restricted to a single, older compiler version (clang-16). Maybe we can do the mismatched loads and stores manually to demonstrate the problem? This may be a good candidate for a future lab.
Also, there's a 4K aliasing demo here that could be used as a reference: https://github.com/Kobzol/hardware-effects/tree/master/4k-aliasing