Currently, #22672 needs to match a long chain of operations to fold bitcasts into bufferized tensor loads:
- iree_codegen.load_from_buffer
- amdgpu.fat_raw_buffer_cast
- hal.interface.binding.subspan
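This pattern matching is fragile. The fold only fires when the IR has exactly this shape, so any change in how the load's source memref is produced can silently disable it.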
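To make the fragility concrete, here is a toy Python model of an exact-chain matcher. It is not IREE's actual matcher: the Op class is a hypothetical stand-in for an IR operation, and "example.intermediate" is a made-up op name used only to show what happens when one extra op appears in the producer chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    """Toy stand-in for an IR op: its name plus the op that defines its source operand."""
    name: str
    source: Optional["Op"] = None


# The exact producer chain the fold expects, walking upward from the load.
EXPECTED_CHAIN = [
    "iree_codegen.load_from_buffer",
    "amdgpu.fat_raw_buffer_cast",
    "hal.interface.binding.subspan",
]


def matches_chain(op: Optional[Op]) -> bool:
    """Return True only if `op` and its producers match EXPECTED_CHAIN link by link."""
    for expected in EXPECTED_CHAIN:
        if op is None or op.name != expected:
            return False
        op = op.source
    return True


# The shape the fold was written for.
subspan = Op("hal.interface.binding.subspan")
cast = Op("amdgpu.fat_raw_buffer_cast", subspan)
load = Op("iree_codegen.load_from_buffer", cast)

# The same computation with one hypothetical extra op between the cast and the subspan.
extra = Op("example.intermediate", subspan)
cast2 = Op("amdgpu.fat_raw_buffer_cast", extra)
load2 = Op("iree_codegen.load_from_buffer", cast2)

print(matches_chain(load))   # True
print(matches_chain(load2))  # False: the fold no longer applies
```

Each new producer pattern would need another hand-written chain, which is what pushing the bitcast onto the source memref avoids.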
A more robust solution is to push the bitcast directly onto the source memref of the iree_codegen.load_from_buffer op, so the fold no longer depends on which ops produce that memref.
However, memref.reinterpret_cast does not support changing the element type. memref.view can change it, but it requires the source memref to be contiguous.
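As a reminder of what "contiguous" rules out, the sketch below checks whether a strided memref has the identity (row-major) layout. Per the upstream MLIR documentation, memref.view additionally expects a 1-D i8 source with an identity layout; the helper names here are hypothetical, and the check is conservative (it does not special-case size-1 dimensions, whose strides do not actually matter).

```python
def identity_strides(shape):
    """Row-major (identity layout) strides for `shape`, measured in elements."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def is_contiguous(shape, strides):
    """True if the memref's strides match the identity layout for its shape."""
    return list(strides) == identity_strides(shape)


# A dense 4x8 buffer is contiguous.
print(is_contiguous([4, 8], [8, 1]))   # True
# A 4x8 subview of a 4x16 buffer skips elements between rows, so it is not.
print(is_contiguous([4, 8], [16, 1]))  # False
```

A subview like the second case cannot go through memref.view at all, even though reinterpreting its element type is well defined row by row.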
A cleaner solution would therefore be to introduce a dedicated memref.bitcast op (or something similar).
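One possible set of semantics for such an op, sketched below purely as an assumption (no memref.bitcast exists upstream, and bitcast_layout is a hypothetical helper), mirrors vector.bitcast: only the innermost dimension is rescaled by the bit-width ratio, so it must be unit-stride, while outer dimensions may keep arbitrary strides. Offset handling is omitted for brevity.

```python
def bitcast_layout(shape, strides, src_bits, dst_bits):
    """Result (shape, strides) when reinterpreting src_bits elements as dst_bits elements.

    Hypothetical semantics: the innermost dimension is rescaled and must be
    unit-stride; outer strides are converted into destination-element units.
    """
    if not shape:
        raise ValueError("0-D memrefs are not handled in this sketch")
    if strides[-1] != 1:
        raise ValueError("innermost dimension must be unit-stride")
    inner_bits = shape[-1] * src_bits
    if inner_bits % dst_bits:
        raise ValueError("innermost dimension does not divide evenly")
    new_shape = list(shape[:-1]) + [inner_bits // dst_bits]
    new_strides = []
    for s in strides[:-1]:
        if (s * src_bits) % dst_bits:
            raise ValueError("outer stride does not divide evenly")
        new_strides.append(s * src_bits // dst_bits)
    new_strides.append(1)
    return new_shape, new_strides


# A non-contiguous 4x16 i8 memref (row stride 32) viewed as i32.
print(bitcast_layout([4, 16], [32, 1], 8, 32))  # ([4, 4], [8, 1])
# A contiguous 2x8 f32 memref viewed as f16.
print(bitcast_layout([2, 8], [8, 1], 32, 16))   # ([2, 16], [16, 1])
```

Requiring only a unit-stride innermost dimension, rather than full contiguity, is what would let the bitcast apply to strided sources that memref.view rejects.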