Add cell variants for generic load/store #4
In the Rust type system these are fundamentally different from normal references, in that they may be aliased by other pointers. However, the load and store instructions do not care about this as they operate on memory and these intrinsics do not enforce aliasing. This even applies to storeu2 which are macros around a set of two stores of size u128.
I read some of the discussion and the high level idea of doing things in-place is compelling, so if it solves your use cases I'd be happy to add it. However, I don't use
The load functions seem fine but I'm wondering about the usefulness of store variants: From my memory,
Quick sketch of what I mean:
use std::arch::x86_64::{self as arch, __m128i};
use std::cell::Cell;
use std::ptr;
// Boilerplate, approximating this crate's traits
mod private {
    pub trait Sealed {}
}
impl private::Sealed for [Cell<i32>; 4] {}
pub trait Is128BitsUnaligned: private::Sealed {}
impl Is128BitsUnaligned for [Cell<i32>; 4] {}

fn main() {
    let mut x = [1i32, 2, 3, 4, 5, 6, 7, 8];
    test(&mut x);
    dbg!(x); // x = [ 3, 5, 7, 9, 5, 6, 7, 8, ]
}

// Add [0..4] with [1..5], store in [0..4]
fn test(x: &mut [i32]) {
    let cell = Cell::from_mut(x);
    let cells = cell.as_slice_of_cells();
    let cell1: &[Cell<i32>; 4] = cells[..][..4].try_into().unwrap();
    let cell2: &[Cell<i32>; 4] = cells[1..][..4].try_into().unwrap();
    let a = unsafe { _mm_loadu_si128_cell(cell1) };
    let b = unsafe { _mm_loadu_si128_cell(cell2) };
    let c = unsafe { arch::_mm_add_epi32(a, b) };
    unsafe { _mm_storeu_si128_cell(cell1, c) };
}

#[target_feature(enable = "sse2")]
pub fn _mm_loadu_si128_cell<T: Is128BitsUnaligned>(mem_addr: &T) -> __m128i {
    unsafe { arch::_mm_loadu_si128(ptr::from_ref(mem_addr).cast()) }
}

// Note this implementation is different.
// I think `cast_mut` is safe here because we originally derived it from the mutable input slice,
// passes Miri on the playground
#[target_feature(enable = "sse2")]
pub fn _mm_storeu_si128_cell<T: Is128BitsUnaligned>(mem_addr: &T, a: __m128i) {
    unsafe { arch::_mm_storeu_si128(ptr::from_ref(mem_addr).cast_mut().cast(), a) }
}
}
Aside/bikeshed thoughts after the above question is resolved, noting it for later me:
|
As written in Cell::swap:
The standard library will probably create such methods eventually. For now, it is sound and miri certifiable to convert the slice from |
This allows us to work on shared references to different types of cell-wrapped memory; importantly, `Cell<[T; N]>` and `[Cell<T>; N]` will both work with the cell interfaces.
I've basically switched to your implementation sketch with some additions to make sure both kinds of arrays ( |
Ah, I didn't know that. Hadn't scrolled up to see that in the tracking issue.
Right, I forgot about
Really happy with the new changes you've pushed. I have some review comments to make but otherwise it looks almost good to go. If you want, you can bump the version to 0.1.2 and update the readme referencing this PR and title following the 0.1.1 format. |
I want to make another pass, making sure that all the relevant interfaces are covered and have tests, etc. There's no need to rush it and a few more things on my plate. Should be done within this week though. |
Sounds good to me. The current tests only cover the
Also if you could update the PR message to add a line about whatever this PR ended up being. For example,
Experience/Feedback
I was playing with the new interface and it's surprisingly ergonomic. Two other thoughts I had:
|
In the context of image this is not the most pressing issue. Since we have
TL;DR: you don't need to solve all ergonomic issues within this crate; make sure to focus on the core issue of architecture support. |
Thanks for this. I see you've updated the PR message too, so everything is covered. |
Add Is128CellUnaligned, Is256CellUnaligned traits for generic load/store of `&Cell<[T; N]>` and `&[Cell<T>; N]` behind shared references that may overlap.
Based on experience from `image-canvas`, and extrapolating to `moxcms`, we probably do not want to restrict these intrinsics to interactions with the Rust alias model. In particular, loads and stores can also work on cells. This is sort of a minimal PR to start this discussion.
In the Rust type system, Cells are fundamentally different from normal references in that they may be aliased by other pointers or other cells. However, the load and store instructions do not care about this, as they operate on memory, and these intrinsics do not enforce aliasing. This even applies to the storeu2 variants, which are macros around a set of two stores of size u128.
This would, for instance, allow easily composing some stream operations to happen partially in-place (e.g. `a[i] = a[i] + a[i+1]`) by using `Cell::from_mut` and then iterating over two different overlapping slices of cells to define inputs and outputs.
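The in-place pattern described above can be sketched in scalar form, without any SIMD intrinsics, to show only the aliasing structure. The function name `add_next_in_place` is hypothetical; `Cell::from_mut` and `Cell::as_slice_of_cells` are standard library methods.

```rust
use std::cell::Cell;

/// In-place `a[i] = a[i] + a[i + 1]` for every `i` that has a right neighbour.
/// The last element is left unchanged.
fn add_next_in_place(a: &mut [i32]) {
    // `&mut [i32]` -> `&Cell<[i32]>` -> `&[Cell<i32>]`; no copy is made.
    let cells: &[Cell<i32>] = Cell::from_mut(a).as_slice_of_cells();
    if cells.is_empty() {
        return;
    }
    // Two overlapping shared views of the same memory: outputs are
    // `cells[0..n-1]`, inputs are `cells[1..n]`.
    let outputs = &cells[..cells.len() - 1];
    let inputs = &cells[1..];
    for (out, next) in outputs.iter().zip(inputs) {
        // `next` is read before it is overwritten on the following iteration.
        out.set(out.get() + next.get());
    }
}
```

With plain `&mut` slices the two overlapping views would be rejected by the borrow checker. With cells they are ordinary shared references, which is the property the cell load/store variants rely on.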