[DRAFT]: Complex support #13
Looks like a great start. I think num_complex is a good choice of library to provide some level of interoperability with.
I've left a couple of comments on things that are probably worth discussing. I think the orphan rules stuff is a bit of a pain, but unavoidable.
cfavml-complex/src/lib.rs
Outdated
        unsafe fn swap_complex_components(value: Self::Register) -> Self::Register;
    }

    pub struct Avx2Complex;
I guess orphan rules are causing issues with using the main exported types :/ Maybe the wrapper impl should be like Complex<RegisterType>, so Complex<Avx2> etc.
Edit: :/ but then that conflicts with the num_complex wrapper type, so I'm not sure what the best thing to do is.
cfavml-complex/src/lib.rs
Outdated
            <Avx2Complex as ComplexOps<f32>>::swap_complex_components(right);

        let output_right = _mm256_mul_ps(left_imag, right_shuffled);
        _mm256_fmaddsub_ps(left_real, right, output_right)
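To make the lane arithmetic in the snippet above easier to follow, here is a portable scalar model of the same steps. It is only a sketch: it assumes `left_real` and `left_imag` hold each complex number's real and imaginary part duplicated across both lanes of its pair (that part of the diff is not shown), and the helper names `Lanes`, `dup_component`, `fmaddsub` and `complex_mul_model` are made up for illustration. The real `_mm256_fmaddsub_ps` is fused, whereas this model rounds the multiply and the add separately.

```rust
/// Scalar model of one 256-bit register holding four interleaved
/// complex numbers: [re0, im0, re1, im1, re2, im2, re3, im3].
type Lanes = [f32; 8];

/// Swaps the real and imaginary component of every complex pair.
fn swap_complex_components(v: Lanes) -> Lanes {
    let mut out = [0.0; 8];
    for i in (0..8).step_by(2) {
        out[i] = v[i + 1];
        out[i + 1] = v[i];
    }
    out
}

/// Copies the real (offset 0) or imaginary (offset 1) component of each
/// pair into both lanes of that pair. Hypothetical helper, assumed to
/// match how `left_real` / `left_imag` are built.
fn dup_component(v: Lanes, offset: usize) -> Lanes {
    let mut out = [0.0; 8];
    for i in 0..8 {
        out[i] = v[(i & !1) + offset];
    }
    out
}

/// Models the lane pattern of `_mm256_fmaddsub_ps`:
/// even lanes compute a * b - c, odd lanes compute a * b + c.
fn fmaddsub(a: Lanes, b: Lanes, c: Lanes) -> Lanes {
    let mut out = [0.0; 8];
    for i in 0..8 {
        out[i] = if i % 2 == 0 {
            a[i] * b[i] - c[i]
        } else {
            a[i] * b[i] + c[i]
        };
    }
    out
}

/// (a + bi)(c + di) = (ac - bd) + (ad + bc)i for all four pairs at once.
fn complex_mul_model(left: Lanes, right: Lanes) -> Lanes {
    let left_real = dup_component(left, 0);
    let left_imag = dup_component(left, 1);
    let right_shuffled = swap_complex_components(right);
    let mut output_right = [0.0; 8];
    for i in 0..8 {
        output_right[i] = left_imag[i] * right_shuffled[i];
    }
    fmaddsub(left_real, right, output_right)
}
```

For example, multiplying the pair (1 + 2i) by (3 + 4i) through this model yields the lanes for -5 + 10i.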
So, although it might be a bit of a PITA, I think this is likely going to need both an Avx2 and an Avx2Fma impl. Technically, within CFAVML, Avx2 assumes only AVX2 is enabled, with no FMA support. Although that case is unlikely, for consistency we should make sure all the crates follow this pattern as well.
Internally, what CFAVML does for any routine which doesn't actually need FMA is just call Avx2::example_op.
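A minimal sketch of that delegation pattern, using plain `f32` values in place of real registers so it runs anywhere. The trait `ExampleOps` and its methods are hypothetical stand-ins, not CFAVML's actual trait; only the `Avx2` / `Avx2Fma` split comes from the comment above.

```rust
/// Hypothetical, heavily simplified stand-in for a register-ops trait.
trait ExampleOps {
    fn add(a: f32, b: f32) -> f32;
    fn fmadd(a: f32, b: f32, c: f32) -> f32;
}

/// Backend that assumes AVX2 only, with no FMA available.
struct Avx2;
/// Backend that assumes both AVX2 and FMA are available.
struct Avx2Fma;

impl ExampleOps for Avx2 {
    fn add(a: f32, b: f32) -> f32 {
        a + b
    }

    // Without FMA the multiply and the add are two separate operations.
    fn fmadd(a: f32, b: f32, c: f32) -> f32 {
        a * b + c
    }
}

impl ExampleOps for Avx2Fma {
    // This routine does not need FMA, so it simply forwards to the Avx2 impl.
    fn add(a: f32, b: f32) -> f32 {
        <Avx2 as ExampleOps>::add(a, b)
    }

    // Only the routines that benefit from FMA get their own body.
    fn fmadd(a: f32, b: f32, c: f32) -> f32 {
        a.mul_add(b, c)
    }
}
```

The forwarding methods keep a single source of truth for every routine that is identical across the two backends.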
A crate that might be useful is duplicate.
I explored writing a macro_rules macro to implement functions that just call the function of the same name on another struct, but I couldn't figure out how to pass in the function name as a raw string so it could be used in both the definition and the inner call. It started to feel like one of those "If I can, maybe I shouldn't" problems.
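For what it's worth, one possible way to get this with plain `macro_rules!` is to capture the function name as an `ident` fragment rather than a string, since an identifier can be reused in both the definition and the forwarded call. The sketch below is an illustration under assumptions: `ExampleOps`, `Base`, `Derived` and `delegate_to!` are hypothetical names, and the macro only handles simple associated functions without `self`, generics, or `unsafe`.

```rust
/// Hypothetical trait standing in for the real ops trait.
trait ExampleOps {
    fn add(a: f32, b: f32) -> f32;
    fn mul(a: f32, b: f32) -> f32;
}

struct Base;
struct Derived;

impl ExampleOps for Base {
    fn add(a: f32, b: f32) -> f32 {
        a + b
    }
    fn mul(a: f32, b: f32) -> f32 {
        a * b
    }
}

/// Generates associated functions that forward to the same-named
/// function on `$target`. The name is captured once as `$name:ident`
/// and reused for both the definition and the inner call.
macro_rules! delegate_to {
    ($target:ty; $( fn $name:ident ( $( $arg:ident : $ty:ty ),* ) -> $ret:ty; )*) => {
        $(
            #[inline(always)]
            fn $name( $( $arg: $ty ),* ) -> $ret {
                <$target as ExampleOps>::$name( $( $arg ),* )
            }
        )*
    };
}

impl ExampleOps for Derived {
    delegate_to! {
        Base;
        fn add(a: f32, b: f32) -> f32;
        fn mul(a: f32, b: f32) -> f32;
    }
}
```

Whether this is nicer than pulling in a crate is a judgment call; the signatures still have to be repeated in the macro invocation.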
@ChillFish8 Operations that rely on ordering (min, max, lt/gt) aren't really valid for complex numbers. I think equality is still a valid operation. Would it make sense (either in this PR or a separate one) to split the ordering ops out of the SimdRegister trait?
Hmm, I think it might be better to have the ordering methods simply implement I suspect most people working with complex numbers are unlikely to reach for the ordering operations, and IMO a panic is fine provided it is well documented that the complex impls explicitly don't support the cmp ops. Worth noting that you can control the public API methods like
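A rough sketch of the "documented panic" approach, kept scalar so it is portable. Everything here is hypothetical (`ExampleRegisterOps`, `Complex32`, the method names); it only illustrates keeping one trait and having the complex impl panic on ordering ops while equality and arithmetic stay supported.

```rust
/// Hypothetical minimal complex type used only for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex32 {
    re: f32,
    im: f32,
}

/// Hypothetical stand-in for a register trait that mixes arithmetic,
/// equality, and ordering operations.
trait ExampleRegisterOps: Sized {
    fn add(a: Self, b: Self) -> Self;
    fn is_equal(a: Self, b: Self) -> bool;
    /// # Panics
    /// Implementations for complex types panic: ordering is not
    /// defined for complex numbers.
    fn max(a: Self, b: Self) -> Self;
}

impl ExampleRegisterOps for f32 {
    fn add(a: Self, b: Self) -> Self {
        a + b
    }
    fn is_equal(a: Self, b: Self) -> bool {
        a == b
    }
    fn max(a: Self, b: Self) -> Self {
        if a > b { a } else { b }
    }
}

impl ExampleRegisterOps for Complex32 {
    fn add(a: Self, b: Self) -> Self {
        Complex32 { re: a.re + b.re, im: a.im + b.im }
    }
    // Equality is still well defined component-wise.
    fn is_equal(a: Self, b: Self) -> bool {
        a.re == b.re && a.im == b.im
    }
    // Ordering ops are deliberately unsupported for complex values.
    fn max(_a: Self, _b: Self) -> Self {
        unimplemented!("ordering operations are not supported for complex numbers")
    }
}
```

The trade-off is that misuse is caught at run time rather than compile time, which is why the panic needs to be called out in the docs.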
I think I'll have to duplicate the structure for the math module. I can't correctly implement the trait directly due to the norm function being
For complex ops, there are a few operations for which it would be useful to have full-register and half-register variants. I'm not sure what the naming convention should be, though. I have the full register ones starting with
Hmm, if this is a limitation we can probably look to alter the math trait a bit to make it more flexible. I confess it was a bit of a hack more than anything specialized, so I'm happy to change it.
Good question, I am slightly inclined to go with
Hey, dumb question: assuming I have a compatible CPU, how do I test the specific instruction sets in the danger test_suite? I tried running
Not a dumb question :) Most of the test suite tests specific features gated by the actual global target_features flag. Normally the easiest method is to run with
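To illustrate what "gated by the global target features" means in practice, here is a small sketch. It is not taken from the CFAVML test suite; the function names are hypothetical. The AVX2 path only exists when the whole build was compiled with AVX2 enabled, which rustc supports through codegen options such as `-C target-feature=+avx2` or `-C target-cpu=native` (this is general rustc behaviour, not necessarily the command meant above).

```rust
/// Portable reference implementation that the SIMD path is compared against.
fn add_scalar(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

/// Only compiled when AVX2 is enabled for the entire build.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn add_checked_path(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    use core::arch::x86_64::*;
    let mut out = [0.0f32; 8];
    // SAFETY: the cfg above guarantees the required CPU features are
    // enabled at compile time, and both arrays hold exactly 8 floats.
    unsafe {
        let sum = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
        _mm256_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Fallback used when the build does not enable AVX2, so callers
/// (and tests) compile either way.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn add_checked_path(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    add_scalar(a, b)
}
```

With this shape, a plain build silently exercises only the fallback, which is why feature-gated tests appear to be skipped unless the feature is switched on for the whole compilation.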
Have a look at the current test suite: https://github.com/ChillFish8/cfavml/blob/main/cfavml/src/danger/test_suite.rs#L389C1-L484C2 Notice it uses the
I'm working on a hypotenuse function that avoids overflow/underflow for certain values. It's currently a mess, so I'll just post the simpler (working) version here:

    use core::arch::x86_64::*;

    /// |b| * sqrt(1 + (a/b)^2), where |b| >= |a|
    #[inline(always)]
    unsafe fn safer_hypot(left: __m256, right: __m256) -> __m256 {
        // Clear the sign bit of every lane to get the absolute values.
        let (left_abs, right_abs) = (
            _mm256_andnot_ps(_mm256_set1_ps(-0.0), left),
            _mm256_andnot_ps(_mm256_set1_ps(-0.0), right),
        );
        // `a` is the smaller magnitude and `b` the larger, so a/b stays in
        // [0, 1] and squaring it cannot overflow.
        let (a, b) = (
            _mm256_min_ps(left_abs, right_abs),
            _mm256_max_ps(left_abs, right_abs),
        );
        let ab = _mm256_div_ps(a, b);
        // Note: lanes where both inputs are zero still yield NaN (0/0).
        _mm256_mul_ps(
            b,
            _mm256_sqrt_ps(_mm256_fmadd_ps(ab, ab, _mm256_set1_ps(1.0))),
        )
    }

There's a somewhat more involved version that I'm trying to adapt from here. I was wondering if I should try to put that in the cfavml crate itself in a separate PR, given it's a scalar op that is probably useful outside of norming complex numbers.
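As a reference point for testing a SIMD version, the same scaling idea can be written as a scalar function. This is a sketch, not the more involved algorithm mentioned above; `scaled_hypot` is a hypothetical name, and the explicit zero check is an added assumption to avoid the 0/0 case. The standard library's `f32::hypot` could serve as another oracle.

```rust
/// Computes sqrt(a^2 + b^2) as big * sqrt(1 + (small / big)^2), where
/// `big` is the larger magnitude. The ratio is at most 1, so squaring
/// it cannot overflow even when a^2 or b^2 would.
fn scaled_hypot(a: f32, b: f32) -> f32 {
    let (a, b) = (a.abs(), b.abs());
    let (big, small) = if a >= b { (a, b) } else { (b, a) };
    if big == 0.0 {
        // Both inputs are zero; avoid computing 0 / 0.
        return 0.0;
    }
    let ratio = small / big;
    big * ratio.mul_add(ratio, 1.0).sqrt()
}
```

With inputs like 3e30 and 4e30, the naive `(a * a + b * b).sqrt()` overflows to infinity in `f32`, while the scaled form stays finite at roughly 5e30.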
Hmm, probably not a bad idea. If you'd like to submit a PR, feel free to do so; otherwise create a new issue so it doesn't get lost 👍
Mainly opening a draft to get feedback on the design. I'm still figuring out what needs to be implemented and what the API should look like. I've started by duplicating the AVX implementations for Complex<f32> and Complex<f64>.
A lot of the element-wise code will be exactly the same as the regular float type implementations. There are a few functions that would be useful both for building some of the methods and for exposing to users (so far just a few helpers from rustfft), which I've put in a separate trait.
Progress on this is likely to be really slow for a while; I'm kind of learning the SIMD intrinsics stuff as I go.
Will resolve #4