Skip to content

Conversation

@skewballfox
Copy link
Contributor

@skewballfox skewballfox commented Oct 30, 2024

Mainly opening a draft to get feedback on the design. I'm still figuring out what needs to be implemented and what the api should look like. I've started with duplicating the implementations for AVX for Complex<f32> and Complex<f64>.

A lot of the element-wise code will be exactly the same as the regular float type implementations. There are a few functions that would be useful to both build the some of the methods and expose to users (so far just a few helpers from rustfft), Which I've put in a separate trait.

progress on this is likely to be really slow for a while, I'm kind of learning the simd intrinsics stuff as I go.

will resolve #4

@skewballfox skewballfox marked this pull request as draft October 31, 2024 13:27
Copy link
Owner

@ChillFish8 ChillFish8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a great start, I think num_complex is a good choice of library to provide some level of interoperability with.

I've left a couple comments of some things that are probably worth discussion, I think the orphan rules stuff is a bit of a pain, but unavoidable.

unsafe fn swap_complex_components(value: Self::Register) -> Self::Register;
}

pub struct Avx2Complex;
Copy link
Owner

@ChillFish8 ChillFish8 Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess orphan rules are causing issues using the main exported types :/ Maybe the wrapper impl should be like Complex<RegisterType> so Complex<Avx2> etc...

Edit: :/ but then that conflicts with the num complex wrapper type so I'm not sure what the best thing to do is

<Avx2Complex as ComplexOps<f32>>::swap_complex_components(right);

let output_right = _mm256_mul_ps(left_imag, right_shuffled);
_mm256_fmaddsub_ps(left_real, right, output_right)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think although it might be a bit of a PITA is this is likely going to need a Avx2 and Avx2Fma impl, technically within CFAVML, Avx2 assumes only avx2 is enabled with no FMA support. Although unlikely, for consistency we should make sure all the crates follow this pattern as well.

Internally what CFAVML does is on any routine which doesn't actually need FMA, it just calls the Avx2::example_op

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A crate that might be useful is duplicate.

I explored writing a macro rule to implement functions that just called the same function name from another struct, but I couldn't figure out how to pass in the fn call as a raw string to use it in both the definition and the inner call. It started to feel like one of those "If I can, maybe I shouldn't" problems.

@skewballfox
Copy link
Contributor Author

@ChillFish8 So operations that rely on ordering aren't exactly valid for complex numbers, so (min,max,lt/gt). I think equality is still a valid operation. Would it make sense (either in this PR or a separate one) to split the ordering ops out of the SimdRegister Trait?

@ChillFish8
Copy link
Owner

Hmm, I think it might be better to have the ordering methods simply implement unimplemented!("Stop trying to order complex numbers >-<")

I suspect most people working with the complex numbers are unlikely to reach for the ordering operations, and IMO a panic is fine providing it is well documented that complex impls explicitly don't support the cmp ops.

Worth noting that you can control the public API methods like cfaml::max etc... By just not implementing the trait, so that prevent most accidental missue.

@skewballfox
Copy link
Contributor Author

I think I'll have to duplicate the structure for the math module. I can't correctly implement the trait directly due to the norm function being (a: T, b: T) -> T, where norming a complex number returns the inner type. Right now I have basic ops implemented in impl_test but that's mainly because I didn't realize I'd need it for Fallback.

For complex ops, there are a few operations that would be useful to have full register and half register variants. I'm not sure what the naming convention should be though. I have the full register ones starting with duplicate or dup, maybe keep the same name omitting the prefix?

@ChillFish8
Copy link
Owner

I think I'll have to duplicate the structure for the math module. I can't correctly implement the trait directly due to the norm function being (a: T, b: T) -> T, where norming a complex number returns the inner type. Right now I have basic ops implemented in impl_test but that's mainly because I didn't realize I'd need it for Fallback.

Hmm If this is a limitation we can probably look to alter the math trait a bit to make it more flexible. I confess it was a bit of a hack more than anything specialized so happy to change it.

@ChillFish8
Copy link
Owner

For complex ops, there are a few operations that would be useful to have full register and half register variants. I'm not sure what the naming convention should be though. I have the full register ones starting with duplicate or dup, maybe keep the same name omitting the prefix?

Good question, I am slightly inclined to go with dup as the prefix, although I'll let you decide, I think the naming convention will probably define itself a bit more once more ops are added as it'll start becoming clear what things are tedious of confusing to write out with their current names.

@skewballfox
Copy link
Contributor Author

hey dumb question, assuming I have a compatible cpu, how do I test the specific instruction sets in danger test_suite? tried running cargo test --package cfavml-complex --features avx2 but obviously that doesn't work because it's not a published feature. In cfavml it looks like it's enabled by dispatch, but that is only called when building the safe APIs

@ChillFish8
Copy link
Owner

Not a dumb question :) Most of the test suite tests specific features gated by the actual global target_features flag. Normally easiest method is to run with RUSTFLAGS="-C target-cpu=native". We have to do it this way otherwise some things like the AVX512 test suite might try and run on CI runners without AVX512 support.

@ChillFish8
Copy link
Owner

Have a look at the current test suite: https://github.com/ChillFish8/cfavml/blob/main/cfavml/src/danger/test_suite.rs#L389C1-L484C2

Notice it uses the target_features which refers to the CPU features compile time target rather than cargo features.

@skewballfox
Copy link
Contributor Author

skewballfox commented Nov 8, 2024

I'm working on a hypotenuse function for avoiding overflow/underflow for certain values. It's currently a mess so I'll just the simpler (working) version here:

  ///|b| * sqrt(1 + (a/b)^2)
#[inline(always)]
unsafe fn safer_hypot(left: __m256, right: __m256) -> __m256 {
    let (left_abs, right_abs) = (
        _mm256_andnot_ps(_mm256_set1_ps(-0.0), left),
        _mm256_andnot_ps(_mm256_set1_ps(-0.0), right),
    );
    let (a, b) = (max(left_abs, right_abs), min(left_abs, right_abs));
    let ab = _mm256_div_ps(a, b);
    _mm256_mul_ps(
        b,
        _mm256_sqrt_ps(_mm256_fmadd_ps(ab, ab, _mm256_set1_ps(1.0))),
    )
}

there's a bit more involved version I'm trying to adapt from here. I was wondering if I should try to put that in the cfavml crate itself in a separate PR, given it's a scalar op that is probably useful outside of norming complex numbers.

@ChillFish8
Copy link
Owner

Hmm probably not a bad idea, if you'd like to submit a PR feel free to do so, otherwise create a new issue so it doesn't get lost 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Complex Number Support?

2 participants