[DRAFT]: Complex support #13

skewballfox · 2024-10-30T22:56:27Z

Mainly opening a draft to get feedback on the design. I'm still figuring out what needs to be implemented and what the api should look like. I've started with duplicating the implementations for AVX for Complex<f32> and Complex<f64>.

A lot of the element-wise code will be exactly the same as the regular float type implementations. There are a few functions that would be useful to both build the some of the methods and expose to users (so far just a few helpers from rustfft), Which I've put in a separate trait.

progress on this is likely to be really slow for a while, I'm kind of learning the simd intrinsics stuff as I go.

will resolve #4

ChillFish8

Looks like a great start, I think num_complex is a good choice of library to provide some level of interoperability with.

I've left a couple comments of some things that are probably worth discussion, I think the orphan rules stuff is a bit of a pain, but unavoidable.

ChillFish8 · 2024-10-31T21:47:21Z

cfavml-complex/src/lib.rs

+    unsafe fn swap_complex_components(value: Self::Register) -> Self::Register;
+}
+
+pub struct Avx2Complex;


I guess orphan rules are causing issues using the main exported types :/ Maybe the wrapper impl should be like Complex<RegisterType> so Complex<Avx2> etc...

Edit: :/ but then that conflicts with the num complex wrapper type so I'm not sure what the best thing to do is

ChillFish8 · 2024-10-31T21:49:25Z

cfavml-complex/src/lib.rs

+            <Avx2Complex as ComplexOps<f32>>::swap_complex_components(right);
+
+        let output_right = _mm256_mul_ps(left_imag, right_shuffled);
+        _mm256_fmaddsub_ps(left_real, right, output_right)


So I think although it might be a bit of a PITA is this is likely going to need a Avx2 and Avx2Fma impl, technically within CFAVML, Avx2 assumes only avx2 is enabled with no FMA support. Although unlikely, for consistency we should make sure all the crates follow this pattern as well.

Internally what CFAVML does is on any routine which doesn't actually need FMA, it just calls the Avx2::example_op

A crate that might be useful is duplicate.

I explored writing a macro rule to implement functions that just called the same function name from another struct, but I couldn't figure out how to pass in the fn call as a raw string to use it in both the definition and the inner call. It started to feel like one of those "If I can, maybe I shouldn't" problems.

skewballfox · 2024-11-01T21:03:50Z

@ChillFish8 So operations that rely on ordering aren't exactly valid for complex numbers, so (min,max,lt/gt). I think equality is still a valid operation. Would it make sense (either in this PR or a separate one) to split the ordering ops out of the SimdRegister Trait?

ChillFish8 · 2024-11-01T21:07:41Z

Hmm, I think it might be better to have the ordering methods simply implement unimplemented!("Stop trying to order complex numbers >-<")

I suspect most people working with the complex numbers are unlikely to reach for the ordering operations, and IMO a panic is fine providing it is well documented that complex impls explicitly don't support the cmp ops.

Worth noting that you can control the public API methods like cfaml::max etc... By just not implementing the trait, so that prevent most accidental missue.

skewballfox · 2024-11-04T17:00:39Z

I think I'll have to duplicate the structure for the math module. I can't correctly implement the trait directly due to the norm function being (a: T, b: T) -> T, where norming a complex number returns the inner type. Right now I have basic ops implemented in impl_test but that's mainly because I didn't realize I'd need it for Fallback.

For complex ops, there are a few operations that would be useful to have full register and half register variants. I'm not sure what the naming convention should be though. I have the full register ones starting with duplicate or dup, maybe keep the same name omitting the prefix?

ChillFish8 · 2024-11-04T17:45:04Z

I think I'll have to duplicate the structure for the math module. I can't correctly implement the trait directly due to the norm function being (a: T, b: T) -> T, where norming a complex number returns the inner type. Right now I have basic ops implemented in impl_test but that's mainly because I didn't realize I'd need it for Fallback.

Hmm If this is a limitation we can probably look to alter the math trait a bit to make it more flexible. I confess it was a bit of a hack more than anything specialized so happy to change it.

ChillFish8 · 2024-11-04T17:46:52Z

For complex ops, there are a few operations that would be useful to have full register and half register variants. I'm not sure what the naming convention should be though. I have the full register ones starting with duplicate or dup, maybe keep the same name omitting the prefix?

Good question, I am slightly inclined to go with dup as the prefix, although I'll let you decide, I think the naming convention will probably define itself a bit more once more ops are added as it'll start becoming clear what things are tedious of confusing to write out with their current names.

skewballfox · 2024-11-06T01:03:28Z

hey dumb question, assuming I have a compatible cpu, how do I test the specific instruction sets in danger test_suite? tried running cargo test --package cfavml-complex --features avx2 but obviously that doesn't work because it's not a published feature. In cfavml it looks like it's enabled by dispatch, but that is only called when building the safe APIs

ChillFish8 · 2024-11-06T22:13:02Z

Not a dumb question :) Most of the test suite tests specific features gated by the actual global target_features flag. Normally easiest method is to run with RUSTFLAGS="-C target-cpu=native". We have to do it this way otherwise some things like the AVX512 test suite might try and run on CI runners without AVX512 support.

ChillFish8 · 2024-11-06T22:14:46Z

Have a look at the current test suite: https://github.com/ChillFish8/cfavml/blob/main/cfavml/src/danger/test_suite.rs#L389C1-L484C2

Notice it uses the target_features which refers to the CPU features compile time target rather than cargo features.

skewballfox · 2024-11-08T00:19:46Z

I'm working on a hypotenuse function for avoiding overflow/underflow for certain values. It's currently a mess so I'll just the simpler (working) version here:

  ///|b| * sqrt(1 + (a/b)^2)
#[inline(always)]
unsafe fn safer_hypot(left: __m256, right: __m256) -> __m256 {
    let (left_abs, right_abs) = (
        _mm256_andnot_ps(_mm256_set1_ps(-0.0), left),
        _mm256_andnot_ps(_mm256_set1_ps(-0.0), right),
    );
    let (a, b) = (max(left_abs, right_abs), min(left_abs, right_abs));
    let ab = _mm256_div_ps(a, b);
    _mm256_mul_ps(
        b,
        _mm256_sqrt_ps(_mm256_fmadd_ps(ab, ab, _mm256_set1_ps(1.0))),
    )
}

there's a bit more involved version I'm trying to adapt from here. I was wondering if I should try to put that in the cfavml crate itself in a separate PR, given it's a scalar op that is probably useful outside of norming complex numbers.

ChillFish8 · 2024-11-08T00:23:26Z

Hmm probably not a bad idea, if you'd like to submit a PR feel free to do so, otherwise create a new issue so it doesn't get lost 👍

skewballfox added 2 commits October 30, 2024 16:05

some initial scaffolding for complex support

159ad9d

added trait for complex specific operations

4001fae

skewballfox marked this pull request as draft October 31, 2024 13:27

ChillFish8 reviewed Oct 31, 2024

View reviewed changes

skewballfox added 3 commits November 1, 2024 18:56

fleshing out avx2 ops

f0e1d47

added ops for avx2/avx2fma

1dd9bbb

reorganize cfavml-complex crate, test implementation in progress

1dc65d7

implemented fallback for complex-cfavml

b7d41ea

separated out complex simd ops, avx2/fma currently passing test

c9e2c3d

skewballfox mentioned this pull request Nov 9, 2024

hypot function in cfavml #14

Open

rename dup_squared_norm to dup_norm and use hypot internally

44cfaad

[DRAFT]: Complex support #13

Are you sure you want to change the base?

[DRAFT]: Complex support #13

Uh oh!

Conversation

skewballfox commented Oct 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChillFish8 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChillFish8 Oct 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChillFish8 Oct 31, 2024

Choose a reason for hiding this comment

Uh oh!

skewballfox Nov 2, 2024

Choose a reason for hiding this comment

Uh oh!

skewballfox commented Nov 1, 2024

Uh oh!

ChillFish8 commented Nov 1, 2024

Uh oh!

skewballfox commented Nov 4, 2024

Uh oh!

ChillFish8 commented Nov 4, 2024

Uh oh!

ChillFish8 commented Nov 4, 2024

Uh oh!

skewballfox commented Nov 6, 2024

Uh oh!

ChillFish8 commented Nov 6, 2024

Uh oh!

ChillFish8 commented Nov 6, 2024

Uh oh!

skewballfox commented Nov 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChillFish8 commented Nov 8, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

skewballfox commented Oct 30, 2024 •

edited

Loading

ChillFish8 left a comment •

edited

Loading

ChillFish8 Oct 31, 2024 •

edited

Loading

skewballfox commented Nov 8, 2024 •

edited

Loading