Add sections talking about Swift SIMD types and interop with them #313

Merged: 6 commits, Jan 29, 2024

jkoritzinsky
Member

Swift's SIMD types are named/categorized along a different axis (number of elements) than the .NET types (vector width). Additionally, .NET historically has a few vector types that were defined based on number of elements for single-precision floating point vectors that have generally had poor support in interop scenarios.

These sections discuss both of these problems and how I believe we should solve them for our Swift interop experience.
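
To make the naming mismatch concrete, here is a minimal sketch (not part of the proposal) that computes which fixed-width .NET vector type has the same size as a Swift `SIMDn<T>` value. The helper `SwiftSimdMapping.MatchingVectorBits` is a hypothetical name, and the sketch assumes that Swift gives `SIMD3` the same storage as four elements.

```csharp
using System;

// Hypothetical helper for illustration; not an API from the proposal or from .NET.
public static class SwiftSimdMapping
{
    // Returns the width in bits (64, 128, 256, or 512) of the
    // System.Runtime.Intrinsics vector type whose size equals that of a
    // Swift SIMD{elementCount}<T>, or null when no such type exists.
    public static int? MatchingVectorBits(int elementCount, int elementSizeInBytes)
    {
        // Assumption: SIMD3 occupies the same storage as SIMD4.
        int storageCount = elementCount == 3 ? 4 : elementCount;
        int bits = storageCount * elementSizeInBytes * 8;
        bool hasMatch = bits == 64 || bits == 128 || bits == 256 || bits == 512;
        return hasMatch ? bits : (int?)null;
    }
}

public static class Program
{
    private static string Describe(int? bits) =>
        bits.HasValue ? "Vector" + bits.Value + "<T>" : "no fixed-width match";

    public static void Main()
    {
        // SIMD2<Float>: 2 elements of 4 bytes each.
        Console.WriteLine(Describe(SwiftSimdMapping.MatchingVectorBits(2, 4)));
        // SIMD4<Float>: 4 elements of 4 bytes each.
        Console.WriteLine(Describe(SwiftSimdMapping.MatchingVectorBits(4, 4)));
        // SIMD64<Double>: 64 elements of 8 bytes each, wider than any .NET vector type.
        Console.WriteLine(Describe(SwiftSimdMapping.MatchingVectorBits(64, 8)));
    }
}
```

The same Swift element count lands on different .NET types depending on the element size, which is why the two naming schemes do not line up one to one.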

proposed/swift-interop.md (outdated, resolved)
@@ -118,7 +118,13 @@ When calling a function that returns an opaque struct, the Swift ABI always requ

At the lowest level of the calling convention, we do not consider Library Evolution to be a different calling convention than the Swift calling convention. Library Evolution requires that some types are passed by a pointer/reference, but it does not fundamentally change the calling convention. Effectively, Library Evolution forces the least optimizable choice to be taken at every possible point. As a result, we should not handle Library Evolution as a separate calling convention and instead we can manually handle it at the projection layer.

-For frozen structs and enums, Swift has a complicated lowering process where the struct or enum type's layout are recursively flattened to a sequence of primitives. If this sequence is length 4 or less, the values of this type are split into the elements of this sequence for parameter passing instead of passing the struct as a whole. Structs and enums that cannot be broken down in this way are passed by-reference to their specified frozen layout. Due to high implementation cost in the RyuJIT, in particular in the `UnmanagedCallersOnly` scenario, we should implement this first pass of lowering in the projection layer; the only types allowed for `CallConvSwift` calling convention in method or function pointer signatures are primitives, our special Swift register types, and pointer types. For reference, this lowering pass is done in the Swift compiler when lowering from Swift IL to LLVM IR. This design decision reinforces our direction of having the Runtime layer of Swift interop support similar features as the LLVM IR representation of Swift.
+For frozen structs and enums, Swift has a complicated lowering process where the struct or enum type's layout are recursively flattened to a sequence of primitives. If this sequence is length 4 or less, the values of this type are split into the elements of this sequence for parameter passing instead of passing the struct as a whole. Structs and enums that cannot be broken down in this way are passed by-reference to their specified frozen layout. When a frozen struct or enum with a valid primitive sequence of 4 elements or less is returned from a function, it is returned if it were a structure of the elements of the primitive sequence. Due to high implementation cost in the RyuJIT, in particular in the `UnmanagedCallersOnly` scenario, we should implement this first pass of lowering in the projection layer. The only types allowed for `CallConvSwift` calling convention in method or function pointer parameters are primitives, our special Swift register types, and pointer types. In return types, we will also allow structure types to support returning the primitive type sequences correctly. For reference, this lowering pass is done in the Swift compiler when lowering from Swift IL to LLVM IR. This design decision reinforces our direction of having the Runtime layer of Swift interop support similar features as the LLVM IR representation of Swift.
Member

jkotas, Jan 19, 2024

Does this match how LLVM deals with it? Are arguments handled in Swift IL lowering, but return values left to codegen?

Member Author

Both parameters and return values are inspected in Swift IL lowering and lowered to primitive type sequences of 4 or less primitives if possible. Parameters that are lowered to this sequence are passed as separate parameters, one for each element of the sequence. If there is a valid primitive sequence for the return type, the actual return type in LLVM IR is a struct of the elements of the type sequence, not the original struct type. Processing this struct into which registers to return it through or return it by a return buffer is then handled by LLVM.

I've validated this by looking at the IR emitted by the Swift compiler on Compiler Explorer: https://godbolt.org/z/o1h6Y5de8
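
The parameter and return handling described above can be modeled with a small sketch. This is a deliberately simplified model written for illustration: the `SwiftType`, `Primitive`, and `Aggregate` types and the `Lowering` helper are hypothetical, and the flattening ignores the size, alignment, and merging rules that the real Swift ABI applies when it builds the primitive sequence. It only shows the decision itself: a sequence of four or fewer primitives is passed as separate parameters and returned as one struct of those elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type model; none of these types exist in .NET or in the proposal.
public abstract record SwiftType;
public sealed record Primitive(string Name) : SwiftType;
public sealed record Aggregate(string Name, IReadOnlyList<SwiftType> Fields) : SwiftType;

public static class Lowering
{
    public const int MaxSequenceLength = 4;

    // Recursively flatten nested aggregates into their primitive leaves.
    public static List<Primitive> Flatten(SwiftType type) => type switch
    {
        Primitive p => new List<Primitive> { p },
        Aggregate a => a.Fields.SelectMany(f => Flatten(f)).ToList(),
        _ => throw new ArgumentException("Unknown type kind", nameof(type)),
    };

    // Parameters: one lowered parameter per primitive when the sequence is
    // short enough, otherwise a single pointer to the value.
    public static IReadOnlyList<string> LowerParameter(SwiftType type)
    {
        List<Primitive> sequence = Flatten(type);
        return sequence.Count <= MaxSequenceLength
            ? sequence.Select(p => p.Name).ToList()
            : new List<string> { "pointer" };
    }

    // Returns: a short sequence is returned as one struct of its elements.
    // The longer case is not modeled here, so the method reports failure.
    public static bool TryLowerReturn(SwiftType type, out string loweredStruct)
    {
        List<Primitive> sequence = Flatten(type);
        bool fits = sequence.Count <= MaxSequenceLength;
        loweredStruct = fits
            ? "{ " + string.Join(", ", sequence.Select(p => p.Name)) + " }"
            : "";
        return fits;
    }
}

public static class Program
{
    public static void Main()
    {
        var point = new Aggregate("Point", new SwiftType[] { new Primitive("double"), new Primitive("double") });
        var rect = new Aggregate("Rect", new SwiftType[] { point, point });

        // Four primitives: passed as four separate parameters.
        Console.WriteLine(string.Join(", ", Lowering.LowerParameter(rect)));

        // Returned as a single struct made of the same four elements.
        if (Lowering.TryLowerReturn(rect, out string lowered))
            Console.WriteLine(lowered);
    }
}
```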

Member

Is there a good reason for the different handling of the return values vs. arguments in Swift/LLVM toolchain? Does this difference show up in the Swift public surface or is it just an internal implementation detail of the Swift toolchain that can change in future without breaking the Swift ABI?

It looks weird to standardize the different handling of the return values vs. arguments in public surface. On the other hand, we should be able to add the struct handling for arguments in future if needed, without breaking anything. So I guess it is ok to start with it.

Member Author

I think this is the only way that Swift could implement the "lower to a type sequence of primitives" consistently for return values and parameters while still allowing enregistering return values.

Member

Why would it be the only way? They could have done all lowering in the LLVM codegen part as part of Swift calling convention handling. Is there anything fundamental preventing that?

Member

As I understand it, the algorithm you are describing here is part of the ABI: https://github.com/apple/swift/blob/d1d9fd1a2e478189e6eec7c48a0b952d9063859b/docs/ABI/CallingConvention.rst#L926-L993
It sounds like we are going to end up with ABI specific handling within the projection tooling regardless of whether we handle structs within the runtime or not. If that is the case, should we go with the tried-and-tested LLVM approach until we have a good grasp around the exact details of this handling and are confident that it all maps reasonably well to structs that can be described in IL? Are we already confident enough about this?

Member

My understanding now is that we are going to be running the first part of the ABI twice. We need to run it in projection tooling regardless, because it is necessary for types that cannot be represented in IL. It is going to result in a struct of primitives. We are then also running the algorithm in the runtime, once more, and hoping that running the first part of the algorithm twice produces the same result as what Swift+LLVM end up implementing.

What is the benefit of this compared to doing what Swift+LLVM does and avoiding wrapping the primitive sequence in a struct unless necessary (returns)?
I can see one benefit, which is that Swift types that are directly representable in C# can be defined in C# and used directly in interop. However, because there are Swift types that cannot be represented in C# the general guidance is always going to be to use the projection tooling.

Am I understanding this correctly? Does the diversion from Swift+LLVM make sense in this light?

Member Author

Here's another example that I'm going to reference below: https://godbolt.org/z/Y8Yxvdc3W

Here's how I view it:

The projection will handle the struct/enum layout. So it will lower X in the example above to something like:

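using System.Runtime.InteropServices;

// Foo and Bar stand for the projected types used by the cases of X.
[StructLayout(LayoutKind.Explicit)]
struct X
{
    [FieldOffset(0)]
    private Foo f;

    private struct B
    {
        private Bar b;
        private int i;
        private int discriminator;
    }

    // Both fields start at offset 0, so they overlap like a C union.
    [FieldOffset(0)]
    private B b;
}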

The projection layer does not need to lower X to a primitive type sequence; it just needs to determine the layout that Swift uses to represent each case and the discriminator.

Then the JIT/VM would handle lowering the X struct to a primitive type sequence.

Basically, the CallConvSwift signatures will always be able to use named types like Swift, and the JIT will handle all of the primitive type sequence logic and the register allocation logic in a combined pass that better fits RyuJITs architecture.
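
As a rough check of what such an explicit layout gives the runtime to work with, the following sketch measures an overlapping struct of the same shape. The `Foo` and `Bar` definitions, the `CaseB` name, and the `LayoutCheck` helper are hypothetical stand-ins, and the sizes and offsets follow only from those stand-ins; the real values have to come from Swift's layout of the enum.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical payload types; the real ones come from the projected Swift types.
public struct Foo { public long A; public long B; }
public struct Bar { public int Value; }

[StructLayout(LayoutKind.Explicit)]
public struct X
{
    // Both fields start at offset 0, so they share the same storage.
    [FieldOffset(0)] public Foo F;
    [FieldOffset(0)] public CaseB B;

    public struct CaseB
    {
        public Bar Payload;
        public int I;
        public int Discriminator;
    }
}

public static class LayoutCheck
{
    // Size of the whole union-like struct: the larger of the two overlapping fields.
    public static int SizeOfX() => Marshal.SizeOf<X>();

    // Offset of the second overlapping field inside X.
    public static int OffsetOfCaseB() => (int)Marshal.OffsetOf<X>(nameof(X.B));

    // Offset of the discriminator inside the second case.
    public static int OffsetOfDiscriminator() =>
        (int)Marshal.OffsetOf<X.CaseB>(nameof(X.CaseB.Discriminator));
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("sizeof(X) = " + LayoutCheck.SizeOfX());
        Console.WriteLine("offset of B = " + LayoutCheck.OffsetOfCaseB());
        Console.WriteLine("offset of discriminator = " + LayoutCheck.OffsetOfDiscriminator());
    }
}
```

With these stand-in definitions the struct is 16 bytes, the second case starts at offset 0, and its discriminator sits at offset 8.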

Member

> Then the JIT/VM would handle lowering the X struct to a primitive type sequence.

How exactly would it do this? Are you confident that the intermediate results during the ABI handling of enums can always be described with structs in this way, and that running the "primitive type sequencing" algorithm on these structs will result in the right thing? Or are we expecting that we are going to reconstruct the Swift source of truth on the runtime side and trying to give all Swift types an IL representation that the runtime knows how to parse?

Member Author

I am very confident that with the struct layout mechanisms that exist in .NET, we can construct a C# struct type with a matching layout for any frozen enum type from a Swift source of truth, especially since 32-bit targets don't need to support Swift interop, and that we can build the tooling in a way that the VM/JIT's primitive type sequencing algorithm will end up with the same results.

proposed/swift-interop.md (outdated, resolved)

##### SIMD Types

We will pass the `System.Runtime.Intrinsics.VectorX<T>` types in SIMD registers as we do with the managed calling convention. We will treat the `Vector2/3/4` types as non-SIMD types (and block their usage directly as parameters in the `CallConvSwift` signature as is the case with other structs).
Member

Suggested change:
-We will pass the `System.Runtime.Intrinsics.VectorX<T>` types in SIMD registers as we do with the managed calling convention. We will treat the `Vector2/3/4` types as non-SIMD types (and block their usage directly as parameters in the `CallConvSwift` signature as is the case with other structs).
+We will pass the `System.Runtime.Intrinsics.VectorX<T>` types in SIMD registers. We will treat the `Vector2/3/4` types as non-SIMD types (and block their usage directly as parameters in the `CallConvSwift` signature as is the case with other structs).

I don't think the managed convention does this on all platforms (today).

Do we allow interop with these types in other interop scenarios? It sounds like it is going to add dotnet/runtime#8300 + dotnet/runtime#9578 as part of the work.

Member Author

I believe we have support on ARM64 due to HFA/HVA support. If I'm wrong, then yes this would add in those two issues as part of this work.

Member

Yes, on ARM64 we support it, but not on x64.

Is the SIMD interop important enough to warrant implementing it for x64? Those two issues on their own are large work items.

Member Author

It looks like macOS x64 is still going to be widely supported when .NET 9 releases, so it depends on if the libraries we want to support are high enough priority. For example, the Accelerate framework has many APIs that take the SIMD types.

@kotlarmilos what are the Apple libraries that we're targeting for .NET 9?

Member

Why would Vector2/3/4 be blocked? They have always been supported for interop and have been treated as the equivalent of user-defined structs containing 2, 3, or 4 float fields (which is exactly how they are defined).

Vector64/128/256/512<T> and Vector<T> are all blocked from interop. The former set because Windows doesn't correctly handle SIMD returns today (this isn't vectorcall, but rather missing handling for the default x64 calling convention) and the latter because it doesn't make sense from an interop perspective today.


> Yes, on ARM64 we support it, but not on x64.

This should exist for Unix already as well and only be missing for Windows x64, since that doesn't pass vectors differently (only returns them differently). __vectorcall would be required for Windows x64 HFA/HVA support and is still desirable long term so that we better optimize such perf critical functions; it just hasn't bubbled up in priority yet.

Member

My biggest concern here is that ABI is an extremely complex space and interop is one of those spaces where users want both simplicity and reduced overhead, especially when generating larger binding libraries.

Apple has also notably broken ABI in the past or deviated conventions from the norm on new platforms and so it is entirely possible some new platform comes on and now every single bit of ObjC/Swift interop code is DoA.

I think it is ultimately much better (even if it's not what is done for the initial release due to timing constraints or whatever) that we have this support in the runtime as a detail of the CallConv support and that users are ultimately able to write a `delegate* unmanaged[Swift]<T, U, V>` that mirrors the underlying ObjC/Swift signature that would be exposed to C/C++ using the official Swift tooling.

Member

> It does not seem hard to make these particular calls aggressively inlined if we think that's beneficial.

These calls are typically going to have a try/catch block in them to convert the .NET exception into a Swift error. You would have to implement inlining of methods with exception handling to make this work...

Member

tannergooding, Jan 23, 2024

That is to say, a user should be able to export Swift bindings to C using official Apple tooling, then use another existing tool such as ClangSharp or CppAst, which can generate blittable P/Invoke bindings from a C header, and expect it to work.

If we can't achieve that, I expect we will have a lot of downstream pain/headaches from the community, especially as it gets into more complex bindings and libraries.

Member

> These calls are typically going to have a try/catch block in them to convert the .NET exception into a Swift error. You would have to implement inlining of methods with exception handling to make this work...

Don't tempt me :-) (Note that this is actually part of our .NET 9 plan, and I also think it would be much more likely we end up with this support than appetite for improving the UCO Swift case in the future.)

Member Author

Re handling tuples: I think we can still handle tuples at the projection layer since the splitting of a tuple into separate arguments is done at the SIL layer and is very straightforward (it doesn't have nearly the same complexity as the "primitive type sequence" lowering) especially if JIT implementation cost for tuples would be too much.

Additionally, the primitive type sequence lowering happens after the tuple lowering (so each tuple element can be lowered to a sequence of up to 4 primitives), so handling tuples in the projection layer doesn't interfere with the primitive sequence handling.
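
The ordering matters for how many lowered parameters a signature ends up with. The sketch below is a simplified illustration of that ordering only; `TupleLowering` and its methods are hypothetical names, and each value is reduced to the number of primitives in its flattened layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the proposal or of .NET.
public static class TupleLowering
{
    public const int MaxSequenceLength = 4;

    // Lowered parameter count for a single value whose flattened layout has
    // primitiveCount primitives: one parameter per primitive when the
    // sequence is short enough, otherwise a single pointer.
    public static int ForValue(int primitiveCount) =>
        primitiveCount <= MaxSequenceLength ? primitiveCount : 1;

    // A tuple is split into its elements first, and each element is then
    // lowered on its own.
    public static int ForTuple(IEnumerable<int> primitivesPerElement) =>
        primitivesPerElement.Sum(n => ForValue(n));
}

public static class Program
{
    public static void Main()
    {
        // A tuple of two structs with four primitives each: 4 + 4 parameters.
        Console.WriteLine(TupleLowering.ForTuple(new[] { 4, 4 })); // prints 8

        // One struct wrapping the same eight primitives: a single pointer.
        Console.WriteLine(TupleLowering.ForValue(8)); // prints 1
    }
}
```

Because the split happens before the four-element limit is applied, a tuple can expand to more than four lowered parameters even though a struct with the same fields would be passed by reference.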

@@ -118,7 +118,13 @@ When calling a function that returns an opaque struct, the Swift ABI always requ

At the lowest level of the calling convention, we do not consider Library Evolution to be a different calling convention than the Swift calling convention. Library Evolution requires that some types are passed by a pointer/reference, but it does not fundamentally change the calling convention. Effectively, Library Evolution forces the least optimizable choice to be taken at every possible point. As a result, we should not handle Library Evolution as a separate calling convention and instead we can manually handle it at the projection layer.

-For frozen structs and enums, Swift has a complicated lowering process where the struct or enum type's layout are recursively flattened to a sequence of primitives. If this sequence is length 4 or less, the values of this type are split into the elements of this sequence for parameter passing instead of passing the struct as a whole. Structs and enums that cannot be broken down in this way are passed by-reference to their specified frozen layout. Due to high implementation cost in the RyuJIT, in particular in the `UnmanagedCallersOnly` scenario, we should implement this first pass of lowering in the projection layer; the only types allowed for `CallConvSwift` calling convention in method or function pointer signatures are primitives, our special Swift register types, and pointer types. For reference, this lowering pass is done in the Swift compiler when lowering from Swift IL to LLVM IR. This design decision reinforces our direction of having the Runtime layer of Swift interop support similar features as the LLVM IR representation of Swift.
+For frozen structs and enums, Swift has a complicated lowering process where the struct or enum type's layout are recursively flattened to a sequence of primitives. If this sequence is length 4 or less, the values of this type are split into the elements of this sequence for parameter passing instead of passing the struct as a whole. Structs and enums that cannot be broken down in this way are passed by-reference to their specified frozen layout. When a frozen struct or enum with a primitive sequence of 4 elements or less is returned from a function, it is returned as if it were a structure of the elements of the primitive sequence. Due to high implementation cost in the RyuJIT, in particular in the `UnmanagedCallersOnly` scenario, we should implement this first pass of lowering in the projection layer. The only types allowed for `CallConvSwift` calling convention in method or function pointer parameters are primitives, our special Swift register types, and pointer types. In return types, we will also allow structure types to support returning the primitive type sequences correctly. For reference, this lowering pass is done in the Swift compiler when lowering from Swift IL to LLVM IR. This design decision reinforces our direction of having the Runtime layer of Swift interop support similar features as the LLVM IR representation of Swift.
Member

This sounds overall similar to the struct splitting and HFA support done for Unix System V.

Member

I don't think this is similar to SysV. In the SysV classifier, there is just one struct layout and the classifier splits it out over a set of registers or passes the entire thing on stack. It's not hard to represent as a first-class ABI constraint in the JIT.

I think this is more like ARM32. In ARM32, you can have single arguments that are passed partly in registers and partly on stack. The JIT has special support for it (FEATURE_ARG_SPLIT) and a special node PUTARG_SPLIT that is only used on ARM32 (and win-arm64 which has this peculiar case in an edge case around varargs too).

Swift takes this to the next level. In Swift, you can have a single argument that interacts arbitrarily with the underlying calling convention. For example, a single struct argument can simultaneously have parts of it that are passed as implicit byrefs (copied to caller stack and a pointer passed), other parts of it that get passed in registers, other parts that are on stack, other parts that are HFAs.... There really is no limit to how many ABI constraints result from a single argument. That's what makes this harder to represent in a first-class way than any other ABI (especially in the UCO case).

proposed/swift-interop.md (outdated, resolved)
jkoritzinsky merged commit 1b507bd into dotnet:main on Jan 29, 2024
2 checks passed
jkoritzinsky deleted the swift-simd branch on January 29, 2024 at 21:42
5 participants