
[cpp] Marshalling Extern Types#11981

Merged
Simn merged 227 commits into HaxeFoundation:development from
Aidan63:cpp-value-types
Dec 13, 2025


@Aidan63
Contributor

@Aidan63 Aidan63 commented Feb 6, 2025

Another long one, so make sure you're sitting comfortably.

Corresponding hxcpp PR: HaxeFoundation/hxcpp#1189

The Problem

Working with the current interop types comes with many pitfalls which are not immediately obvious. These include:

  • Externed value types (cpp.Struct / cpp::Struct) do not work as expected when captured in a closure. If you mutate one of these captured value types you are mutating a copy, not the object referenced by the variable name.
  • These struct types only really work with C structs, not C++ structs and classes which might have all sorts of interesting constructors, destructors, assignment operators, etc. Structs are compared using memcmp and copies are made using memcpy, so if you have a C++ class which requires copy constructors, copy assignment, or others to work correctly, things will go wrong in ways which are painful to debug.
  • Leaking objects is very easy as destructors are not consistently called. If you place one of these structs in a class field its destructor will never be called.
  • Value types stored in classes are initialised to zero rather than having a default constructor invoked, so any default values or setup code will not run.
  • It’s very easy to write code which does different things depending on the debug level and optimisations enabled by the compiler. You can easily find yourself modifying a copy of a struct in a temporary variable the compiler created, creating very hard to track down bugs.
  • Common C/C++ patterns, such as pointers to pointers, are hard to extern without lots of boilerplate and glue code wrangling.
  • All extern classes are considered native by the generator, so if you want to extern a custom hx::Object sub class some of the necessary GC code isn’t generated for the generational GC.
  • Many other things!

In short, the current interop handlers mostly work with basic C structs in local variables, but if you want to interop directly with C++ classes, you’re going to have a painful time in anything but the most basic cases.
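As a rough C++ illustration of why byte-wise copying suits C structs but not C++ classes, the sketch below contrasts a plain struct with a class whose copy constructor has to run. CPoint, Tracked, and copy_constructor_calls are hypothetical names made up for this example, not anything from hxcpp.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical C-style struct: plain data, safe to copy byte-for-byte.
struct CPoint {
    int x;
    int y;
};

// Hypothetical C++ class which counts how often its copy constructor runs.
struct Tracked {
    static int copies;
    std::string name;

    Tracked() : name("tracked") {}
    Tracked(const Tracked& other) : name(other.name) { ++copies; }
};

int Tracked::copies = 0;

// Only trivially copyable types may legally be duplicated with memcpy.
static_assert(std::is_trivially_copyable<CPoint>::value, "CPoint is memcpy safe");
static_assert(!std::is_trivially_copyable<Tracked>::value, "Tracked needs its copy constructor");

// Copies a Tracked the C++ way and reports how many copy constructor calls that took.
int copy_constructor_calls() {
    Tracked::copies = 0;
    Tracked original;
    Tracked duplicate = original; // runs Tracked(const Tracked&)
    (void)duplicate;
    return Tracked::copies;
}
```

A memcpy of Tracked would skip that constructor entirely, which is the kind of silent breakage described in the list above.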

If you just want to see a quick example of it all in action here's a gist which will compile on Windows and use DXGI to print out all displays connected to the first graphics adapter. DXGI uses pointers to pointers, pointers to structs, pointers to void pointers, and other C++ concepts which have been very difficult to map onto haxe in the past. But hopefully you'll agree that it looks like pretty "normal" haxe.

https://gist.github.com/Aidan63/07364c227335f02fbe50b9c9576f7544

New Metadata

In my mind there are three categories of things you might want to extern, native value types, native pointer types, and "managed" (custom hx::Object sub classes) types. This merge introduces three new bits of metadata to represent these categories and solve the above issues.

cpp.ValueType

Using the @:cpp.ValueType metadata on an extern class will cause it to be treated as a value type, so copies will be created when passing into functions, assigning to variables, etc, etc.

@:semantics(reference)
@:cpp.ValueType({ type : 'foo', namespace : [ 'bar', 'baz' ] })
extern class Foo {
	var number : Int;

	function new() : Void;
}

function main() {
	final obj  = new Foo();
	final copy = obj; // creates a copy
}

I've chosen for the metadata to take a struct which currently supports two fields: type, for the underlying native type name, and namespace, which must be an array of string literals for the namespace the type is in. If type is omitted then the name of the extern class is used; if namespace is omitted then the type is assumed to be in the global namespace.

Using this metadata provides several guarantees over the old struct types. First, it behaves how you'd expect when captured in a closure.

function main() {
	final obj  = new Foo();
	final func = () -> {
		obj.number = 7;
	}

	func();

	trace(obj.number); // 7 will be printed.
}
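For readers more at home in C++, the difference can be loosely compared to lambda capture modes. This is only an analogy under that assumption: the generated code promotes the value to the GC heap rather than capturing a stack reference, and the function names below are hypothetical.

```cpp
#include <cassert>

struct Foo {
    int number = 0;
};

// Old cpp.Struct-like behaviour: the closure owns a copy, so the outer object is untouched.
int mutate_captured_copy() {
    Foo obj;
    auto func = [obj]() mutable { obj.number = 7; };
    func();
    return obj.number;
}

// Behaviour matching the sample above: the closure refers to the same object.
int mutate_captured_reference() {
    Foo obj;
    auto func = [&obj]() { obj.number = 7; };
    func();
    return obj.number;
}
```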

Destructors are guaranteed to be called. When a value type is captured in a closure, or stored in a class field, enum, anon, or any other GC object, it is "promoted" to the GC heap and the holder class it's contained within registers a finaliser which will call the destructor.

Operators on the defined native type are always used, no memcmp or memcpy. Copy constructors, copy assignment, and standard equality operators are always used no matter the case.
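The promotion idea can be sketched in plain C++ roughly as follows. Resource and Holder are hypothetical stand-ins invented for this example; a real holder would be a GC object finalised by the collector, whereas here finalise() is called by hand.

```cpp
#include <cassert>

// Hypothetical native type which records construction and destruction.
struct Resource {
    static int live;
    int id;

    explicit Resource(int id) : id(id) { ++live; }
    Resource(const Resource& other) : id(other.id) { ++live; }
    ~Resource() { --live; }

    bool operator==(const Resource& other) const { return id == other.id; }
};

int Resource::live = 0;

// Hypothetical stand-in for a GC-allocated holder of a promoted value.
template <typename T>
struct Holder {
    T* value;

    explicit Holder(const T& source) : value(new T(source)) {} // copy constructor, not memcpy
    void finalise() { delete value; value = nullptr; }          // runs ~T()
};

// Promotes a stack value, compares it with operator==, then finalises the holder.
bool promote_compare_finalise() {
    Resource onStack(42);
    Holder<Resource> promoted(onStack);

    const bool equal    = (*promoted.value == onStack); // operator==, not memcmp
    const bool twoAlive = (Resource::live == 2);

    promoted.finalise();

    return equal && twoAlive && Resource::live == 1;
}
```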

The same sort of null checks are performed with references to these value types as standard haxe classes so you will get standard null exceptions instead of the program crashing with a memory error.

The nullability of these value types is unfortunately a bit odd... If you have an explicitly nullable TVar value type then it's always promoted and can be null. But a null value type doesn't make much sense so I've disallowed value type variable declarations with no expression or with a null constant. Trying to assign a null value to a stack value type will result in a runtime null exception. Since value types in class fields and the like are always promoted they are null if uninitialised. Ideally value type externs could have the same "not null" handling as core type abstracts, but that doesn't seem possible.

Interop with the existing pointer types is also provided, as well as implicit conversions to pointers on the underlying types for easier interop.

This value type metadata is also supported on extern enum abstracts. Historically externing enums has been a bit of a pain but it works pretty nicely now.

@:semantics(reference)
@:include('colour.hpp')
@:cpp.ValueType({ type : 'colour', namespace : [ 'foo' ] })
private extern enum abstract Colour(Int) {
    @:native('red')
    var Red;

    @:native('green')
    var Green;

    @:native('blue')
    var Blue;
}

cpp.PointerType

Using the @:cpp.PointerType metadata on an extern class will cause it to be treated as a pointer, this metadata supports the same fields as the above value type one.

Extern classes annotated with the pointer type metadata cannot have constructors as they are designed to be used with the common C/C++ pattern of free functions which allocate and free some sort of opaque pointer, or the pointer to pointer pattern.

E.g. the following native API could be externed and used as the following.

namespace foo {
	struct bar {};

	bar* allocate_bar();
	void free_bar(bar*);

	struct baz {};

	void allocate_baz(baz** pBaz);
	void free_baz(baz* baz);
}

@:semantics(reference)
@:cpp.PointerType({ type : 'bar', namespace : [ 'foo' ] })
extern class Bar {}

@:semantics(reference)
@:cpp.PointerType({ type : 'baz', namespace : [ 'foo' ] })
extern class Baz {}

extern class Api {
	@:native('allocate_bar')
	static function allocateBar():Bar;

	@:native('free_bar')
	static function freeBar(b:Bar):Void;

	@:native('allocate_baz')
	static function allocateBaz(b:haxe.extern.AsVar<Baz>):Void;

	@:native('free_baz')
	static function freeBaz(b:Baz):Void;
}

function main() {
	final bar = Api.allocateBar();

	Api.freeBar(bar);

	final baz : Baz = null;

	Api.allocateBaz(baz);
	Api.freeBaz(baz);
}

The pointer to pointer pattern which is pretty common is quite difficult to extern without custom C++ glue code, but the new pointer type externs understand this pattern and can be converted to a double pointer of the underlying type as well as a pointer to a void pointer which is also seen in many places.
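For reference, this is roughly what the native side of those two patterns looks like to a C++ caller. The baz struct is given a value field and query_baz is a hypothetical COM-style function added purely for illustration; neither is part of the API shown earlier.

```cpp
#include <cassert>

namespace foo {
    struct baz { int value; };

    // Out-parameter style allocation: the caller passes the address of its pointer.
    void allocate_baz(baz** pBaz) { *pBaz = new baz{ 5 }; }
    void free_baz(baz* b) { delete b; }

    // Hypothetical variant which writes through a pointer to a void pointer.
    void query_baz(void** ppObject) { *ppObject = new baz{ 9 }; }
}

// Shows what a caller of each function has to write in C++.
int use_out_parameters() {
    foo::baz* first = nullptr;
    foo::allocate_baz(&first);

    void* raw = nullptr;
    foo::query_baz(&raw);
    foo::baz* second = static_cast<foo::baz*>(raw);

    const int sum = first->value + second->value;

    foo::free_baz(first);
    foo::free_baz(second);

    return sum;
}
```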

Internally pointer types and value types are treated almost identically so most of the previous points apply here as well, the main exceptions being that promoted pointers don't have finalisers assigned and that null is always an allowed value.

cpp.ManagedType

When you want to extern a custom hx::Object subclass then this is the metadata to use, as it ensures the write barriers are generated for the generational GC. Like the above two metadata it supports the type and namespace fields.

namespace foo {
	struct bar : public ::hx::Object {};
}

@:cpp.ManagedType({ type : 'bar', namespace : [ 'foo' ] })
extern class Bar {
	function new() : Void;
}

In the above sample Bar will be generated as ::hx::ObjectPtr<bar> in most cases.

There is one extra field to the managed type, flags, which is expected to be an array of identifiers and currently there is one flag, StandardNaming. If in C++ you use the hxcpp class declaration macro to create a custom subclass with the same naming scheme as haxe generated classes then this flag is designed for that.

HX_DECLARE_CLASS1(foo, Bar)

namespace foo {
	struct Bar_obj : public ::hx::Object {};
}

@:cpp.ManagedType({ type : 'Bar', namespace : [ 'foo' ], flags : [ StandardNaming ] })
extern class Bar {
	function new() : Void;
}

In the above case Bar will be used instead of the manual ::hx::ObjectPtr wrapping but Bar_obj will be used when constructing the class.

Implementation Details and Rationale

Marshalling State

Value and pointer types are represented by the new TCppNativeMarshalType enum which can be in one of three states: Stack, Promoted, or Reference. This is the key to working around optimiser issues, capturing, and some interop features. All TCppNativeMarshalType fields are given the promoted state and TVars can be given any of the three states. Any TLocal to a native marshal type is given the reference state. How TVars are given their state is important: variables allocated by the compiler are given the reference state, and only variables typed by the user are given one of the stack (uncaptured) or promoted (captured or nullable) states. This means we dodge the issue with cpp.Struct where you could be operating on a copy due to compiler created variables.

TLocals of the reference state are generated with the new cpp::marshal::Reference<T> type which holds a pointer to a specific type and is responsible for any copying, promotion, checking, and just about everything. For the value type case its T will be a pointer to the value type, and for pointer types it will be a pointer to the pointer.
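To give a feel for the idea, here is a heavily simplified sketch of a reference wrapper which holds a pointer and checks it on access. Ref, its members, and the exception type are assumptions made for this example; this is not the actual cpp::marshal::Reference<T> implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical simplification; not the real cpp::marshal::Reference<T>.
template <typename T>
class Ref {
    T* ptr;

public:
    explicit Ref(T* ptr) : ptr(ptr) {}

    // Every access is checked so a null reference throws instead of crashing.
    T& get() const {
        if (ptr == nullptr) {
            throw std::runtime_error("null value type reference");
        }
        return *ptr;
    }

    // Copying out of the reference uses T's copy constructor.
    T copy() const { return get(); }
};

struct Foo {
    int number = 0;
};

// Writes through a reference to a stack variable, so no hidden copy is modified.
int write_through_reference() {
    Foo obj;
    Ref<Foo> ref(&obj);
    ref.get().number = 7;
    return obj.number;
}

// Returns true when accessing a null reference raises an exception.
bool null_reference_throws() {
    Ref<Foo> ref(nullptr);
    try {
        ref.get();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```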

Semantics

You are required to put the @:semantics(reference) metadata on an extern class when using the value or pointer type metadata. This does feel like a bit of a bodge... I was initially hoping that the value semantic would be what was needed, but tests start to fail when the analyser is enabled with value semantics. Maybe I'm just misinterpreting what these semantics are actually used for. With the reference semantics the tests do pass with the analyser, but from a quick glance that appears to be because many optimisations are disabled on types with that semantic meta...

Compiler Error Numbers

There are several errors you may now get, instead of vague C++ errors, if you try to do things wrong (invalid meta expression) or things which are not supported (function closures). In these cases I've given them distinct error numbers in the CPPXXXX style, similar to MSVC and C# errors. I plan on documenting these since they're things users might naturally cause as opposed to internal bugs, so I thought it might be nice to give them concrete numbers for easier searching.


Scoped Metadata

I can never remember the names of the C++ specific metadata and end up sifting through the dropdown list every time, so I decided to prefix these ones with cpp. to make it easier.

Metadata Style

I wanted to avoid re-using the @:native metadata for the extern classes as it's already pretty common to do stuff like @:native("::cpp::Struct<FooStruct>"), so by having a type and namespace field I wanted to make it clear it should be just the type, nothing else. Also with this we can prefix :: to the type / namespace combo to avoid any relative namespace resolution issues.

Eager Promotion

Due to the very dynamic nature of hxcpp's function arguments and return types there are many places where value types which could be on the stack have to be promoted to satisfy the dynamic handling. With my callable type PR this should be solvable.

Future Stuff

Closures

Currently trying to use a function closure of a value or pointer type will result in a compilation error, but now that the variable promotion stuff is in place it should be possible to generate closures which capture the object to support this. Again I wouldn't want to do this until the callable change is in since that will greatly simplify things.

Arrays and Vectors

Value types stored in containers such as arrays are in their promoted state, not a contiguous chunk of memory which I originally wanted. Preserving C++ copying / construction semantics with haxe's resizable array looked to be a massive pain so I decided not to.
I do think having haxe.ds.Vectors of value types be contiguous should be possible and would open up easier interop possibilities.

Un-dynamicification

Lots of the cpp std haxe types have a Dynamic context object which is then passed into C++ where it's cast to a specific type. With the managed type meta we should be able to "un-dynamic" a lot of the standard library implementation.

Closing

I'm sure there's stuff I've missed but this seems to be much more consistent in behaviour and nicer to use than the existing interop types, I've also added a bunch of tests on the hxcpp side to capture all sorts of edge cases I came across. I will also try and write some formal documentation for all this to encourage this over the existing types.

@Aidan63
Contributor Author

Aidan63 commented Nov 20, 2025

I've taken a break from the coroutine mines to update this branch.

  • Array access now works again by implementing the ArrayAccess interface.
  • Templated functions can now be externed with generic functions.
  • The required semantics type has been changed to value, and I fixed an issue where some generated variables couldn't be references.

I then got a bit carried away and added some new types to help with externs.

cpp.marshal.View

This value type extern is basically C#'s Span<T> type, but for hxcpp. It is used to represent a contiguous region of memory, managed or unmanaged. You can use the cpp.marshal.ViewExtensions class to create views from haxe arrays, vectors, bytes, or pointers.

final a = [ 0, 1, 2, 3 ];
final v = a.asView().slice(1, 2);

trace(v[0], v[1]); // prints 1, 2

There are then all sorts of operators for getting slices, copying, reinterpreting, etc. Since these are all stack only value types holding a pointer and a length they're nice and lightweight.
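The "pointer and a length" shape can be sketched in C++ along these lines. This View template, as_view, and the (start, count) meaning of slice are assumptions for illustration only and are not taken from the hxcpp sources.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal view: a pointer and a length, nothing else.
template <typename T>
struct View {
    T*          ptr;
    std::size_t length;

    T& operator[](std::size_t index) const {
        if (index >= length) throw std::out_of_range("view index");
        return ptr[index];
    }

    // Returns a sub-view of `count` elements starting at `start`; no data is copied.
    View slice(std::size_t start, std::size_t count) const {
        if (start > length || count > length - start) throw std::out_of_range("view slice");
        return View{ ptr + start, count };
    }
};

// Creates a view over a vector's storage without copying it.
template <typename T>
View<T> as_view(std::vector<T>& source) {
    return View<T>{ source.data(), source.size() };
}

// Mirrors the array sample above and writes through the slice.
int slice_and_write() {
    std::vector<int> a = { 0, 1, 2, 3 };
    View<int> v = as_view(a).slice(1, 2);

    v[0] = 10; // modifies a[1]; the view aliases the vector's storage

    return a[1] + v[1];
}
```

Because the view only aliases existing storage, slicing costs a pointer addition and a length, which is what makes it cheap to pass around.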

cpp.marshal.Marshal

This class provides a bunch of static functions for working with view types. Historically it has been quite annoying to deal with c-strings in haxe / hxcpp, but views and the marshal class make these sorts of operations easy.

final s = "Hello, World";
final v = s.toWideCharView(); // cpp.marshal.View<cpp.Char16>

trace(v.toString());

The marshal class also provides functions for reading and writing any types from views, making it easy to pack objects into bytes.

@:cpp.ValueType
extern class Point {
    var x : Int;
    var y : Int;

    function new(x:Int, y:Int):Void;
}

final a = [ 0, 0 ];
final v = a.asView().reinterpret();
final p = new Point(2, 4);

v.write(p);

trace(a);  // prints [ 2, 4 ];

This all works well and I'm really happy with it. I've been experimenting with removing untyped usage using this stuff and I've got a version of haxe.io.Bytes which is now untyped free. E.g. getDouble was,

return untyped __global__.__hxcpp_memory_get_double(b, pos)

and is now

return this.asView().slice(pos).readFloat64()
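On the native side, reading a double out of a byte buffer at an offset typically comes down to a bounds check and a memcpy. The sketch below assumes that approach; read_float64, write_float64, and round_trip are hypothetical helper names, not hxcpp functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Reads a double stored at byte offset `pos` using the machine's own endianness.
double read_float64(const std::vector<std::uint8_t>& bytes, std::size_t pos) {
    if (pos > bytes.size() || bytes.size() - pos < sizeof(double)) {
        throw std::out_of_range("read_float64");
    }
    double value;
    std::memcpy(&value, bytes.data() + pos, sizeof(double)); // safe for unaligned offsets
    return value;
}

// Writes a double at byte offset `pos` using the machine's own endianness.
void write_float64(std::vector<std::uint8_t>& bytes, std::size_t pos, double value) {
    if (pos > bytes.size() || bytes.size() - pos < sizeof(double)) {
        throw std::out_of_range("write_float64");
    }
    std::memcpy(bytes.data() + pos, &value, sizeof(double));
}

// Writes a value at an unaligned offset and reads it back.
double round_trip(double value) {
    std::vector<std::uint8_t> bytes(16, 0);
    write_float64(bytes, 3, value);
    return read_float64(bytes, 3);
}
```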

I've also got a wip implementation of the sys.io.File api and associated readers and writers which use the new managed type externs and views to completely remove the weak Dynamic typing and untyped usage there.

@Aidan63
Contributor Author

Aidan63 commented Dec 13, 2025

Few more updates.

I've changed the length of cpp.marshal.View to be Int64, this now means there's a nice way of working with large buffers. The ToArray, ToVector, and other conversion functions to haxe types which only use an Int for their length will throw if the view is too large.
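The kind of check this implies can be sketched as below, assuming the target container uses a signed 32-bit length. checked_length and rejects are hypothetical names, and the exception type is an assumption rather than what hxcpp actually throws.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Narrows a 64-bit view length to a 32-bit container length, throwing if it does not fit.
std::int32_t checked_length(std::int64_t viewLength) {
    if (viewLength < 0 || viewLength > std::numeric_limits<std::int32_t>::max()) {
        throw std::length_error("view is too large for an Int-sized container");
    }
    return static_cast<std::int32_t>(viewLength);
}

// Returns true when the given length would be refused.
bool rejects(std::int64_t viewLength) {
    try {
        checked_length(viewLength);
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```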

The view type can safely hold managed and unmanaged pointers assuming it always lives on the stack, since hxcpp does conservative stack scanning and anything scanned on the stack is pinned. This requirement is now enforced by the compiler. I.e. the cpp.marshal.View now has the StackOnly flag which enforces this restriction.

E.g. the following sample

import cpp.marshal.View;

using cpp.marshal.ViewExtensions;

function main() {
	final src  = [ 0, 0, 0, 0 ];
	final view = src.asView();

	final func = () -> {
		trace(view.length);
	}

	func();
}

Gives the following error, since view is captured in a closure which requires it being promoted to the heap.

[ERROR] Main.hx:7: characters 8-12

 7 |  final view = src.asView();
   |        ^^^^
   | CPP0011: Marshalling value type with the StackOnly flag cannot be promoted to the heap

This is not the only situation which requires heap promotion, class fields, enum constructors, nullable, etc, will all give the same error.

The cpp.marshal.Marshal class now has static functions for endian specific reads and writes alongside the machine endianness ones.
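Endian specific access is usually written with explicit byte shifts so the result does not depend on the host machine. A minimal sketch under that assumption follows; write_u32_le, read_u32_be, and swap_via_buffer are hypothetical names, not the Marshal API.

```cpp
#include <cassert>
#include <cstdint>

// Stores `value` least significant byte first, regardless of host endianness.
void write_u32_le(std::uint8_t* dst, std::uint32_t value) {
    dst[0] = static_cast<std::uint8_t>(value & 0xFF);
    dst[1] = static_cast<std::uint8_t>((value >> 8) & 0xFF);
    dst[2] = static_cast<std::uint8_t>((value >> 16) & 0xFF);
    dst[3] = static_cast<std::uint8_t>((value >> 24) & 0xFF);
}

// Loads four bytes as a most significant byte first integer.
std::uint32_t read_u32_be(const std::uint8_t* src) {
    return (static_cast<std::uint32_t>(src[0]) << 24) |
           (static_cast<std::uint32_t>(src[1]) << 16) |
           (static_cast<std::uint32_t>(src[2]) << 8) |
            static_cast<std::uint32_t>(src[3]);
}

// Writing little endian and reading big endian reverses the byte order.
std::uint32_t swap_via_buffer(std::uint32_t value) {
    std::uint8_t buffer[4];
    write_u32_le(buffer, value);
    return read_u32_be(buffer);
}
```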

Finally, I've added view based implementations of haxe.io.FPHelper and haxe.io.Bytes which removes all the untyped in the previous cpp implementations.

I've got more ideas for all this but I think this is good enough for now. I've changed the CI to use my hxcpp branch to show the tests all pass, not sure what order we need to merge this and the hxcpp side in.

@Simn
Member

Simn commented Dec 13, 2025

Glad to see you've reached a milestone here! These changes to Bytes.hx look great.

I guess since both PRs depend on each other, we should just merge them at the same time and then hope that everything passes. So let's set the reference here back to HaxeFoundation/hxcpp and I'll go click the buttons.

@tobil4sk
Member

tobil4sk commented Dec 13, 2025

One thing that I've found painful in the past is when externs assume the way a type will be used (i.e. only as a pointer, or only as a value), and if I want to change it it requires going through and modifying a bunch of extern declarations.

This has led me to think that there might be benefit in encouraging extern definitions to just be a pure declaration of the type fields. Then when using it, you could have the freedom to decide whether it should be a value or pointer. What are your thoughts?

I also wonder if it might be helpful to split this into smaller PRs with more focused diffs, as the large diff and long history will make it difficult to perform git bisects if any issues occur.

@Aidan63
Contributor Author

Aidan63 commented Dec 13, 2025

> you could have the freedom to decide whether it should be a value or pointer

Potentially. The main thing I was aiming for with the cpp.PointerType is the common C / C++ pattern of passing around some context object which is always a pointer, where trying to do pointer specific things such as dereferencing is usually asking for trouble (lots of libs try to hide the pointer-ness behind a typedef).

There is support for the old pointer types in here as well, so for some edge case where you do need a pointer to a value type extern you can do that.

For the large diff, I'm not sure there's much I could do in this case, unfortunately these additions needed changes all over the place.

@Simn
Member

Simn commented Dec 13, 2025

I agree that large diffs can be a problem for the reasons you stated, but this is honestly not the worst PR I've seen in that regard, so I'm sure it will be fine.

@Simn
Member

Simn commented Dec 13, 2025

One question: isn't this supposed to increase the API level? The hxcpp PR has #if (HXCPP_API_LEVEL >= 500) changes, but 500 is already the version we're currently on.

@Aidan63
Contributor Author

Aidan63 commented Dec 13, 2025

My assumption has been that the hxcpp api level is related to the haxe version, e.g. HXCPP_API_LEVEL 500 is haxe 5.0.0, and since haxe 5 would be the first release these additions are in they should be with that api level.
It does lead to these awkward situations in between haxe releases though, and I'm not sure hxcpp has version checking stuff to deal with that.

None of these changes modify the public signatures of any hxcpp stuff though, just new additions. The level checks are just there to prevent the new types being pulled into old haxe 4 builds in case it causes any issues.

@Simn Simn merged commit 33fb5c0 into HaxeFoundation:development Dec 13, 2025
34 of 39 checks passed
@Aidan63 Aidan63 deleted the cpp-value-types branch December 15, 2025 16:00