WIP: Implementing Acceleration Structures #1967

AntarticCoder · 2023-07-06T15:28:06Z

This PR provides an implementation of the VK_KHR_acceleration_structure extension which provides a gateway to ray queries and ray tracing pipelines. This PR is still very WIP due to not being anywhere close to done. The reason for opening this PR so early on is to allow for more concrete discussion of the implementation of acceleration structures, and also keeps people up to date on the implementation.

This PR is related to:
#427
#1953 - Not directly related but may have some slight discussion on Acceleration Structures
#1956

AntarticCoder · 2023-07-06T22:22:36Z

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

cdavis5e · 2023-07-06T22:26:33Z

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

Then the parts of MoltenVK that deal with Acceleration Structures need to be inside MVK_XCODE_12 blocks.

billhollings · 2023-07-07T16:12:55Z

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

Then the parts of MoltenVK that deal with Acceleration Structures need to be inside MVK_XCODE_12 blocks.

Agreed. But before forcing that, we should discuss whether that makes sense at this point. Xcode 11 is now 4 years old, and at some point, we must give up support for it, from a practicality perspective (like this one). Retaining support for Xcode 11 was added a couple of years ago because some devs required it for their internal processes.

A few months ago, I reached out to the community about this exact question, and received no responses. Unless we can determine a good reason for maintaining Xcode 11, maybe now is the time to drop support for it.

AntarticCoder · 2023-07-07T16:35:58Z

@billhollings MacOS 11 seems to have support from devices as old as 2013 and newer. So it's a matter of dropping support of these pre-2013 devices, as well as that, some people stay on MacOS 10 for support of 32 bit applications and other reasons. This is just something to take into consideration.

billhollings · 2023-07-07T16:39:42Z

A few months ago, I reached out to the community about this exact question, and received no responses. Unless we can determine a good reason for maintaining Xcode 11, maybe now is the time to drop support for it.

I have added a ping post to that feedback request thread.

@AntarticCoder Hold off wrapping your code in any MVK_XCODE_12 guards while this PR remains a WIP. When this PR is ready to go, based on any feedback we receive to my query ping, we can decide whether we need to actually implement those guard wraps, or abandon Xcode 11.

AntarticCoder · 2023-07-07T16:41:19Z

@billhollings Alright, I'll hold off on the MVK_XCODE_12 guards. Thanks

billhollings · 2023-07-07T16:44:26Z

@billhollings MacOS 11 seems to have support from devices as old as 2013 and newer. So it's a matter of dropping support of these pre-2013 devices, as well as that, some people stay on MacOS 10 for support of 32 bit applications and other reasons. This is just something to take into consideration.

The MVK_XCODE_12 guard is strictly for API compilation during MoltenVK builds (ie- will it build with the Metal API supported by Xcode 11). Support for older OS runtimes is handled independently, through things like respondsToSelector:.

AntarticCoder · 2023-07-07T16:48:52Z

@billhollings Ah, yes my mistake. Also, just a thought but only about 120 people actually watch this repository, so I'm not sure how many people will see your message.

AntarticCoder · 2023-07-10T13:09:41Z

@billhollings @cdavis5e An issue I've run into during this PR, is accessing the provided scratch buffer, via the provided device address. To solve this, I got a reply from @K0bin in issue #1956, which is as followed.

@AntarticCoder @rcaridade145 The contents function will just give you a CPU pointer to the data of a shared buffer. That's not useful here unless you want to copy all the data around on the CPU every time. (which would also involve a GPU sync)

What you have to do is basically maintain a map that maps BDA VAs to their original buffer objects. Keep in mind that this VA map has to be extremely fast and should minimize locking as much as possible. An example for that can be found in vkd3d-Proton: https://github.com/HansKristian-Work/vkd3d-proton/blob/master/libs/vkd3d/va_map.c

Basically create a map from scratch that is fast, and thread safe, and when you call vkGetBufferDeviceAddress, we could push the address along with buffer. I just wanted to ask if this is a good idea, and what you would change about it.

K0bin · 2023-07-10T15:02:13Z

and when you call vkGetBufferDeviceAddress, we could push the address along with buffer

It's probably better to do that at buffer creation time and keep vkGetBufferDeviceAddress fast.

AntarticCoder · 2023-07-10T15:04:31Z

@K0bin But not every created buffer will be used via the device address. So if you pushed it on vkGetBufferDeviceAddress, you would effectivly be keeping uneeded buffers out of the map.

K0bin · 2023-07-10T15:32:57Z

Base it off of VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.

AntarticCoder · 2023-07-10T15:44:06Z

Okay then, I'll get started on the implementation. Thanks @K0bin

billhollings · 2023-07-10T21:30:25Z

Base it off of VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.

Search for VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT in existing MoltenVK code. There is already an MVKSmallVector containing a list of these in MVKDevice::_gpuAddressableBuffers. Perhaps this could be modified to use a std::unordered_map, to be used to serve both purposes?

AntarticCoder · 2023-07-10T22:22:05Z

@billhollings That seems like a good idea, I'll go ahead and use that for now, and we can change it in the future if it's not getting the job done.

AntarticCoder · 2023-07-11T14:32:46Z

6/12 commands have been implemented, so I'm halfway there. 🎉

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm

K0bin · 2023-07-12T20:23:27Z

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm


-    if(mvkBufIt != _gpuBufferAddressMap.end())
+    std::unordered_map<std::pair<uint64_t, uint64_t>, MVKBuffer*>::iterator it;


Not particularly pretty (a named struct instead of std::pair would improve readability) and pretty slow but that should work now.

A simple improvement would be to make it a std::unordered_map<uint64_t, std::vector<MVKBufferRange>> and use va >> 14 (14 being somewhat arbitrary, no idea what works best here) as a key.

Can't we treat the BDA + offset as a buffer? Instead of just storing the ranges?

I mean if it is a "view" of a buffer can't we add it to the unordered_map?

EDIT: I read this as "why would we need to allow for offsets".

Here's what my toy renderer does for example:
Allocate a single huge VkBuffer for all vertex data in the scene and get the BDA of that.
Then when I build the BLAS, I calculate the VA for where my geometry actually sits which is the BDA + offset.
That then gets passed to the BLAS build. So if you just look for the base BDA of a buffer, you wouldn't find anything in that case.

For what to do: store buffer ranges and iterate over them to find one that contains the requested address. To speed it up, you could combine this with a map where you mask off some of the lower bits of the VA as a key.

@rcaridade145 This buffer is just a single giant buffer iiuc, which is then suballocated to then be accessed via the offset. I'm not sure how we would go about adding these suballocated buffers as regular buffers.

I meant that just to increase the hit rate . If i understood correctly we're always using the base buffer and then always calculating to find the inner buffer.
This reminds me of Java ByteBuffers and the method slice - https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/ByteBuffer.html#slice() .

@rcaridade145 I think you misunderstand the problem.

The application passes a VA (virtual address) and Metal needs a buffer object. It's essentially a raw pointer for the GPU that has to be mapped back to an opaque buffer object.

AntarticCoder · 2023-07-14T18:12:16Z

Could anyone give me some feedback on the memory address system for acceleration structure. I'd just like a second pair of eyes on the implementation.

Thanks

cdavis5e

Well, your implementation so far looks pretty straightforward.

cdavis5e · 2023-07-14T23:48:41Z

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm

+    // This can lead to fragmentation over time, so I'll just push all keys after this back
+    // This, however is also another performance issue


I wonder, then, if it wouldn't be better to use an ordered (RB-tree-based) std::map<K, V> instead.

cdavis5e · 2023-07-14T23:51:17Z

MoltenVK/MoltenVK/GPUObjects/MVKDevice.h

@@ -907,6 +931,9 @@ class MVKDevice : public MVKDispatchableVulkanAPIObject {
 	MVKSmallVector<MVKSmallVector<MVKQueue*, kMVKQueueCountPerQueueFamily>, kMVKQueueFamilyCount> _queuesByQueueFamilyIndex;
 	MVKSmallVector<MVKResource*, 256> _resources;
 	MVKSmallVector<MVKBuffer*, 8> _gpuAddressableBuffers;
+    std::unordered_map<MVKBufferAddressRange, MVKBuffer*, MVKHash_uint64_t_pair> _gpuBufferAddressMap;


The perfectionist in me wants to use a data structure specifically designed for mapping ranges to values. There's gotta be such a structure out there somewhere...

Something like a IntervalTree https://en.m.wikipedia.org/wiki/Interval_tree ? Java can use something like RangeMap https://www.baeldung.com/guava-rangemap . In cpp i believe boost has something.

Boost is probaly to heavy for use in MoltenVK, unless it's standalone. I found an interval tree implementation on github here. This could be something to consider for the future, but I'm not sure if we have to implement it immediately.

A half done implementation of MVKMap. MVKMap aims to use the same API as std::unordered_map, and I used MVKSmallVector as an example of how to write MVKMap. I hope there aren't any bugs however, I'll probably do some tests off of the repository once I'm done

This commit finished off the build acceleration structure command. This is because in MVKDevice, we are now using a std::unordered_map instead of a custom map implementation.

Added in a function to check for acceleration structure compatibility. Also made sure to properly add the device features for acceleration structures.

This commit fixes getBufferAtAddress, because the address could be offset, however that was not valid before. Something to keep in mind is that the performance maybe really bad, however I believe that until we start running Ray queries or Raytracing Pipelines, we can keep the find function as it is. Note that we're also using a custom hash function because std::pair does not have a default hash. I'm to worried about collisions because only one buffer can occupy between these 2 addresses.

This commit just adds the xrOS version to the MVK_EXTENSION and to mvkOSVersionIsAtLeast

This commit just changes build acceleration structure, to allow for the different options/parameters to be take into account. I plan to amend this commit.

This commit adds "imaginary" gpu addresses for acceleration structures. This allows us to easily implement the rest of the copy commands, as well as the get address function that are required for this vulkan extension. The only thing I'm worried about is how to defragment the memory, because when we destroy an acceleration structure, pushing all the keys can be costly. Hopefully we find a better way to store addresses.

Simply added the vulkan function GetAccelerationStructureDeviceAddress, next I'll be adding 2 more copy commands.

This commit adds a command which allows for copying of acceleration structures to another memory address, which contains an acceleration structure. This commit also adds all needed functions into MVKInstance.mm so that they can be used.

This commit cleans up the code, and now accepts copy modes for acceleration structures. However, only 2 copy modes are supported which are clone, and compact.

This commit adds the last copy command, that being copy memory to acceleration structure. This command is almost the same as the other copy commands, and I'm getting close to finishing up this PR.

This commit fixes a glaring issue with copying accelerations structures to memory. Previously, I would copy the acceleration structure to another acceleration structure, however the proper thing to do was to copy a buffer to an acceleration structure. This however has not fixed copy acceleration structure to memory.

This commit quickly fixes copying acceleration structures to memory, however I'm not sure if my implementation is right.

This commit adds the functionality for a bottom and top level acceleration structure, which are not quite finished, but I'm pushing this because I'm unable to stash this.

This commit does not do much, however I'm updating to the next macos update, so I'd like to push so I don't lose everything.

Still not finished, just quickly saving my work on get build sizes

This commit is pretty small and just adds AABBs to be allowed to be pushed to the acceleration structure.

This commit adds the device command 'kCmdWriteAccelerationStructuresPropertiesKHR'.

Fix address map inconsistencies Add compute pipeline to convert instance buffer We need an intermediate step to convert between vk acceleration structure instances and metal instances Populate acceleration structure features & limits Proper lifecycle for buffer address tracking Utilize special heap allocation for AS buffers The method right now relies on using heaps to allocate acceleration structures. However, memory bound to buffers used for storage are not aware they must be heaps if placement heaps are disabled. Thus, we must ignore the memory passed in and allocate special heap memory. Refactor device acceleration structure storage We need to maintain a list of all currently active acceleration structures. This stems from a somewhat complicated issue. In shaders, vk uses addresses but metal uses indices tied to an array of acc structs passed in to the command. Since we have no efficient or guaranteed way of deducing what acc structs are used in a TLAS given the instance buffer, the best way is to maintain all instances and have their index into the global list tied to their respective address shared with the client. Add vertex buffer format flag Initial impl of acceleration structure building Fix rebase issues

TheApplePieGod · 2025-06-19T14:02:14Z

I was working with @AntarticCoder a bit ago on this PR, and there were some changes that never got merged. I spoke with @JarrettSJohnson on their PR, and decided to rebase the current branch and include the other changes we currently had.

AntarticCoder · 2025-06-19T14:19:49Z

@billhollings How would we do that? Would we close out this PR and then open up a new one from @JarrettSJohnson or something else? Thanks

JarrettSJohnson · 2025-06-20T15:01:14Z

As long as I can update @AntarticCoder 's branch, I don't mind just adding to there.

AntarticCoder · 2025-06-20T21:52:22Z

@JarrettSJohnson Great. Would you like me to add you as a collaborator to my fork?

AntarticCoder mentioned this pull request Jul 6, 2023

Raytracing support #427

Open

AntarticCoder force-pushed the khr-acceleration-structures branch 2 times, most recently from 5e5c4a7 to a1b0961 Compare July 7, 2023 16:55

billhollings mentioned this pull request Jul 10, 2023

MoltenVK Enhancement Roadmap #1975

Open

AntarticCoder force-pushed the khr-acceleration-structures branch from 542c3f8 to 9aae084 Compare July 11, 2023 14:31

AntarticCoder force-pushed the khr-acceleration-structures branch from 9aae084 to 555bf47 Compare July 11, 2023 14:34

K0bin reviewed Jul 11, 2023

View reviewed changes

MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm Show resolved Hide resolved

K0bin reviewed Jul 12, 2023

View reviewed changes

AntarticCoder force-pushed the khr-acceleration-structures branch 3 times, most recently from 8effe9a to 6c7aa5d Compare July 14, 2023 14:24

cdavis5e reviewed Jul 14, 2023

View reviewed changes

AntarticCoder force-pushed the khr-acceleration-structures branch from 31f04f6 to 94c62f5 Compare July 15, 2023 15:42

AntarticCoder and others added 25 commits June 19, 2025 09:44

Using std::unordered_map to store the Buffer Device Addresses

0832c3b

This commit finished off the build acceleration structure command. This is because in MVKDevice, we are now using a std::unordered_map instead of a custom map implementation.

Implemented Acceleration Structure Compatibility

b58c007

Added in a function to check for acceleration structure compatibility. Also made sure to properly add the device features for acceleration structures.

Update to xrOS versioning system from main

e141125

This commit just adds the xrOS version to the MVK_EXTENSION and to mvkOSVersionIsAtLeast

Take in more parameters for CmdBuildAccelerationStructure

c9a45c3

This commit just changes build acceleration structure, to allow for the different options/parameters to be take into account. I plan to amend this commit.

Added Get Acceleration structure function

29f6775

Simply added the vulkan function GetAccelerationStructureDeviceAddress, next I'll be adding 2 more copy commands.

Copy Acceleration Structure to Memory

cf914f2

This commit adds a command which allows for copying of acceleration structures to another memory address, which contains an acceleration structure. This commit also adds all needed functions into MVKInstance.mm so that they can be used.

Code Cleanup and Copy modes for Acceleration Structures

0beca2e

This commit cleans up the code, and now accepts copy modes for acceleration structures. However, only 2 copy modes are supported which are clone, and compact.

Copy Memory to Acceleration Structure

6951700

This commit adds the last copy command, that being copy memory to acceleration structure. This command is almost the same as the other copy commands, and I'm getting close to finishing up this PR.

Fixed Copy Acceleration Structure to Memory

ea1830b

This commit quickly fixes copying acceleration structures to memory, however I'm not sure if my implementation is right.

Acceleration structures with Levels

9533c7c

This commit adds the functionality for a bottom and top level acceleration structure, which are not quite finished, but I'm pushing this because I'm unable to stash this.

Added command uses and MTLHeaps

e2fe905

This commit does not do much, however I'm updating to the next macos update, so I'd like to push so I don't lose everything.

Corrected Some Simple Build Errors

8bb2b14

Correcting Build Errors due to Merge

a330fe2

Added Function Definitions for Overwritten Functions

e3692e6

Quick Save

8fd55e2

Buggy Implementation on Get Build Sizes

d861634

Still not finished, just quickly saving my work on get build sizes

Implemented AABB Geometry for Build and Get Sizes

f8a5738

This commit is pretty small and just adds AABBs to be allowed to be pushed to the acceleration structure.

Implement new address map for buffers

d30fb41

Implement Write Acc Structure Properties - WIP

af52c2f

This commit adds the device command 'kCmdWriteAccelerationStructuresPropertiesKHR'.

Refactor acc structure build & get build sizes

e6776e1

TheApplePieGod force-pushed the khr-acceleration-structures branch from 043deba to 211597d Compare June 19, 2025 13:58


		if(mvkBufIt != _gpuBufferAddressMap.end())
		std::unordered_map<std::pair<uint64_t, uint64_t>, MVKBuffer*>::iterator it;

		// This can lead to fragmentation over time, so I'll just push all keys after this back
		// This, however is also another performance issue

WIP: Implementing Acceleration Structures #1967

Are you sure you want to change the base?

WIP: Implementing Acceleration Structures #1967

Conversation

AntarticCoder commented Jul 6, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntarticCoder commented Jul 6, 2023

Uh oh!

cdavis5e commented Jul 6, 2023

Uh oh!

billhollings commented Jul 7, 2023

Uh oh!

AntarticCoder commented Jul 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

billhollings commented Jul 7, 2023

Uh oh!

AntarticCoder commented Jul 7, 2023

Uh oh!

billhollings commented Jul 7, 2023

Uh oh!

AntarticCoder commented Jul 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntarticCoder commented Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

K0bin commented Jul 10, 2023

Uh oh!

AntarticCoder commented Jul 10, 2023

Uh oh!

K0bin commented Jul 10, 2023

Uh oh!

AntarticCoder commented Jul 10, 2023

Uh oh!

billhollings commented Jul 10, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AntarticCoder commented Jul 10, 2023

Uh oh!

AntarticCoder commented Jul 11, 2023

Uh oh!

Uh oh!

K0bin Jul 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rcaridade145 Jul 12, 2023

Choose a reason for hiding this comment

Uh oh!

AntarticCoder Jul 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rcaridade145 Jul 12, 2023

Choose a reason for hiding this comment

Uh oh!

K0bin Jul 13, 2023

Choose a reason for hiding this comment

Uh oh!

AntarticCoder commented Jul 14, 2023

Uh oh!

cdavis5e left a comment

Choose a reason for hiding this comment

Uh oh!

cdavis5e Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

cdavis5e Jul 14, 2023

Choose a reason for hiding this comment

Uh oh!

rcaridade145 Jul 15, 2023

Choose a reason for hiding this comment

Uh oh!

AntarticCoder Jul 15, 2023

Choose a reason for hiding this comment

Uh oh!

TheApplePieGod commented Jun 19, 2025

Uh oh!

AntarticCoder commented Jul 6, 2023 •

edited

Loading

AntarticCoder commented Jul 7, 2023 •

edited

Loading

AntarticCoder commented Jul 7, 2023 •

edited

Loading

AntarticCoder commented Jul 10, 2023 •

edited

Loading

billhollings commented Jul 10, 2023 •

edited

Loading

K0bin Jul 12, 2023 •

edited

Loading

AntarticCoder Jul 12, 2023 •

edited

Loading