Skip to content

WIP: Implementing Acceleration Structures #1967

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

AntarticCoder
Copy link
Contributor

@AntarticCoder AntarticCoder commented Jul 6, 2023

This PR provides an implementation of the VK_KHR_acceleration_structure extension which provides a gateway to ray queries and ray tracing pipelines. This PR is still very WIP due to not being anywhere close to done. The reason for opening this PR so early on is to allow for more concrete discussion of the implementation of acceleration structures, and also keeps people up to date on the implementation.

This PR is related to:
#427
#1953 - Not directly related but may have some slight discussion on Acceleration Structures
#1956

@AntarticCoder
Copy link
Contributor Author

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

@cdavis5e
Copy link
Collaborator

cdavis5e commented Jul 6, 2023

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

Then the parts of MoltenVK that deal with Acceleration Structures need to be inside MVK_XCODE_12 blocks.

@billhollings
Copy link
Contributor

Acceleration Structure and Raytracing in general does not seem to be supported before MacOS 11, so Xcode 11.7 will always fail.

Then the parts of MoltenVK that deal with Acceleration Structures need to be inside MVK_XCODE_12 blocks.

Agreed. But before forcing that, we should discuss whether that makes sense at this point. Xcode 11 is now 4 years old, and at some point, we must give up support for it, from a practicality perspective (like this one). Retaining support for Xcode 11 was added a couple of years ago because some devs required it for their internal processes.

A few months ago, I reached out to the community about this exact question, and received no responses. Unless we can determine a good reason for maintaining Xcode 11, maybe now is the time to drop support for it.

@AntarticCoder
Copy link
Contributor Author

AntarticCoder commented Jul 7, 2023

@billhollings MacOS 11 seems to have support from devices as old as 2013 and newer. So it's a matter of dropping support of these pre-2013 devices, as well as that, some people stay on MacOS 10 for support of 32 bit applications and other reasons. This is just something to take into consideration.

@billhollings
Copy link
Contributor

A few months ago, I reached out to the community about this exact question, and received no responses. Unless we can determine a good reason for maintaining Xcode 11, maybe now is the time to drop support for it.

I have added a ping post to that feedback request thread.

@AntarticCoder Hold off wrapping your code in any MVK_XCODE_12 guards while this PR remains a WIP. When this PR is ready to go, based on any feedback we receive to my query ping, we can decide whether we need to actually implement those guard wraps, or abandon Xcode 11.

@AntarticCoder
Copy link
Contributor Author

@billhollings Alright, I'll hold off on the MVK_XCODE_12 guards. Thanks

@billhollings
Copy link
Contributor

@billhollings MacOS 11 seems to have support from devices as old as 2013 and newer. So it's a matter of dropping support of these pre-2013 devices, as well as that, some people stay on MacOS 10 for support of 32 bit applications and other reasons. This is just something to take into consideration.

The MVK_XCODE_12 guard is strictly for API compilation during MoltenVK builds (ie- will it build with the Metal API supported by Xcode 11). Support for older OS runtimes is handled independently, through things like respondsToSelector:.

@AntarticCoder
Copy link
Contributor Author

AntarticCoder commented Jul 7, 2023

@billhollings Ah, yes my mistake. Also, just a thought but only about 120 people actually watch this repository, so I'm not sure how many people will see your message.

@AntarticCoder AntarticCoder force-pushed the khr-acceleration-structures branch 2 times, most recently from 5e5c4a7 to a1b0961 Compare July 7, 2023 16:55
@AntarticCoder
Copy link
Contributor Author

AntarticCoder commented Jul 10, 2023

@billhollings @cdavis5e An issue I've run into during this PR, is accessing the provided scratch buffer, via the provided device address. To solve this, I got a reply from @K0bin in issue #1956, which is as followed.

@AntarticCoder @rcaridade145 The contents function will just give you a CPU pointer to the data of a shared buffer. That's not useful here unless you want to copy all the data around on the CPU every time. (which would also involve a GPU sync)

What you have to do is basically maintain a map that maps BDA VAs to their original buffer objects. Keep in mind that this VA map has to be extremely fast and should minimize locking as much as possible. An example for that can be found in vkd3d-Proton: https://github.com/HansKristian-Work/vkd3d-proton/blob/master/libs/vkd3d/va_map.c

Basically create a map from scratch that is fast, and thread safe, and when you call vkGetBufferDeviceAddress, we could push the address along with buffer. I just wanted to ask if this is a good idea, and what you would change about it.

@K0bin
Copy link

K0bin commented Jul 10, 2023

and when you call vkGetBufferDeviceAddress, we could push the address along with buffer

It's probably better to do that at buffer creation time and keep vkGetBufferDeviceAddress fast.

@AntarticCoder
Copy link
Contributor Author

@K0bin But not every created buffer will be used via the device address. So if you pushed it on vkGetBufferDeviceAddress, you would effectivly be keeping uneeded buffers out of the map.

@K0bin
Copy link

K0bin commented Jul 10, 2023

Base it off of VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.

@AntarticCoder
Copy link
Contributor Author

Okay then, I'll get started on the implementation. Thanks @K0bin

@billhollings
Copy link
Contributor

billhollings commented Jul 10, 2023

Base it off of VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT.

Search for VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT in existing MoltenVK code. There is already an MVKSmallVector containing a list of these in MVKDevice::_gpuAddressableBuffers. Perhaps this could be modified to use a std::unordered_map, to be used to serve both purposes?

@AntarticCoder
Copy link
Contributor Author

@billhollings That seems like a good idea, I'll go ahead and use that for now, and we can change it in the future if it's not getting the job done.

@AntarticCoder AntarticCoder force-pushed the khr-acceleration-structures branch from 542c3f8 to 9aae084 Compare July 11, 2023 14:31
@AntarticCoder
Copy link
Contributor Author

6/12 commands have been implemented, so I'm halfway there. 🎉

@AntarticCoder AntarticCoder force-pushed the khr-acceleration-structures branch from 9aae084 to 555bf47 Compare July 11, 2023 14:34

if(mvkBufIt != _gpuBufferAddressMap.end())
std::unordered_map<std::pair<uint64_t, uint64_t>, MVKBuffer*>::iterator it;
Copy link

@K0bin K0bin Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not particularly pretty (a named struct instead of std::pair would improve readability) and pretty slow but that should work now.

A simple improvement would be to make it a std::unordered_map<uint64_t, std::vector<MVKBufferRange>> and use va >> 14 (14 being somewhat arbitrary, no idea what works best here) as a key.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we treat the BDA + offset as a buffer? Instead of just storing the ranges?

I mean if it is a "view" of a buffer can't we add it to the unordered_map?

Copy link
Contributor Author

@AntarticCoder AntarticCoder Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: I read this as "why would we need to allow for offsets".

Here's what my toy renderer does for example:
Allocate a single huge VkBuffer for all vertex data in the scene and get the BDA of that.
Then when I build the BLAS, I calculate the VA for where my geometry actually sits which is the BDA + offset.
That then gets passed to the BLAS build. So if you just look for the base BDA of a buffer, you wouldn't find anything in that case.

For what to do: store buffer ranges and iterate over them to find one that contains the requested address. To speed it up, you could combine this with a map where you mask off some of the lower bits of the VA as a key.

@rcaridade145 This buffer is just a single giant buffer iiuc, which is then suballocated to then be accessed via the offset. I'm not sure how we would go about adding these suballocated buffers as regular buffers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that just to increase the hit rate . If i understood correctly we're always using the base buffer and then always calculating to find the inner buffer.
This reminds me of Java ByteBuffers and the method slice - https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/ByteBuffer.html#slice() .

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcaridade145 I think you misunderstand the problem.

The application passes a VA (virtual address) and Metal needs a buffer object. It's essentially a raw pointer for the GPU that has to be mapped back to an opaque buffer object.

@AntarticCoder AntarticCoder force-pushed the khr-acceleration-structures branch 3 times, most recently from 8effe9a to 6c7aa5d Compare July 14, 2023 14:24
@AntarticCoder
Copy link
Contributor Author

Could anyone give me some feedback on the memory address system for acceleration structure. I'd just like a second pair of eyes on the implementation.

Thanks

Copy link
Collaborator

@cdavis5e cdavis5e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, your implementation so far looks pretty straightforward.

Comment on lines 4267 to 4268
// This can lead to fragmentation over time, so I'll just push all keys after this back
// This, however is also another performance issue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, then, if it wouldn't be better to use an ordered (RB-tree-based) std::map<K, V> instead.

@@ -907,6 +931,9 @@ class MVKDevice : public MVKDispatchableVulkanAPIObject {
MVKSmallVector<MVKSmallVector<MVKQueue*, kMVKQueueCountPerQueueFamily>, kMVKQueueFamilyCount> _queuesByQueueFamilyIndex;
MVKSmallVector<MVKResource*, 256> _resources;
MVKSmallVector<MVKBuffer*, 8> _gpuAddressableBuffers;
std::unordered_map<MVKBufferAddressRange, MVKBuffer*, MVKHash_uint64_t_pair> _gpuBufferAddressMap;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perfectionist in me wants to use a data structure specifically designed for mapping ranges to values. There's gotta be such a structure out there somewhere...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like a IntervalTree https://en.m.wikipedia.org/wiki/Interval_tree ? Java can use something like RangeMap https://www.baeldung.com/guava-rangemap . In cpp i believe boost has something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boost is probaly to heavy for use in MoltenVK, unless it's standalone. I found an interval tree implementation on github here. This could be something to consider for the future, but I'm not sure if we have to implement it immediately.

@AntarticCoder AntarticCoder force-pushed the khr-acceleration-structures branch from 31f04f6 to 94c62f5 Compare July 15, 2023 15:42
AntarticCoder and others added 25 commits June 19, 2025 09:44
A half done implementation of MVKMap. MVKMap aims to use the same API as std::unordered_map, and I used MVKSmallVector as an example of how to write MVKMap. I hope there aren't any bugs however, I'll probably do some tests off of the repository once I'm done
This commit finished off the build acceleration structure command. This is because in MVKDevice, we are now using a std::unordered_map instead of a custom map implementation.
Added in a function to check for acceleration structure compatibility. Also made sure to properly add the device features for acceleration structures.
This commit fixes getBufferAtAddress, because the address could be offset, however that was not valid before. Something to keep in mind is that the performance  maybe really bad, however I believe that until we start running Ray queries or Raytracing Pipelines, we can keep the find function as it is. Note that we're also using a custom hash function because std::pair does not have a default hash. I'm to worried about collisions because only one buffer can occupy between  these 2 addresses.
This commit just adds the xrOS version to the MVK_EXTENSION and to mvkOSVersionIsAtLeast
This commit just changes build acceleration structure, to allow for the different options/parameters to be take into account. I plan to amend this commit.
This commit adds "imaginary" gpu addresses for acceleration structures. This allows us to easily implement the rest of the copy commands, as well as the get address function that are required for this vulkan extension. The only thing I'm worried about is how to defragment the memory, because when we destroy an acceleration structure, pushing all the keys can be costly. Hopefully we find a better way to store addresses.
Simply added the vulkan function GetAccelerationStructureDeviceAddress, next I'll be adding 2 more copy commands.
This commit adds a command which allows for copying of acceleration structures to another memory address, which contains an acceleration structure. This commit also adds all needed functions into MVKInstance.mm so that they can be used.
This commit cleans up the code, and now accepts copy modes for acceleration structures. However, only 2 copy modes are supported which are clone, and compact.
This commit adds the last copy command, that being copy memory to acceleration structure. This command is almost the same as the other copy commands, and I'm getting close to finishing up this PR.
This commit fixes a glaring issue with copying accelerations structures to memory. Previously, I would copy the acceleration structure to another acceleration structure, however the proper thing to do was to copy a buffer to an acceleration structure. This however has not fixed copy acceleration structure to memory.
This commit quickly fixes copying acceleration structures to memory, however I'm not sure if my implementation is right.
This commit adds the functionality for a bottom and top level acceleration structure, which are not quite finished, but I'm pushing this because I'm unable to stash this.
This commit does not do much, however I'm updating to the next macos update, so I'd like to push so I don't lose everything.
Still not finished, just quickly saving my work on get build sizes
This commit is pretty small and just adds AABBs to be allowed to be pushed to the acceleration structure.
This commit adds the device command 'kCmdWriteAccelerationStructuresPropertiesKHR'.
Fix address map inconsistencies

Add compute pipeline to convert instance buffer

We need an intermediate step to convert between vk acceleration structure instances and metal instances
Populate acceleration structure features & limits
Proper lifecycle for buffer address tracking
Utilize special heap allocation for AS buffers

The method right now relies on using heaps to allocate acceleration structures. However, memory bound to buffers used for storage are not aware they must be heaps if placement heaps are disabled. Thus, we must ignore the memory passed in and allocate special heap memory.

Refactor device acceleration structure storage

We need to maintain a list of all currently active acceleration structures. This stems from a somewhat complicated issue. In shaders, vk uses addresses but metal uses indices tied to an array of acc structs passed in to the command. Since we have no efficient or guaranteed way of deducing what acc structs are used in a TLAS given the instance buffer, the best way is to maintain all instances and have their index into the global list tied to their respective address shared with the client.

Add vertex buffer format flag

Initial impl of acceleration structure building

Fix rebase issues
@TheApplePieGod TheApplePieGod force-pushed the khr-acceleration-structures branch from 043deba to 211597d Compare June 19, 2025 13:58
@TheApplePieGod
Copy link

I was working with @AntarticCoder a bit ago on this PR, and there were some changes that never got merged. I spoke with @JarrettSJohnson on their PR, and decided to rebase the current branch and include the other changes we currently had.

@AntarticCoder
Copy link
Contributor Author

@billhollings How would we do that? Would we close out this PR and then open up a new one from @JarrettSJohnson or something else? Thanks

@JarrettSJohnson
Copy link

As long as I can update @AntarticCoder 's branch, I don't mind just adding to there.

@AntarticCoder
Copy link
Contributor Author

@JarrettSJohnson Great. Would you like me to add you as a collaborator to my fork?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.