Unsafe multi-queue design RFC #8844

@inner-daemons

This is a sort of "request for comments" on a new design for multi-queue. It has been determined that enforcing the safety requirements of Vulkan's queue family ownership transfers is not something that will happen anytime soon. Instead, we will use an unsafe, native-only implementation that places the burden of synchronization and correctness on the user.

All functions below can be considered to be unsafe, except those duplicated from other already-safe functions.

Backend specific notes

Vulkan

Vulkan is interesting because, unlike other backends, it has queue family ownership transfers. It does allow using VK_SHARING_MODE_CONCURRENT to bypass them, but I think that queue family ownership transfers would provide necessary information to wgpu for the DX12 backend, so this design does not rely on concurrent sharing mode.

DX12

DX12 resource state transitions are the main concern here. We probably want the state of a resource to be defined per-queue. Then a queue family ownership transfer would mean transitioning the resource into/out of a general state.

Metal

Metal is very nice: you can create as many queues as you want and use them however you want. The only thing worth noting is that prior to Metal 4, command buffers were created attached to a single queue. For that reason, command encoders will be created with a specific queue already specified.

Setup

My opinion is that we should only allow 1 queue per queue-family to be created. This would simplify queue family handling and clear up lots of confusion. I'm not sure that there's much benefit at all to having multiple queues of the same family, and AMD drivers typically don't allow this anyway (except for compute queues). I am open to other ideas.

For the code changes, first, a new bitflag type QueueFlags would have members GRAPHICS, COMPUTE, and TRANSFER. In particular, TRANSFER would be mainly for uploads or downloads, not necessarily for copying between buffers on the device.
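As a rough illustration of what such a flag type could look like, here is a minimal sketch written with a hand-rolled newtype so that it has no dependencies. The bit values and the `empty`, `all`, and `contains` helpers are assumptions made for the example, not part of the proposal; a real implementation would presumably follow whatever bitflag conventions wgpu already uses.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for the proposed `QueueFlags` type.
/// The concrete bit assignments are an assumption of this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct QueueFlags(u32);

impl QueueFlags {
    pub const GRAPHICS: QueueFlags = QueueFlags(1 << 0);
    pub const COMPUTE: QueueFlags = QueueFlags(1 << 1);
    /// Mainly uploads and downloads, per the description above.
    pub const TRANSFER: QueueFlags = QueueFlags(1 << 2);

    /// No capabilities at all.
    pub const fn empty() -> QueueFlags {
        QueueFlags(0)
    }

    /// Every capability; the default queue would report this.
    pub const fn all() -> QueueFlags {
        QueueFlags(0b111)
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: QueueFlags) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for QueueFlags {
    type Output = QueueFlags;

    fn bitor(self, rhs: QueueFlags) -> QueueFlags {
        QueueFlags(self.0 | rhs.0)
    }
}
```

A typical async-compute queue might then report `QueueFlags::COMPUTE | QueueFlags::TRANSFER`, and callers could pick a queue by testing `contains`.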

AdapterInfo would have a new field, queue_infos: Vec<QueueFlags>, representing the available queues and their capabilities. The first element of this array will always include all flags.

DeviceDescriptor would have a new field, queues: Vec<u32>, holding the indices into the adapter's queue_infos from which queues should be created. Each value x must be unique and satisfy x < queue_infos.len(). An empty vec is identical to vec![0], representing a single default queue.

Adapter::request_device would be modified to return (Device, Vec<Queue>), returning all requested queues.
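The rules for the requested queue indices (unique, in range, empty means the default queue) could be checked roughly as follows. This is only a sketch: `resolve_queue_indices` and `QueueRequestError` are hypothetical names, and the function takes the length of the adapter's queue list rather than a real adapter. It assumes the list is never empty, since the first entry always exists.

```rust
use std::collections::HashSet;

/// Hypothetical error type for an invalid queue request.
#[derive(Debug, PartialEq, Eq)]
pub enum QueueRequestError {
    /// The index does not refer to an entry in the adapter's queue list.
    OutOfRange { index: u32, available: usize },
    /// The same index was requested more than once.
    Duplicate { index: u32 },
}

/// Validates the `queues` field of a device descriptor against the number
/// of queue entries the adapter reports, and applies the "empty means
/// vec![0]" default. Assumes `available >= 1`.
pub fn resolve_queue_indices(
    available: usize,
    requested: &[u32],
) -> Result<Vec<u32>, QueueRequestError> {
    if requested.is_empty() {
        // A single default queue, which supports all flags.
        return Ok(vec![0]);
    }
    let mut seen = HashSet::new();
    for &index in requested {
        if index as usize >= available {
            return Err(QueueRequestError::OutOfRange { index, available });
        }
        // `insert` returns false when the value was already present.
        if !seen.insert(index) {
            return Err(QueueRequestError::Duplicate { index });
        }
    }
    Ok(requested.to_vec())
}
```

The returned vector preserves the order of the request, which would let the queues returned by request_device line up one-to-one with the requested indices.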

Queue changes

Queues would now have a poll function identical to Device::poll but queue-specific.

They would also have a sync_submit method that takes a list of SynchronizedSubmit descriptors. Each descriptor would have some command buffers, some wait semaphores, and some signal semaphores.
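To make the shape of such a descriptor concrete, here is a small model. Everything in it is an assumption for illustration: command buffers are represented by string labels, semaphores by an index into a slice of current timeline values, and `is_ready` and `apply_signals` are hypothetical helpers that only mimic what the GPU would do.

```rust
/// Hypothetical model of one entry passed to `sync_submit`.
#[derive(Clone, Debug)]
pub struct SynchronizedSubmit {
    /// Stand-in labels for the command buffers in this submission.
    pub command_buffers: Vec<String>,
    /// (semaphore index, value) pairs that must be reached before execution.
    pub waits: Vec<(usize, u64)>,
    /// (semaphore index, value) pairs signaled once execution finishes.
    pub signals: Vec<(usize, u64)>,
}

impl SynchronizedSubmit {
    /// True if every waited-on semaphore has reached its required value.
    /// Panics if a semaphore index is out of bounds for `values`.
    pub fn is_ready(&self, values: &[u64]) -> bool {
        self.waits.iter().all(|&(sem, value)| values[sem] >= value)
    }

    /// Applies this submission's signals; values never decrease.
    /// Panics if a semaphore index is out of bounds for `values`.
    pub fn apply_signals(&self, values: &mut [u64]) {
        for &(sem, value) in &self.signals {
            values[sem] = values[sem].max(value);
        }
    }
}
```

In this model, a consumer submission that waits on `(0, 1)` only becomes ready after a producer submission that signals `(0, 1)` has completed, which is the cross-queue ordering the descriptors are meant to express.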

Command encoders

Command encoder descriptors would have a new field, queue: Option<Queue>, that indicates which queue the encoded commands may be submitted to, defaulting to the first queue.

Queue family ownership transfers

Command encoders would have the following new functions:

  • release_texture(&self, t: Texture, receiver: Queue)
  • release_buffer(&self, b: Buffer, receiver: Queue)
  • receive_texture(&self, t: Texture, sender: Queue)
  • receive_buffer(&self, b: Buffer, sender: Queue)

These would change the ownership status of the resources. A resource can only be used on a queue that currently has ownership. Only one queue can have ownership at a time.
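Because the API is unsafe, tracking ownership would be the user's job. One way to reason about the release/receive pairing is as a small state machine, sketched below. `QueueId`, `Ownership`, and the intermediate `InTransfer` state are hypothetical names introduced for this example; the proposal itself does not specify how (or whether) wgpu would track this.

```rust
/// Hypothetical identifier for a queue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct QueueId(pub u32);

/// Hypothetical user-side model of a resource's ownership status.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ownership {
    /// Exactly one queue owns the resource and may use it.
    Owned(QueueId),
    /// Released by `sender`, not yet received by `receiver`.
    InTransfer { sender: QueueId, receiver: QueueId },
}

impl Ownership {
    /// A resource can only be used on the queue that currently owns it.
    pub fn usable_on(self, queue: QueueId) -> bool {
        self == Ownership::Owned(queue)
    }

    /// Models a release recorded on `sender`, naming `receiver`.
    pub fn release(self, sender: QueueId, receiver: QueueId) -> Result<Ownership, &'static str> {
        match self {
            Ownership::Owned(owner) if owner == sender => {
                Ok(Ownership::InTransfer { sender, receiver })
            }
            Ownership::Owned(_) => Err("only the owning queue can release"),
            Ownership::InTransfer { .. } => Err("resource is already being transferred"),
        }
    }

    /// Models a receive recorded on `receiver`, naming `sender`.
    pub fn receive(self, sender: QueueId, receiver: QueueId) -> Result<Ownership, &'static str> {
        match self {
            Ownership::InTransfer { sender: s, receiver: r } if s == sender && r == receiver => {
                Ok(Ownership::Owned(receiver))
            }
            Ownership::InTransfer { .. } => Err("receive does not match the pending release"),
            Ownership::Owned(_) => Err("no release is pending"),
        }
    }
}
```

Between the release and the matching receive, `usable_on` returns false for every queue, which mirrors the rule that only one queue can have ownership at a time.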

Semaphores

Semaphores would be created on the device with Device::create_semaphore(&self). They wouldn't have a descriptor of any kind. These would operate like timeline semaphores: they would start with a value of zero and would have the following methods:

  • Semaphore::signal(&self, value: u64) - changes the semaphore's value if the new value is higher
  • Semaphore::poll(&self) -> u64 - returns the semaphore's current value
  • Semaphore::wait(&self, value: u64) - possibly this would also have a timeout option similar to device polls
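The host-visible behavior of these three methods can be modeled on the CPU with a mutex and a condition variable. The sketch below is only such a model, not a proposed implementation: `TimelineSemaphore` is a hypothetical name, the `timeout` parameter and the `bool` return value of `wait` are assumptions (the text above only says a timeout is possible), and a real backend would wrap a native timeline semaphore or fence instead.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Hypothetical CPU-only model of the proposed semaphore semantics.
pub struct TimelineSemaphore {
    value: Mutex<u64>,
    changed: Condvar,
}

impl TimelineSemaphore {
    /// Semaphores start with a value of zero.
    pub fn new() -> Self {
        TimelineSemaphore {
            value: Mutex::new(0),
            changed: Condvar::new(),
        }
    }

    /// Raises the value; a value that is not higher is ignored.
    pub fn signal(&self, value: u64) {
        let mut current = self.value.lock().unwrap();
        if value > *current {
            *current = value;
            self.changed.notify_all();
        }
    }

    /// Returns the current value without blocking on other waiters.
    pub fn poll(&self) -> u64 {
        *self.value.lock().unwrap()
    }

    /// Blocks until the value reaches `value` or `timeout` elapses.
    /// Returns true if the value was reached.
    pub fn wait(&self, value: u64, timeout: Duration) -> bool {
        let guard = self.value.lock().unwrap();
        let (guard, _timed_out) = self
            .changed
            .wait_timeout_while(guard, timeout, |current| *current < value)
            .unwrap();
        *guard >= value
    }
}
```

Because `signal` only ever raises the value, a waiter that wakes up late still observes a value at least as large as the one it waited for.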

Command buffers (that are already encoded) would have additional methods wait_for_semaphore(s: Semaphore, value: u64) and signal_semaphore(s: Semaphore, value: u64) which would allow synchronization between queues.

I'd also like to briefly note that external semaphores are also necessary for using external memory properly.

Resources

Buffer and texture descriptors would have a new field initial_queue: Option<Queue> which indicates the queue that would own the resource at creation time. It would default to the first queue.

Mappable buffers wouldn't be queue-transferrable.
