Releases: gfx-rs/wgpu

v28.0.0 - Mesh Shaders, Immediates, and More!

18 Dec 03:36
v28.0.0
3f02781

Major Changes

Mesh Shaders

This has been a long time coming. See the tracking issue for more information.
They are now fully supported on Vulkan, and supported on Metal and DX12 with passthrough shaders. WGSL parsing and rewriting
is supported, meaning they can be used through WESL or naga_oil.

Mesh shader pipelines replace the standard vertex shader pipelines and allow new ways to render meshes.
They are ideal for meshlet rendering, a form of rendering where small groups of triangles are handled together,
for both culling and rendering.
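
To make the idea of a meshlet concrete, here is a minimal CPU-side sketch in Rust that splits a triangle list into fixed-size groups. The `Meshlet` struct and the `split_into_meshlets` function are hypothetical names used only for illustration and are not part of wgpu. Real meshlet builders usually also cap the number of unique vertices per group and try to keep each group spatially compact, which this sketch does not attempt.

```rust
/// A meshlet: a contiguous run of triangles that is culled and drawn as one unit.
/// Hypothetical type for illustration; not a wgpu type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Meshlet {
    /// Index of the first triangle (not the first index) in this group.
    pub first_triangle: usize,
    /// Number of triangles in this group.
    pub triangle_count: usize,
}

/// Naively splits a triangle list into groups of at most `max_triangles`.
/// `indices` holds three vertex indices per triangle.
pub fn split_into_meshlets(indices: &[u32], max_triangles: usize) -> Vec<Meshlet> {
    assert!(max_triangles > 0, "a meshlet must hold at least one triangle");
    assert!(indices.len() % 3 == 0, "index list must describe whole triangles");

    let triangle_total = indices.len() / 3;
    (0..triangle_total)
        .step_by(max_triangles)
        .map(|first| Meshlet {
            first_triangle: first,
            // The last group may be smaller than `max_triangles`.
            triangle_count: max_triangles.min(triangle_total - first),
        })
        .collect()
}
```

Each resulting group would then map to one mesh shader workgroup, with a task shader deciding which groups to dispatch at all.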

They are compute-like shaders, and generate primitives which are passed directly to the rasterizer, rather
than having a list of vertices generated individually and then using a static index buffer. This means that certain computations
on nearby groups of triangles can be done together, the relationship between vertices and primitives is more programmable, and
you can even pass non-interpolated per-primitive data to the fragment shader, independent of vertices.

Mesh shaders are very versatile, and are powerful enough to replace vertex shaders, tessellation shaders, and geometry shaders
on their own or with task shaders.

A full example of mesh shaders in use can be seen in the mesh_shader example. For the full specification of mesh shaders in wgpu, go to docs/api-specs/mesh_shading.md. Below is a small snippet of shader code demonstrating their usage:

@task
@payload(taskPayload)
@workgroup_size(1)
fn ts_main() -> @builtin(mesh_task_size) vec3<u32> {
    // Task shaders can use workgroup variables like compute shaders
    workgroupData = 1.0;
    // Pass some data to all mesh shaders dispatched by this workgroup
    taskPayload.colorMask = vec4(1.0, 1.0, 0.0, 1.0);
    taskPayload.visible = 1;
    // Dispatch a mesh shader grid with one workgroup
    return vec3(1, 1, 1);
}

@mesh(mesh_output)
@payload(taskPayload)
@workgroup_size(1)
fn ms_main(@builtin(local_invocation_index) index: u32, @builtin(global_invocation_id) id: vec3<u32>) {
    // Set how many outputs this workgroup will generate
    mesh_output.vertex_count = 3;
    mesh_output.primitive_count = 1;
    // Can also use workgroup variables
    workgroupData = 2.0;

    // Set vertex outputs
    mesh_output.vertices[0].position = positions[0];
    mesh_output.vertices[0].color = colors[0] * taskPayload.colorMask;

    mesh_output.vertices[1].position = positions[1];
    mesh_output.vertices[1].color = colors[1] * taskPayload.colorMask;

    mesh_output.vertices[2].position = positions[2];
    mesh_output.vertices[2].color = colors[2] * taskPayload.colorMask;
    
    // Set the vertex indices for the only primitive
    mesh_output.primitives[0].indices = vec3<u32>(0, 1, 2);
    // Cull it if the data passed by the task shader says to
    mesh_output.primitives[0].cull = taskPayload.visible != 1;
    // Give a noninterpolated per-primitive vec4 to the fragment shader
    mesh_output.primitives[0].colorMask = vec4<f32>(1.0, 0.0, 1.0, 1.0);
}

Thanks

This was a monumental effort from many different people, but it was championed by @inner-daemons, without whom it would not have happened.
Thank you @cwfitzgerald for doing the bulk of the code review. Finally, thank you @ColinTimBarndt for coordinating the testing effort.

Reviewers:

wgpu Contributions:

naga Contributions:

Testing Assistance:

Thank you to everyone who made this happen!

Switch from gpu-alloc to gpu-allocator in the vulkan backend

gpu-allocator is already the allocator used in the dx12 backend, so the two backends
can now be configured the same way and their behavior converges.

This also brings the Device::generate_allocator_report feature to
the vulkan backend.

By @DeltaEvo in #8158.

wgpu::Instance::enumerate_adapters is now async & available on WebGPU

BREAKING CHANGE: enumerate_adapters is now async:

- pub fn enumerate_adapters(&self, backends: Backends) -> Vec<Adapter> {
+ pub fn enumerate_adapters(&self, backends: Backends) -> impl Future<Output = Vec<Adapter>> {

This yields two benefits:

  • This method is now implemented on non-native targets using the standard Instance::request_adapter(…), making enumerate_adapters a portable surface. This was previously a nontrivial pain point when an application wanted to do some of its own filtering of adapters.
  • This method can now be implemented in custom backends.

By @R-Cramer4 in #8230

New LoadOp::DontCare

In the case where a renderpass unconditionally writes to all pixels in the rendertarget,
Load can cause unnecessary memory traffic, and Clear can spend time unnecessarily
clearing the rendertargets. DontCare is a new LoadOp which will leave the contents
of the rendertarget undefined. Because this could lead to undefined behavior, this API
requires that the user gives an unsafe token to use the api.

While you can use this unconditionally, on platforms where DontCare is not available,
it will internally use a different load op.

load: LoadOp::DontCare(unsafe { wgpu::LoadOpDontCare::enabled() })

By @cwfitzgerald in #8549
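
The "unsafe token" pattern described above can be sketched in plain Rust. This is a simplified model, not wgpu's implementation: `DontCareToken`, this local `LoadOp` enum, and `resolve` are hypothetical stand-ins. The text above does not say which load op wgpu substitutes on platforms without DontCare, so the fallback to `Clear` here is purely an assumption made for the example.

```rust
/// Hypothetical stand-in for a token type such as `wgpu::LoadOpDontCare`.
/// The private field means code outside this module can only obtain a
/// value through the `unsafe` constructor below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DontCareToken {
    _private: (),
}

impl DontCareToken {
    /// # Safety
    /// The caller promises that every pixel of the attachment is written
    /// before it is read, since its initial contents are undefined.
    pub unsafe fn enabled() -> Self {
        Self { _private: () }
    }
}

/// Simplified model of a load op; not the real wgpu enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LoadOp {
    Clear([f32; 4]),
    Load,
    DontCare(DontCareToken),
}

/// Picks the load op actually used by a backend.
/// ASSUMPTION: falling back to `Clear` is arbitrary and chosen only for this sketch.
pub fn resolve(op: LoadOp, native_dont_care: bool) -> LoadOp {
    match op {
        LoadOp::DontCare(_) if !native_dont_care => LoadOp::Clear([0.0; 4]),
        other => other,
    }
}
```

The point of the design is that the word `unsafe` has to appear at the call site, so the undefined-contents contract is visible wherever DontCare is requested.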

MipmapFilterMode is split from FilterMode

This is a breaking change that aligns wgpu with the WebGPU spec.

SamplerDescriptor {
...
-     mipmap_filter: FilterMode::Nearest
+     mipmap_filter: MipmapFilterMode::Nearest
...
}

By @sagudev in #8314.

Multiview on all major platforms and support for multiview bitmasks

Multiview is a feature that allows rendering the same content to multiple layers of a texture.
This is useful primarily in VR where you wish to display almost identical content to 2 views,
just with a different perspective. Instead of using 2 draw calls or 2 instances for each object, you
can use this feature.

Multiview is also called view instancing in DX12 or vertex amplification in Metal.

Multiview has been reworked, adding support for Metal and DX12, and adding testing and validation to wgpu itself.
This change also introduces a view bitmask, a new field in RenderPassDescriptor that allows a render pass to render
to multiple non-adjacent layers when using the SELECTIVE_MULTIVIEW feature. If you don't use multiview,
you can set this field to None.

- wgpu::RenderPassDescriptor {
-     label: None,
-     color_attachments: &color_attachments,
-     depth_stencil_attachment: None,
-     timestamp_writes: None,
-     occlusion_query_set: None,
- }
+ wgpu::RenderPassDescriptor {
+     label: None,
+     color_attachments: &color_attachments,
+     depth_stencil_attachment: None,
+     timestamp_writes: None,
+     occlusion_query_set: None,
+     multiview_mask: NonZero::new(3),
+ }

One other breaking change worth noting is that in WGSL @builtin(view_index) now requires a type of u32, where previously it required i32.

By @inner-daemons in #8206.
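
As a rough illustration of how a view bitmask such as NonZero::new(3) can be read, the sketch below lists the layers a mask selects. It assumes that bit i of the mask corresponds to texture layer i, which is not spelled out above; `layers_in_mask` is a hypothetical helper, not a wgpu function.

```rust
use std::num::NonZeroU32;

/// Returns the layer indices selected by a multiview mask.
/// ASSUMPTION: bit `i` of the mask selects layer `i`.
pub fn layers_in_mask(mask: NonZeroU32) -> Vec<u32> {
    (0..u32::BITS)
        .filter(|&i| (mask.get() & (1u32 << i)) != 0)
        .collect()
}
```

Under that assumption, a mask of 3 (0b011) selects layers 0 and 1, while a mask of 5 (0b101) selects the non-adjacent layers 0 and 2.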

Error scopes now use guards and are thread-local.

- device.push_error_scope(wgpu::ErrorFilter::Validation);
+ let scope = device.push_error_scope(wgpu::ErrorFilter::Validation);
  // ... perform operations on the device ...
- let error: Option<Error> = device.pop_error_scope().await;
+ let error: Option<Error> = scope.pop().await;

Device error scopes now operate on a per-thread basis. This allows them to be used easily within multithreaded contexts,
without having the error scope capture errors from other threads.

When the std feature is not enabled, we have no way to differentiate between threads, so error scopes fall back to being
global operations.

By @cwfitzgerald in #8685
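
The guard-based, per-thread behavior can be modelled with a small standalone sketch using only the standard library. It is not wgpu's implementation: `push_scope`, `report_error`, and `ScopeGuard` are hypothetical names, errors are plain strings, and `pop` is synchronous here whereas the real `scope.pop()` is awaited.

```rust
use std::cell::RefCell;

thread_local! {
    // One stack of open scopes per thread; each slot keeps the first error captured.
    static SCOPES: RefCell<Vec<Option<String>>> = RefCell::new(Vec::new());
}

/// Hypothetical guard returned by `push_scope`; consuming it closes the scope.
pub struct ScopeGuard {
    depth: usize,
}

/// Opens a scope on the calling thread only.
pub fn push_scope() -> ScopeGuard {
    SCOPES.with(|scopes| {
        let mut scopes = scopes.borrow_mut();
        scopes.push(None);
        ScopeGuard { depth: scopes.len() }
    })
}

/// Records an error in the innermost scope of the calling thread.
/// Returns false when this thread has no open scope to capture it.
pub fn report_error(message: &str) -> bool {
    SCOPES.with(|scopes| {
        let mut scopes = scopes.borrow_mut();
        if let Some(slot) = scopes.last_mut() {
            // Keep only the first error seen by this scope.
            slot.get_or_insert_with(|| message.to_string());
            true
        } else {
            false
        }
    })
}

impl ScopeGuard {
    /// Closes the scope and returns the error it captured, if any.
    pub fn pop(self) -> Option<String> {
        SCOPES.with(|scopes| {
            let mut scopes = scopes.borrow_mut();
            debug_assert_eq!(scopes.len(), self.depth, "scopes must be popped innermost first");
            scopes.pop().flatten()
        })
    }
}
```

Because the stack lives in thread-local storage, an error reported on a worker thread never lands in a scope opened on the main thread, which is the isolation property described above.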

Log Levels

We have received complaints about wgpu being way too log spammy at log levels info/warn/error. We have
adjusted our log policy and changed logging such that info and above should be silent unless some exceptional
event happens. Our new log policy is as follows:

  • Error: if we can’t (for some reason, usually a bug) communicate an error any other way.
  • Warning: similar, but there may be one-shot warnings about usage that is almost certainly sub-optimal.
  • Info: do not use
  • Debug: Used for interesting events happening inside wgpu.
  • Trace: Used for all events that might be useful to either wgpu or application...

v27.0.4

23 Oct 18:20
v27.0.4
af91efa

This release includes wgpu-hal version 27.0.4. All other crates remain at their previous versions.

Bug Fixes

General

  • Remove fragile dependency constraint on ordered-float that prevented semver-compatible changes above 5.0.0. By @kpreid in #8371.

Vulkan

  • Work around extremely poor frame pacing from AMD and Nvidia cards on Windows in Fifo and FifoRelaxed present modes. This is due to the drivers implicitly using a DXGI (Direct3D) swapchain to implement these modes and it having vastly different timing properties. See #8310 and #8354 for more information. By @cwfitzgerald in #8420.

v26.0.6

23 Oct 18:21
v26.0.6
6f8edda

This release includes wgpu-hal version 26.0.6. All other crates remain at their previous versions.

Bug Fixes

Vulkan

  • Work around extremely poor frame pacing from AMD and Nvidia cards on Windows in Fifo and FifoRelaxed present modes. This is due to the drivers implicitly using a DXGI (Direct3D) swapchain to implement these modes and it having vastly different timing properties. See #8310 and #8354 for more information. By @cwfitzgerald in #8420.

v27.0.3

22 Oct 13:48

This release includes naga, wgpu-core and wgpu-hal version 27.0.3. All other crates remain at their previous versions.

Bug Fixes

naga

  • Fix a bug that resulted in the Metal error "program scope variable must reside in constant address space" in some cases. Backport of #8311 by @teoxoy.

General

  • Remove an assertion that causes problems if CommandEncoder::as_hal_mut is used. By @andyleiserson in #8387.

DX12

  • Align copies between textures and buffers via a single intermediate buffer per copy when D3D12_FEATURE_DATA_D3D12_OPTIONS13.UnrestrictedBufferTextureCopyPitchSupported is false. By @ErichDonGubler in #7721, backported in #8374.

v26.0.5

21 Oct 21:43

This release includes wgpu-hal version 26.0.5. All other crates remain at their previous versions.

Bug Fixes

DX12

  • Align copies between textures and buffers via a single intermediate buffer per copy when D3D12_FEATURE_DATA_D3D12_OPTIONS13.UnrestrictedBufferTextureCopyPitchSupported is false. By @ErichDonGubler in #7721, backported in #8375.

v27.0.2

04 Oct 03:44
627bf91

This release includes wgpu-hal version 27.0.2. All other crates remain at their previous versions.

Bug Fixes

DX12

  • Fix device creation failures for devices that do not support mesh shaders. By @vorporeal in #8297.

v27.0.1

02 Oct 17:09
c76dea0

This release includes wgpu, wgpu-core, wgpu-hal, and wgpu-types version 27.0.1. All other crates remain at their previous versions.

Bug Fixes

v27.0.0

01 Oct 23:45
482a983

Major Changes

Deferred command buffer actions: map_buffer_on_submit and on_submitted_work_done

You may schedule buffer mapping and a submission-complete callback to run automatically after you submit, directly from encoders, command buffers, and passes.

// Record some GPU work so the submission isn't empty and touches `buffer`.
encoder.clear_buffer(&buffer, 0, None);

// Defer mapping until this encoder is submitted.
encoder.map_buffer_on_submit(&buffer, wgpu::MapMode::Read, 0..size, |result| { .. });

// Fires after the command buffer's work is finished.
encoder.on_submitted_work_done(|| { .. });

// Automatically calls `map_async` and `on_submitted_work_done` after this submission finishes.
queue.submit([encoder.finish()]);

Available on CommandEncoder, CommandBuffer, RenderPass, and ComputePass.

By @cwfitzgerald in #8125.

Built-in support for DXGI swapchains on top of DirectComposition Visuals in DX12

By enabling DirectComposition support, the dx12 backend can now support transparent windows.

This creates a single IDCompositionVisual over the entire window that is used by the Surface. If a user wants to manage the composition tree themselves, they should create their own device and composition, and pass the relevant visual down into wgpu via SurfaceTargetUnsafe::CompositionVisual.

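let instance = wgpu::Instance::new(&wgpu::InstanceDescriptor {
    backend_options: wgpu::BackendOptions {
        dx12: wgpu::Dx12BackendOptions {
            // Present through a DXGI swapchain created from a DirectComposition visual.
            presentation_system: wgpu::Dx12SwapchainKind::DxgiFromVisual,
            ..Default::default()
        },
        ..Default::default()
    },
    ..Default::default()
});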

By @n1ght-hunter in #7550.

EXPERIMENTAL_RAY_TRACING_ACCELERATION_STRUCTURE has been merged into EXPERIMENTAL_RAY_QUERY

We have merged the acceleration structure feature into the RayQuery feature. This is to help work around an AMD driver bug and reduce the feature complexity of ray tracing. In the future when ray tracing pipelines are implemented, if either feature is enabled, acceleration structures will be available.

- Features::EXPERIMENTAL_RAY_TRACING_ACCELERATION_STRUCTURE
+ Features::EXPERIMENTAL_RAY_QUERY

By @Vecvec in #7913.

New EXPERIMENTAL_PRECOMPILED_SHADERS API

We have added Features::EXPERIMENTAL_PRECOMPILED_SHADERS, replacing existing passthrough types with a unified CreateShaderModuleDescriptorPassthrough which allows passing multiple shader codes for different backends. By @SupaMaggie70Incorporated in #7834

Difference for SPIR-V passthrough:

- device.create_shader_module_passthrough(wgpu::ShaderModuleDescriptorPassthrough::SpirV(
-     wgpu::ShaderModuleDescriptorSpirV {
-         label: None,
-         source: spirv_code,
-     },
- ))
+ device.create_shader_module_passthrough(wgpu::ShaderModuleDescriptorPassthrough {
+     entry_point: "main".into(),
+     label: None,
+     spirv: Some(spirv_code),
+     ..Default::default()
+ })

This allows using precompiled shaders without manually checking which backend's code to pass, for example if you have shaders precompiled for both DXIL and SPIR-V.

Buffer mapping apis no longer have lifetimes

Buffer::get_mapped_range(), Buffer::get_mapped_range_mut(), and Queue::write_buffer_with() now return guard objects without any lifetimes. This
makes it significantly easier to store these types in structs, which is useful for building utilities that build the contents of a buffer over time.

- let buffer_mapping_ref: wgpu::BufferView<'_>           = buffer.get_mapped_range(..);
- let buffer_mapping_mut: wgpu::BufferViewMut<'_>        = buffer.get_mapped_range_mut(..);
- let queue_write_with:   wgpu::QueueWriteBufferView<'_> = queue.write_buffer_with(..);
+ let buffer_mapping_ref: wgpu::BufferView               = buffer.get_mapped_range(..);
+ let buffer_mapping_mut: wgpu::BufferViewMut            = buffer.get_mapped_range_mut(..);
+ let queue_write_with:   wgpu::QueueWriteBufferView     = queue.write_buffer_with(..);

By @sagudev in #8046 and @cwfitzgerald in #8070.

EXPERIMENTAL_* features now require unsafe code to enable

We want to be able to expose potentially experimental features to our users before we have ensured that they are fully sound to use.
As such, enabling any feature prefixed with EXPERIMENTAL now requires passing a special unsafe token in the device descriptor,
acknowledging that the feature may still have bugs. Please report any bugs you find.

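adapter.request_device(&wgpu::DeviceDescriptor {
    features: wgpu::Features::EXPERIMENTAL_MESH_SHADER,
    // The unsafe token acknowledges that experimental features may still be unsound.
    experimental_features: unsafe { wgpu::ExperimentalFeatures::enabled() },
    ..Default::default()
})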

By @cwfitzgerald in #8163.

Multi-draw indirect is now unconditionally supported when indirect draws are supported

We have removed Features::MULTI_DRAW_INDIRECT as it was unconditionally available on all platforms.
RenderPass::multi_draw_indirect is now available if the device supports downlevel flag DownlevelFlags::INDIRECT_EXECUTION.

The Features::MULTI_DRAW_INDIRECT_COUNT feature can be used to determine if multi-draw is supported natively on the device. This is helpful to know if you are using spirv-passthrough and gl_DrawID in your shaders.

By @cwfitzgerald in #8162.

wgpu::PollType::Wait now has an optional timeout

We removed wgpu::PollType::WaitForSubmissionIndex and added fields to wgpu::PollType::Wait in order to express timeouts.

Before/after for wgpu::PollType::Wait:

-device.poll(wgpu::PollType::Wait).unwrap();
-device.poll(wgpu::PollType::wait_indefinitely()).unwrap();
+device.poll(wgpu::PollType::Wait {
+      submission_index: None, // Wait for most recent submission
+      timeout: Some(std::time::Duration::from_secs(60)), // Previous behavior, but more likely you want `None` instead.
+  })
+  .unwrap();

Before/after for wgpu::PollType::WaitForSubmissionIndex:

-device.poll(wgpu::PollType::WaitForSubmissionIndex(index_to_wait_on))
+device.poll(wgpu::PollType::Wait {
+      submission_index: Some(index_to_wait_on),
+      timeout: Some(std::time::Duration::from_secs(60)), // Previous behavior, but more likely you want `None` instead.
+  })
+  .unwrap();

⚠️ Previously, both wgpu::PollType::WaitForSubmissionIndex and wgpu::PollType::Wait had a hard-coded timeout of 60 seconds.

To wait indefinitely on the latest submission, you can also use the wait_indefinitely convenience function:

device.poll(wgpu::PollType::wait_indefinitely());

By @Wumpf in #8282, #8285

New Features

General

  • Added mesh shader support to wgpu, with examples. Requires passthrough. By @SupaMaggie70Incorporated in #7345.
  • Added support for external textures based on WebGPU's GPUExternalTexture. These allow shaders to transparently operate on potentially multiplanar source texture data in either RGB or YCbCr formats via WGSL's texture_external type. This is gated behind the Features::EXTERNAL_TEXTURE feature, which is currently only supported on DX12. By @jamienicol in #4386.
  • wgpu::Device::poll can now specify a timeout via wgpu::PollType::Wait. By @Wumpf in #8282 & #8285

naga

Changes

General

  • Command encoding now happens when CommandEncoder::finish is called, not when the individual operations are requested. This does not affect the API, but may affect performance characteristics. By @andyleiserson in #8220.
  • Prevent resources for acceleration structures being created if acceleration structures are not enabled. By @Vecvec in #8036.
  • Validate that each push_debug_group pairs with exactly one pop_debug_group. By @andyleiserson in #8048.
  • set_viewport now requires that the supplied minimum depth value is less than the maximum depth value. By @andyleiserson in #8040.
  • Validation of copy_texture_to_buffer, copy_buffer_to_texture, and copy_texture_to_texture operations more closely follows the WebGPU specification. By @andyleiserson in various PRs.
    • Copies within the same texture must not overlap.
    • Copies of multisampled or depth/stencil formats must span an entire subresource (layer).
    • Copies of depth/stencil formats must be 4B aligned.
    • For texture-buffer copies, bytes_per_row on the buffer side must be 256B-aligned, even if the transfer is a single row.
  • The offset for set_vertex_buffer and set_index_buffer must be 4B aligned. By @andyleiserson in #7929.
  • The offset and size of bindings are validated as fitting within the underlying buffer in more cases. By @andyleiserson in #7911.
  • The function you pass to Device::on_uncaptured_error() must now implement Sync in addition to Send, and be wrapped in Arc ...
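
The 256-byte row alignment rule for texture-buffer copies listed above is a common source of validation errors, so here is a minimal sketch of the usual padding calculation. `padded_bytes_per_row` is a hypothetical helper, and the local constant is defined here to mirror the 256-byte requirement rather than imported from wgpu.

```rust
/// Required alignment of `bytes_per_row` on the buffer side of a texture-buffer copy.
/// Defined locally for this sketch to match the 256B rule described above.
pub const COPY_BYTES_PER_ROW_ALIGNMENT: u32 = 256;

/// Rounds the tightly packed row size up to the next multiple of 256 bytes.
/// The rule applies even when only a single row is transferred.
pub fn padded_bytes_per_row(width_texels: u32, bytes_per_texel: u32) -> u32 {
    let unpadded = width_texels * bytes_per_texel;
    unpadded.div_ceil(COPY_BYTES_PER_ROW_ALIGNMENT) * COPY_BYTES_PER_ROW_ALIGNMENT
}
```

A staging buffer for such a copy would then need at least the padded row size multiplied by the number of rows, and each row has to be read back at the padded stride rather than the packed one.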

v26.0.4

07 Aug 16:06

This release includes wgpu-hal version 26.0.4. All other crates remain at their previous versions.

Bug Fixes

Vulkan

  • Fix STATUS_HEAP_CORRUPTION crash when concurrently calling create_sampler. By @atlv24 in #8056.

v26.0.3

31 Jul 02:57
f01af23

This release includes wgpu-hal version 26.0.3. All other crates remain at their previous versions.

Bug Fixes