Instanced rendering not the fastest

According to some comments and https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau, instanced rendering of small meshes is not the fastest approach and can be outperformed by one or more larger draw calls.

I've tested a simple 4-vertex indexed alternative in [this branch](/bevy_polyline/tree/indexed-instanced), but it's no faster and possibly slower. This was already suggested to be slower by cwfitzgerald/DrawCallYeeter, who tested it in another case.

The current method uses the vertex buffer as an instance buffer, storing [p0, p1, p2, ..] with an instance stride of 1 point, and only works well for instanced rendering. Non-instanced rendering requires duplicating each point 4x (indexed) or 6x (non-indexed).

One alternative is to store the points in an SSBO and index into it manually in the vertex shader using `instance_index = vertex_index / 6` (integer division). The per-vertex coefficients can then be looked up in the shader using `coefficient_index = vertex_index % 6`.
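To make the index arithmetic concrete, here is a minimal CPU-side sketch in Rust of the same mapping the shader would perform. The names `VERTICES_PER_SEGMENT` and `decompose_vertex_index` are hypothetical and not part of bevy_polyline; the sketch assumes the non-indexed layout with 6 vertices (two triangles) per segment.

```rust
/// Vertices emitted per line segment when drawing two triangles
/// without an index buffer (assumed layout, see the text above).
const VERTICES_PER_SEGMENT: u32 = 6;

/// Splits a flat vertex index into
/// (instance_index, coefficient_index), mirroring
/// `vertex_index / 6` and `vertex_index % 6`.
fn decompose_vertex_index(vertex_index: u32) -> (u32, u32) {
    (
        vertex_index / VERTICES_PER_SEGMENT,
        vertex_index % VERTICES_PER_SEGMENT,
    )
}
```

For example, vertex 13 belongs to segment 2 and uses coefficient 1, so the shader would read the points for segment 2 from the SSBO and pick the second entry of its 6-entry coefficient table.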

One downside is that SSBOs have a wgpu default binding size limit of 128 MiB, which means we can store at most 128 MiB / 12 bytes (roughly 11 million) points per SSBO unless we query the device for its actual SSBO size limit. So we'd need to render in chunks for anything larger than that.
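The chunking could look roughly like the sketch below. It assumes 12 bytes per point (as stated above) and that consecutive chunks repeat one point so the segment crossing a chunk boundary is not lost; that overlap is an assumption of this sketch, and joins that need more neighbouring points would need a larger overlap. All names here are hypothetical.

```rust
use std::ops::Range;

/// Default storage buffer binding limit discussed above (128 MiB).
const DEFAULT_MAX_BINDING_BYTES: u64 = 128 * 1024 * 1024;
/// Size of one stored point, as stated in the text (3 x f32).
const BYTES_PER_POINT: u64 = 12;

/// How many whole points fit into one binding of the given size.
fn max_points_per_chunk(max_binding_bytes: u64) -> u64 {
    max_binding_bytes / BYTES_PER_POINT
}

/// Splits `point_count` points into ranges of at most `max_points`
/// points each. Consecutive ranges share one point so that every
/// segment (p[i], p[i + 1]) lies entirely inside some chunk.
fn chunk_ranges(point_count: u64, max_points: u64) -> Vec<Range<u64>> {
    assert!(max_points >= 2, "a chunk needs at least two points");
    let mut ranges = Vec::new();
    if point_count < 2 {
        return ranges; // fewer than two points means no segment to draw
    }
    let mut start = 0;
    loop {
        let end = (start + max_points).min(point_count);
        ranges.push(start..end);
        if end == point_count {
            break;
        }
        // Repeat the last point as the first point of the next chunk.
        start = end - 1;
    }
    ranges
}
```

Each returned range would correspond to one SSBO binding and one draw call. In practice the limit passed to `max_points_per_chunk` would come from the device's reported limits rather than the default constant.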

We should also test whether the SSBO solution above is faster with or without an index buffer. A single index buffer of, say, 8096 can be reused for every line and chunk, which is something at least.
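A reusable index buffer for the indexed variant could be generated along these lines. This is a sketch under assumptions: 4 vertices per segment quad, two triangles per quad, and the winding order shown, none of which is confirmed by the branch above. `quad_indices` is a hypothetical helper.

```rust
/// Builds indices for `segment_count` quads, 4 vertices and
/// 6 indices (two triangles) per quad. The winding order is an
/// assumption and may need flipping for a given cull mode.
fn quad_indices(segment_count: u32) -> Vec<u32> {
    (0..segment_count)
        .flat_map(|segment| {
            let base = segment * 4;
            // Triangle 1: corners 0, 1, 2. Triangle 2: corners 2, 1, 3.
            [base, base + 1, base + 2, base + 2, base + 1, base + 3]
        })
        .collect()
}
```

With this layout the shader would derive the segment as `vertex_index / 4` and the corner as `vertex_index % 4`. Since the indices do not depend on the point data, the same buffer can be bound for every line and chunk, with the draw call's index count limited to the segments actually present.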
