Skip to content

Instanced rendering not the fastest #4

@mtsr

Description

@mtsr

According to some comments and https://www.slideshare.net/DevCentralAMD/vertex-shader-tricks-bill-bilodeau, instanced rendering of small meshes is not the fastest and can be outperformed by one or more larger draw calls.

I've tested a simple 4 vertex index buffer alternative in this branch, but it's not any faster and possibly slower. This was already suggested to be slower by cwfitzgerald/DrawCallYeeter, who tested this in another case.

The current method using the vertex-buffer-as-instance-buffer, storing [p0, p1, p2, ..] with an instance stride of 1 point, only works well for instanced rendering. For non-instanced a 4x (indexed) or 6x (non-indexed) duplication of points is required.

One alternative is to store the points in a SSBO and manually index into it using instance_index = vertex_index / 6. The indexing into coefficients in the shader can be done using coefficient_index = vertex_index % 6.

One downside is SSBOs have a WGPU default limit of 128MB, which means we can store at most 128MB / 12 bytes segments per SSBO (without querying for SSBO size limit). So we'd need to render in chunks for anything larger than that.

We should also test whether doing the above SSBO solution is faster with or without an index buffer. A single index buffer of say 8096 can be reused for every line and chunk, which is something at least.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions