In my master's thesis I replicated UE5's Nanite, which can draw an object at multiple LODs without introducing any holes or seams at LOD transitions. It is a GPU-driven renderer using meshlets and mesh shaders that can render glTF scenes as complex as Bistro. I have a custom baker to preprocess all meshes offline, allowing me to offload the generation of Nanite's LOD tree and enabling blazing-fast load times of under a second for all scenes. In the LOD generation showcase video, one can see the output of the LOD tree generator at its various LOD levels. At runtime, these LODs are selected automatically based on their distance to the camera, and the unique data structure proposed by Nanite allows them to transition mid-model.
Requires a mesh-shader-capable GPU, which all raytracing-capable GPUs are (plus a few more). The master branch has been tested on:
- AMD 680M iGPU on Windows and Linux (RADV), very similar to a Steam Deck
- Nvidia 3070ti mobile on Windows
- Have Rust and the Vulkan SDK installed
- Build and run with `cargo run --release`
Once started, you can resize or maximize the window as you wish. The UI spells out the controls at the very top; most importantly, Tab switches between UI and game focus. Feel free to play around with the settings!
You can add `gltf`/`glb` scenes by placing them in `/models/models/local/` and restarting, which should list them in the scenes ComboBox.
Some scenes which are known to work well:
- Not a Stanford Bunny by Jocelyn Da Prato: no right to sell; feel free to share, use, modify
- Bistro by Amazon Lumberyard, CC-BY 4.0, 2017 Amazon Lumberyard
- Crytek Sponza from casual-effects.com: CC BY 3.0, 2010 Frank Meinl, Crytek
- Lantern (included by default) from gltf sample assets: CC0 1.0 Universal, Microsoft, Frank Galligan
- any Quixel model from Fab; the individual models often have a glb version available
This project is written in Rust, which should be fairly readable for C++ programmers, and the shaders are also written in Rust thanks to the rust-gpu shader compiler, of which I have become a maintainer. This allows me to easily share data structures and algorithms between the CPU and GPU, and enables the use of Rust tooling such as formatters, linters and tests for the shaders.
To represent indirections from one buffer to other buffers, images or samplers, I have built my own bindless library, specifically designed to be used with rust-gpu. The shared code allows me to declare GPU structs with "Descriptors" pointing at other resources, which can easily be uploaded directly from the CPU. Some simple examples are available as integration tests with their shader counterparts here. These indirections allow me to jump from a scene struct to instance and model structs, and from those model structs to vertex and index buffers and material textures.
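As a rough illustration of the idea (all type and field names below are my own placeholders, not the bindless library's actual API), a GPU-visible struct simply stores typed indices into the bindless descriptor arrays, so the CPU can write it into a buffer as-is and the shader can follow the indirections:

```rust
use core::marker::PhantomData;

/// Placeholder "descriptor": an index into a bindless descriptor array.
/// The type parameter only documents what the index is expected to point at.
#[repr(transparent)]
pub struct Desc<T>(pub u32, PhantomData<T>);

/// Marker types for the kinds of resources a descriptor can reference.
pub struct Buffer<T>(PhantomData<T>);
pub struct Image2d;

#[repr(C)]
pub struct Vertex {
    pub position: [f32; 3],
    pub normal: [f32; 3],
    pub uv: [f32; 2],
}

/// GPU-visible model struct: plain indices, so no pointer fix-up is needed
/// when uploading it directly from the CPU.
#[repr(C)]
pub struct Model {
    pub vertices: Desc<Buffer<Vertex>>, // vertex buffer
    pub indices: Desc<Buffer<u32>>,     // index buffer
    pub albedo: Desc<Image2d>,          // material texture
}

#[repr(C)]
pub struct Instance {
    pub transform: [[f32; 4]; 4],   // object-to-world matrix
    pub model: Desc<Buffer<Model>>, // which model this instance draws
}

/// The scene struct that the culling passes receive a reference to.
#[repr(C)]
pub struct Scene {
    pub instances: Desc<Buffer<Instance>>,
    pub instance_count: u32,
}
```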
To select which meshlets at their various LODs to render, I use two compute passes supplied with a reference to the scene struct. I spawn one workgroup of the instance cull CS per model instance, cull the instance, and use all 32 invocations to write out meshlet instance groups of up to 32 meshlets each. Due to the sheer number of meshlets each model contains, this proved much more performant than spawning one invocation per instance, as is typically done. A second meshlet select CS is launched indirectly with one workgroup per meshlet group emitted previously, so that each invocation culls one meshlet and writes all passing meshlet instances into a buffer.
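A CPU-side sketch of that two-pass structure (the real versions are compute shaders; the struct fields, helper names and culling tests below are placeholders):

```rust
/// Meshlets are emitted in groups of up to 32, matching the 32 invocations
/// of one instance-cull workgroup.
const GROUP_SIZE: u32 = 32;

pub struct Instance {
    pub meshlet_count: u32, // plus transform, bounds, ... in the real struct
}

pub struct MeshletGroup {
    pub instance: u32,
    pub first_meshlet: u32,
    pub count: u32,
}

// Placeholder culling tests; the real shaders test bounds against the frustum
// and evaluate the LOD selection criteria.
fn instance_visible(_instance: &Instance) -> bool { true }
fn meshlet_selected(_instance: u32, _meshlet: u32) -> bool { true }

/// Pass 1: one "workgroup" per model instance. If the instance survives
/// culling, its 32 "invocations" cooperate to emit groups of up to 32 meshlets.
pub fn instance_cull(instances: &[Instance], groups: &mut Vec<MeshletGroup>) {
    for (id, instance) in instances.iter().enumerate() {
        if !instance_visible(instance) {
            continue;
        }
        let mut first = 0;
        while first < instance.meshlet_count {
            groups.push(MeshletGroup {
                instance: id as u32,
                first_meshlet: first,
                count: (instance.meshlet_count - first).min(GROUP_SIZE),
            });
            first += GROUP_SIZE;
        }
    }
}

/// Pass 2: launched indirectly with one "workgroup" per emitted group; each
/// "invocation" culls one meshlet and appends passing (instance, meshlet)
/// pairs to the buffer consumed by the mesh-shader pass.
pub fn meshlet_select(groups: &[MeshletGroup], out: &mut Vec<(u32, u32)>) {
    for group in groups {
        for i in 0..group.count {
            let meshlet = group.first_meshlet + i;
            if meshlet_selected(group.instance, meshlet) {
                out.push((group.instance, meshlet));
            }
        }
    }
}
```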
The renderer uses a simple G-Buffer, as I have not had the time to implement a visibility-buffer-based renderer. In the 3D pass, I render all meshlets from the previously generated meshlet buffer to the G-Buffer using this mesh and fragment shader, which is by far the slowest step. The deferred pass uses a lighting CS with most of the PBR evaluation here, and the background is written in a following sky CS, which only writes to fragments with alpha = 0.0.
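Conceptually, the two deferred passes amount to the following per-texel decision (collapsed into a single branch here for illustration; `GBufferTexel` and the shading stubs are placeholders, not the real shaders):

```rust
pub struct GBufferTexel {
    pub albedo: [f32; 3],
    pub alpha: f32, // 0.0 where no meshlet was rasterized
    pub normal: [f32; 3],
}

/// Lighting CS followed by sky CS, merged into one function: the sky pass
/// only touches texels the geometry left untouched (alpha == 0.0).
pub fn resolve_texel(texel: &GBufferTexel) -> [f32; 3] {
    if texel.alpha == 0.0 {
        sky_color()
    } else {
        shade_pbr(texel)
    }
}

fn sky_color() -> [f32; 3] {
    [0.4, 0.6, 0.9] // placeholder; the real sky CS evaluates a proper sky model
}

fn shade_pbr(texel: &GBufferTexel) -> [f32; 3] {
    texel.albedo // placeholder for the PBR evaluation in the lighting CS
}
```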
The Nanite data structure is split into a disk format and a shader format: the disk format, serialized with rkyv, is focused on compression with zstd, whereas the runtime format focuses on the access patterns of the GPU. A few basic shared structs can be found in disk shader. The preprocessor searches for glTF files, processes them in parallel using rayon and writes them out in my internal disk format. The runtime then decompresses them and converts them into the shader format.
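A rough sketch of that split (struct names and fields are made up for illustration, not the project's actual types): the disk side derives rkyv's traits and its serialized bytes are compressed with zstd, while the runtime side uses a fixed `#[repr(C)]` layout for GPU consumption.

```rust
/// Disk format: optimized for compression, serialized with rkyv and then
/// compressed with zstd by the preprocessor.
#[derive(rkyv::Archive, rkyv::Serialize, rkyv::Deserialize)]
pub struct DiskMeshlet {
    // variable-length, quantizable data that zstd compresses well
    pub vertex_indices: Vec<u32>,
    pub triangle_indices: Vec<u8>,
}

/// Shader format: fixed-size layout matching how the GPU indexes into the
/// large shared vertex/index buffers at runtime.
#[repr(C)]
pub struct ShaderMeshlet {
    pub vertex_offset: u32,
    pub vertex_count: u32,
    pub triangle_offset: u32,
    pub triangle_count: u32,
}
```

At load time the runtime decompresses the zstd stream, reads the rkyv archive, and converts each disk struct into its shader counterpart before uploading it.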
The UI uses egui, an ImGui-like UI framework written in Rust. I integrated it into my bindless renderer only after submitting the thesis.
In `meshlet-renderer/src/main_loop.rs` around line 38 there is the constant `const DEBUGGER: Debuggers`, which can be set to a variety of debuggers, like `RenderDoc`, `Validation` or `DebugPrintf`. While `GpuAssistedValidation` is also available, it is known to report many false positives for this project.
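For example, to run with the validation layers enabled, the line would look like this (assuming the enum variants match the names listed above):

```rust
// meshlet-renderer/src/main_loop.rs (around line 38)
const DEBUGGER: Debuggers = Debuggers::Validation;
```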