Skip to content

Conversation

@JMS55
Copy link
Contributor

@JMS55 JMS55 commented Jul 6, 2024

Objective

  • Using bincode to deserialize binary into a MeshletMesh is expensive (~77ms for a 5mb file).

Solution

  • Write a custom deserializer using bytemuck's Pod types and slice casting.
    • Total asset load time has gone from ~102ms to ~12ms.
  • Change some types I never meant to be public to private and other misc cleanup.

Testing

  • Ran the meshlet example and added timing spans to the asset loader.

Changelog

  • Improved MeshletMesh loading speed
  • The MeshletMesh disk format has changed, and MESHLET_MESH_ASSET_VERSION has been bumped
  • MeshletMesh fields are now private
  • Renamed MeshletMeshSaverLoad to MeshletMeshSaverLoader
  • The Meshlet, MeshletBoundingSpheres, and MeshletBoundingSphere types are now private
  • Removed MeshletMeshSaveOrLoadError::SerializationOrDeserialization
  • Added MeshletMeshSaveOrLoadError::WrongFileType

Migration Guide

  • Regenerate your MeshletMesh assets, as the disk format has changed, and MESHLET_MESH_ASSET_VERSION has been bumped
  • MeshletMesh fields are now private
  • MeshletMeshSaverLoad is now named MeshletMeshSaverLoader
  • The Meshlet, MeshletBoundingSpheres, and MeshletBoundingSphere types are now private
  • MeshletMeshSaveOrLoadError::SerializationOrDeserialization has been removed
  • Added MeshletMeshSaveOrLoadError::WrongFileType, match on this variant if you match on MeshletMeshSaveOrLoadError

@JMS55 JMS55 added this to the 0.15 milestone Jul 6, 2024
@JMS55 JMS55 added A-Assets Load files from disk to use for things like images, models, and sounds C-Performance A change motivated by improving speed, memory usage or compile times labels Jul 6, 2024
reader.read_to_end(&mut compressed_asset_data).await?;

// Convert compressed data back to an asset
let reader = &mut FrameDecoder::new(Cursor::new(compressed_asset_data));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LZ4 compression actually help? You might have to use delta encoding (i.e. instead of storing indices you'd store the wrapping difference from the previous index) if you want it to have much impact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did when I used bincode, I should probably double check again.

Nanite has a whole bunch of tricks they use for compressing disk and in-memory asset data, but they're low priority for me vs runtime performance at the moment, and I don't really have much experience with this field https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9.73mb without compression, 4.97mb with.

let compressed_asset_data_len = async_read_u64(reader).await? as usize;

// Load compressed asset data
let mut compressed_asset_data = Vec::with_capacity(compressed_asset_data_len);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish I could directly wrap Bevy's &mut dyn Reader in a FrameDecoder, rather than read to a Vec first, but it's not possible as Reader doesn't implement the typical std::io::Read sync trait - only Bevy's custom async traits :(

@cart any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make adapters like so:

struct AsyncWriteSyncAdapter<'a>(&'a mut Writer);

impl Write for AsyncWriteSyncAdapter<'_> {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        block_on(self.0.write(buf))
    }

    fn flush(&mut self) -> std::io::Result<()> {
        block_on(self.0.flush())
    }
}

struct AsyncReadSyncAdapter<'a>(&'a mut dyn Reader);

impl Read for AsyncReadSyncAdapter<'_> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        block_on(self.0.read(buf))
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having to allocate an intermediate vec was a nice perf/memory win.

@JMS55 JMS55 requested a review from fintelia July 8, 2024 01:49
@alice-i-cecile alice-i-cecile added S-Needs-Review Needs reviewer attention (from anyone!) to move forward A-Rendering Drawing game state to the screen labels Jul 9, 2024
Copy link
Contributor

@pcwalton pcwalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, just a couple of suggestions.

@JMS55 JMS55 added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jul 10, 2024
@JMS55 JMS55 requested a review from IceSentry July 11, 2024 20:46
@JMS55 JMS55 removed the request for review from IceSentry July 11, 2024 22:02
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Jul 15, 2024
Merged via the queue into bevyengine:main with commit 6e8d43a Jul 15, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 16, 2024
# Objective

- #14193 changed the bunny
meshlet url but didn't update example metadata

## Solution

- Also update the url there
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Assets Load files from disk to use for things like images, models, and sounds A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants