Skip to content

Conversation

kkysen
Copy link
Collaborator

@kkysen kkysen commented Oct 12, 2025

The benchmark script found that the merge commit 8eb6932 (#951) was one of the largest perf regressions (+4.7% on 80 threads, +4.3% on 150 threads). While it touched a bunch of things, one significant part was f.frame_thread.b, a DisjointMut<Vec<Av1Block>>. Av1Block is currently 32 bytes, but could be 24 bytes. This PR narrows all of the field types (recursively) as much as possible so we know their more precise range. This will help us then re-arrange the layout (Av1Block is not passed to asm) and get it down to 24 bytes (this PR itself is perf neutral, although it should remove a bunch of bounds checks). Plus, this PR just makes things more idiomatic. Moreover, I think it may be possible to switch from DisjointMut to Relaxed atomics for Av1Block. It should also help make Av1Block: FromZeroes, which will let me optimize the .resize with new_zeroed_slice.

@kkysen kkysen requested a review from randomPoison October 12, 2025 07:06
Copy link
Collaborator

@randomPoison randomPoison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, though I'm a bit worried about all of the unwraps added thanks to using InRange in various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?

Copy link
Collaborator Author

@kkysen kkysen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine, though I'm a bit worried about all of the unwraps added thanks to using InRange in various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?

They generally should be. I'll go add the // Elided comments. Ones like Av1BlockInterRefIndex::new(N).unwrap() or Av1BlockInterRefIndex::new(N + rav1d_msac_decode_bool_adapt(...) as i8).unwrap() are elided since it knows N and the range of a bool. Some others using rav1d_msac_decode_symbol_adapt8 or rav1d_msac_decode_symbol_adapt16 are elided because they truncate to 8/16.

I think Av1BlockInterRefIndex::new(frame_hdr.skip_mode.refs[0]).unwrap(), doesn't, but we can also make frame_hdr.skip_mode.refs use an InRange type (didn't want to do too much just in this PR).

I also measured performance, and there's basically no difference (just noise, < 0.1% different).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants