-
Notifications
You must be signed in to change notification settings - Fork 59
struct Av1Block
: narrow field types
#1460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…MODES` table with a `const fn` w/ a `match`
…f a `match` since it optimizes better
Note that `rav1d_msac_decode_symbol_adapt16` already truncates its return to `< 16`, so the check is also a no-op here.
… handle negatives
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, though I'm a bit worried about all of the unwrap
s added thanks to using InRange
in various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine, though I'm a bit worried about all of the
unwrap
s added thanks to usingInRange
in various places. Are you sure that adding all those checks doesn't degrade performance? Can you confirm that they're optimized out?
They generally should be. I'll go add the // Elided
comments. Ones like Av1BlockInterRefIndex::new(N).unwrap()
or Av1BlockInterRefIndex::new(N + rav1d_msac_decode_bool_adapt(...) as i8).unwrap()
are elided since it knows N
and the range of a bool
. Some others using rav1d_msac_decode_symbol_adapt8
or rav1d_msac_decode_symbol_adapt16
are elided because they truncate to 8/16.
I think Av1BlockInterRefIndex::new(frame_hdr.skip_mode.refs[0]).unwrap(),
doesn't, but we can also make frame_hdr.skip_mode.refs
use an InRange
type (didn't want to do too much just in this PR).
I also measured performance, and there's basically no difference (just noise, < 0.1% different).
The benchmark script found that the merge commit 8eb6932 (#951) was one of the largest perf regressions (+4.7% on 80 threads, +4.3% on 150 threads). While it touched a bunch of things, one significant part was
f.frame_thread.b
, aDisjointMut<Vec<Av1Block>>
.Av1Block
is currently 32 bytes, but could be 24 bytes. This PR narrows all of the field types (recursively) as much as possible so we know their more precise range. This will help us then re-arrange the layout (Av1Block
is not passed to asm) and get it down to 24 bytes (this PR itself is perf neutral, although it should remove a bunch of bounds checks). Plus, this PR just makes things more idiomatic. Moreover, I think it may be possible to switch fromDisjointMut
toRelaxed
atomics forAv1Block
. It should also help makeAv1Block: FromZeroes
, which will let me optimize the.resize
withnew_zeroed_slice
.