I was wondering whether you have considered using PGO (profile-guided optimization) for optimizing rav1d. I mean both the straightforward "let's generate more optimized artifacts" approach and the less obvious "let's do PGO, see what it has optimized and try to backport these optimizations to the original source code".
rav1d contains a lot of assembly that most likely won't be touched by LLVM's PGO pipeline, but out of curiosity, I tried applying https://github.com/Kobzol/cargo-pgo to rav1d, and it still seems to produce a quite nice ~2% speedup:
hyperfine --warmup 0 --runs 1 "./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1" "./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1"
Benchmark 1: ./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
Time (abs ≡): 50.679 s [User: 50.599 s, System: 0.077 s]
Benchmark 2: ./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
Time (abs ≡): 49.915 s [User: 49.840 s, System: 0.074 s]
Summary
./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1 ran
1.02 times faster than ./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
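To make the size of the gain explicit, the two wall-clock times above can be turned into a ratio and a percentage. This is just arithmetic on the numbers hyperfine printed; the helper names `speedup` and `saved_pct` are made up for illustration. The ratio comes out at roughly 1.015, which hyperfine rounds to 1.02.

```shell
#!/bin/sh
# Hypothetical helpers (not part of hyperfine or rav1d).

# speedup BASE OPT: prints BASE / OPT to three decimals.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.3f\n", base / opt }'
}

# saved_pct BASE OPT: prints the wall-time reduction as a percentage of BASE.
saved_pct() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", 100 * (base - opt) / base }'
}

# Times taken from the hyperfine output above (seconds).
speedup 50.679 49.915
saved_pct 50.679 49.915
```

Since this was a single run with no warmup, the numbers are only a rough indication.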
Anyway, I don't have more than that; I was just curious what you think of this approach.
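For anyone who wants to try this, here is a rough sketch of the usual cargo-pgo flow (instrumented build, profiling run, optimized build). It is not an exact record of the commands used above: `BIN` is a placeholder for wherever the instrumented CLI binary ends up (the target triple and the `CLI_BINARY` name are assumptions), and a single sample file is assumed as the profiling workload. Nothing runs until `pgo_flow` is called.

```shell
#!/bin/sh
# Sketch of a cargo-pgo workflow. Defining the function has no side effects.
# ASSUMPTIONS: BIN is a placeholder path, not taken from the rav1d repo;
# SAMPLE is the clip used in the benchmark, reused here as the profiling input.
BIN="${BIN:-target/x86_64-unknown-linux-gnu/release/CLI_BINARY}"
SAMPLE="${SAMPLE:-Chimera-AV1-8bit-1280x720-3363kbps.ivf}"

pgo_flow() {
    # One-time setup: the cargo subcommand and the LLVM profiling tools.
    cargo install cargo-pgo &&
    rustup component add llvm-tools-preview &&
    # 1. Build an instrumented binary.
    cargo pgo build &&
    # 2. Run it on a representative workload to collect profiles.
    "$BIN" -q -i "$SAMPLE" -o /dev/null --threads=1 &&
    # 3. Rebuild using the collected profiles.
    cargo pgo optimize
}
```

Run it from the repository root with something like `BIN=path/to/instrumented/binary pgo_flow`. A more varied set of input clips would probably give a more representative profile than one file.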