Profile-guided optimizations (PGO) #1408

@Kobzol

I was wondering whether you have considered using PGO to optimize rav1d, both for the straightforward "let's generate more optimized artifacts" approach and for the less obvious "let's do PGO, see what it optimized, and try to backport those optimizations to the original source code" approach.

rav1d contains a lot of assembly that LLVM's PGO pipeline most likely won't touch. Out of curiosity, I still tried applying https://github.com/Kobzol/cargo-pgo to rav1d, and it seems to produce a nice ~2% speedup in a single-threaded decode:

hyperfine --warmup 0 --runs 1 "./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1" "./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1"
Benchmark 1: ./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
  Time (abs ≡):        50.679 s               [User: 50.599 s, System: 0.077 s]
 
Benchmark 2: ./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
  Time (abs ≡):        49.915 s               [User: 49.840 s, System: 0.074 s]
 
Summary
  ./rav1d-pgo -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1 ran
    1.02 times faster than ./rav1d -q -i Chimera-AV1-8bit-1280x720-3363kbps.ivf -o /dev/null --threads=1
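
For reference, the ratio in the summary can be recomputed from the two wall-clock times in the output. This is a minimal sketch: the variable names are illustrative and not from any tool, and the numbers come from the single run shown here.

```python
# Wall-clock times copied from the hyperfine output above (seconds).
baseline_s = 50.679  # ./rav1d
pgo_s = 49.915       # ./rav1d-pgo

# hyperfine's "times faster" figure is baseline time / candidate time.
speedup = baseline_s / pgo_s

# Fraction of the baseline run time that the PGO build saved.
time_saved = (baseline_s - pgo_s) / baseline_s

print(f"speedup: {speedup:.3f}x")
print(f"time saved: {time_saved:.1%}")
```

The unrounded ratio is about 1.015, which hyperfine displays as 1.02, so the PGO build took roughly 1.5% less wall time in this run.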

Anyway, I don't have more than that. I was just curious what you think of this approach.
