fn prepare_intra_edges is slower in rav1d than dav1d #1394

@jrmuizel

Specifically this loop:

            for i in 0..px_have {
                left[sz - 1 - i] = *(dst + (i as isize * stride - 1)).index::<BD>();
            }

and the equivalent C loop in dav1d:

            for (int i = 0; i < px_have; i++)
                left[sz - 1 - i] = dst[PXSTRIDE(stride) * i - 1];

Rust version

57      lea rcx, qword [rax + r13 * 1]
108     cmp rcx, rdx
        jnb 0xfdb8d
15      lea rcx, qword [r10 + rbx * 1]
59      cmp rcx, rsi
        jnb 0xfdba0
56      movzx ecx, byte [rbp + rax * 1]
502     mov byte [r11 + rbx * 1], cl
161     dec rbx
26      add rax, r15
48      cmp r9, rbx
        jnz 0xfd500

C version

9       movzx eax, byte [rdi + r11 * 1]
316     lea ebp, dword [rcx + 0x1]
18      movsxd rbp, ebp
17      mov byte [r15 + rbp * 1], al
124     movzx eax, byte [rdi + rsi * 1]
159     movsxd rcx, ecx
24      mov byte [r15 + rcx * 1], al
47      add rdi, rbx
37      add ecx, -0x2
23      add rdx, -0x2
        jnz 0x143e20

Summing the per-instruction counts in the left column, the Rust loop accounts for 1032 cycles versus 774 for the C loop.

The C version has been unrolled once and has no bounds checks (the two `cmp`/`jnb` pairs in the Rust loop), but I don't have any data as to why it's faster than the Rust version.
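
One experiment that might help isolate the cost of the bounds checks is to validate the ranges once before the loop and then iterate without indexing. The sketch below is not rav1d code: it assumes plain `u8` slices and a positive `usize` stride instead of rav1d's own buffer types and `index::<BD>()` accessor, and the function names `copy_left_checked` and `copy_left_hoisted` are hypothetical. Whether the compiler actually drops the per-iteration checks, and whether that closes the gap, would need to be confirmed in the generated assembly and a profile.

```rust
/// Baseline, shaped like the loop in this issue: both the load and the
/// store are bounds-checked on every iteration.
/// `base` is the offset of the first pixel to read (an assumed parameter).
pub fn copy_left_checked(
    left: &mut [u8],
    sz: usize,
    src: &[u8],
    base: usize,
    stride: usize,
    px_have: usize,
) {
    for i in 0..px_have {
        left[sz - 1 - i] = src[base + i * stride];
    }
}

/// Variant that checks the ranges once up front, then walks both buffers
/// with iterators so the loop body contains no explicit indexing.
/// Assumes `stride > 0`; a negative stride is not handled here.
pub fn copy_left_hoisted(
    left: &mut [u8],
    sz: usize,
    src: &[u8],
    base: usize,
    stride: usize,
    px_have: usize,
) {
    if px_have == 0 {
        return;
    }
    assert!(stride > 0 && px_have <= sz);

    // Single range check for the destination.
    let dst = &mut left[sz - px_have..sz];
    // Single range check for the source, including the last strided element.
    let src = &src[base..];
    assert!((px_have - 1) * stride < src.len());

    // `rev()` writes from left[sz - 1] downwards; `step_by` reads
    // src[0], src[stride], src[2 * stride], ...
    for (d, s) in dst.iter_mut().rev().zip(src.iter().step_by(stride)) {
        *d = *s;
    }
}
```

Both functions should produce the same `left` contents for valid inputs, so the second can be swapped in for a profiling run and compared against the numbers above.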
