Specifically, this loop in the Rust version:

    for i in 0..px_have {
        left[sz - 1 - i] = *(dst + (i as isize * stride - 1)).index::<BD>();
    }

and its C counterpart:

    for (int i = 0; i < px_have; i++)
        left[sz - 1 - i] = dst[PXSTRIDE(stride) * i - 1];
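For readers without the surrounding code, here is a minimal, self-contained sketch of what both loops compute. It assumes 8-bit pixels in a flat `&[u8]` buffer, a positive stride, and a block that does not start at offset 0. The name `copy_left_edge`, the `dst_off` parameter, and the slice-based signature are hypothetical and are not the project's actual API.

```rust
/// Copies `px_have` pixels from the column immediately to the left of the
/// block into `left`, so that the topmost pixel lands at `left[sz - 1]`.
///
/// Hypothetical helper: `buf` is the whole picture plane, `dst_off` is the
/// offset of the block's top-left pixel, `stride` is the row pitch in pixels.
/// Assumes `dst_off >= 1`, `stride > 0`, and `px_have <= sz`.
fn copy_left_edge(
    left: &mut [u8],
    buf: &[u8],
    dst_off: usize,
    stride: usize,
    sz: usize,
    px_have: usize,
) {
    for i in 0..px_have {
        // Both the load from `buf` and the store into `left` are
        // bounds-checked on every iteration when written this way.
        left[sz - 1 - i] = buf[dst_off + i * stride - 1];
    }
}
```

For example, with a 4x4 plane holding the values 0 through 15, `stride = 4`, `dst_off = 1`, and `sz = px_have = 4`, the call leaves `left` equal to `[12, 8, 4, 0]`.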
Rust version (left column: cycles attributed to each instruction):

      57  lea    rcx, qword [rax + r13 * 1]
     108  cmp    rcx, rdx
          jnb    0xfdb8d
      15  lea    rcx, qword [r10 + rbx * 1]
      59  cmp    rcx, rsi
          jnb    0xfdba0
      56  movzx  ecx, byte [rbp + rax * 1]
     502  mov    byte [r11 + rbx * 1], cl
     161  dec    rbx
      26  add    rax, r15
      48  cmp    r9, rbx
          jnz    0xfd500
C version:

       9  movzx  eax, byte [rdi + r11 * 1]
     316  lea    ebp, dword [rcx + 0x1]
      18  movsxd rbp, ebp
      17  mov    byte [r15 + rbp * 1], al
     124  movzx  eax, byte [rdi + rsi * 1]
     159  movsxd rcx, ecx
      24  mov    byte [r15 + rcx * 1], al
      47  add    rdi, rbx
      37  add    ecx, -0x2
      23  add    rdx, -0x2
          jnz    0x143e20
The Rust version totals 1032 cycles vs. 774 for the C version.
The C version has been unrolled once and has no bounds checks (the Rust listing has two cmp/jnb pairs per iteration), but I don't have any data on why it's faster than the Rust version.
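One experiment that might help separate the two effects is to hoist the bounds checks out of the loop and re-profile. The sketch below is only an illustration under stated assumptions: 8-bit pixels in a flat `&[u8]` buffer, `stride > 0`, `dst_off >= 1`, and `px_have <= sz`. The function name and signature are hypothetical and not taken from the project, and whether this actually closes the gap has not been measured.

```rust
/// Same copy as the loop in question, but each buffer is sliced once up
/// front so the loop body itself runs on iterators with no indexing.
///
/// Hypothetical helper: `buf` is the whole picture plane, `dst_off` is the
/// offset of the block's top-left pixel, `stride` is the row pitch in pixels.
fn copy_left_edge_hoisted(
    left: &mut [u8],
    buf: &[u8],
    dst_off: usize,
    stride: usize,
    sz: usize,
    px_have: usize,
) {
    if px_have == 0 {
        return;
    }
    // First source pixel: one column to the left of the block's top row.
    let first = dst_off - 1;
    // Single range check covering every source pixel that will be read.
    let src = &buf[first..first + (px_have - 1) * stride + 1];
    // Single range check covering every destination slot that is written.
    let dst = &mut left[sz - px_have..sz];
    // Walk the destination backwards and the source one row at a time.
    for (d, s) in dst.iter_mut().rev().zip(src.iter().step_by(stride)) {
        *d = *s;
    }
}
```

If the profile of a variant like this still shows the same gap, the bounds checks are probably not the main cost, and the unrolling or the store pattern would be the next thing to look at.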