-
-
Notifications
You must be signed in to change notification settings - Fork 516
Description
Hi! While investigating a different issue (rust-lang/rust#146497), I found out that compiler optimizes a result computation out of an inner benchmarking loop of criterion
for many nalgebra
benchmarks.
For example, here is how assembly looks like for the mat2_mul_v
benchmark:

It is possible to disable this optimization by applying black_hole()
on all function arguments inside the loop:
diff --git a/benches/common/macros.rs b/benches/common/macros.rs
index 2f64f83a1986..e65bac51fd2d 100644
--- a/benches/common/macros.rs
+++ b/benches/common/macros.rs
@@ -4,12 +4,14 @@ macro_rules! bench_binop(
($name: ident, $t1: ty, $t2: ty, $binop: ident) => {
fn $name(bh: &mut criterion::Criterion) {
use rand::SeedableRng;
+ use std::hint::black_box;
+
let mut rng = IsaacRng::seed_from_u64(0);
let a = rng.random::<$t1>();
let b = rng.random::<$t2>();
bh.bench_function(stringify!($name), move |bh| bh.iter(|| {
- a.$binop(b)
+ black_box(a).$binop(black_box(b))
}));
}
}
Then the same benchmark looks different:

Now it measures an actual computation and criterion
reports the difference:
mat2_mul_v time: [1.2087 ns 1.2094 ns 1.2105 ns]
change: [+279.66% +279.89% +280.13%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
How to check this yourself
First, run the benchmark binary under lldb
with a single benchmark name as a parameter:
lldb target/release/deps/nalgebra_bench-5cfdacb5106e683c -- --bench mat2_mul_v
Then find an address of the Timespec::now()
:
(lldb) image lookup -r -n ".*::Timespec::now::.*"
1 match found in /home/im/projects/patches/nalgebra/target/release/deps/nalgebra_bench-5cfdacb5106e683c:
Address: nalgebra_bench-5cfdacb5106e683c[0x000000000042f9a0] (nalgebra_bench-5cfdacb5106e683c.PT_LOAD[0]..text + 4388320)
Summary: nalgebra_bench-5cfdacb5106e683c`std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
Then set breakpoint on it:
(lldb) breakpoint set -s nalgebra_bench-5cfdacb5106e683c -a 0x000000000042f9a0
Breakpoint 1: where = nalgebra_bench-5cfdacb5106e683c`std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb, address = 0x000000000042f9a0
Then run the binary:
(lldb) run
Process 384495 launched: '/home/im/projects/patches/nalgebra/target/release/deps/nalgebra_bench-5cfdacb5106e683c' (x86_64)
Process 384495 stopped and restarted: thread 1 received signal: SIGCHLD
Gnuplot not found, using plotters backend
Benchmarking mat2_mul_v: Warming up for 3.0000 sProcess 384495 stopped
* thread #1, name = 'nalgebra_bench-', stop reason = breakpoint 1.1
frame #0: 0x00005555559839a0 nalgebra_bench-5cfdacb5106e683c`std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
nalgebra_bench-5cfdacb5106e683c`std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb:
-> 0x5555559839a0 <+0>: pushq %rbp
0x5555559839a1 <+1>: movq %rsp, %rbp
0x5555559839a4 <+4>: pushq %rbx
0x5555559839a5 <+5>: subq $0x18, %rsp
Now you are inside the Timespec::now()
, need to step outside:
(lldb) finish
Process 384495 stopped
* thread #1, name = 'nalgebra_bench-', stop reason = step out
frame #0: 0x0000555555775095 nalgebra_bench-5cfdacb5106e683c`criterion::bencher::Bencher$LT$M$GT$::iter::h66e8fc942edf1939 + 37
nalgebra_bench-5cfdacb5106e683c`criterion::bencher::Bencher$LT$M$GT$::iter::h66e8fc942edf1939:
-> 0x555555775095 <+37>: movq %rax, 0x40(%rsp)
0x55555577509a <+42>: movl %edx, %ebp
0x55555577509c <+44>: movl $0x1, %edi
0x5555557750a1 <+49>: callq 0x5555559839a0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
Now you are inside the relevant instance of criterion::bencher::Bencher::iter()
and can check the assembly:
(lldb) disassemble
...
Note that the file name, addresses and full function names will be different in your case.
P.S. I will create a PR to fix this.
P.P.S. Title lies a bit: actually, criterion
measures one computation and a multiple iterations of outval.clone()
. Final benchmark result still does not make sense though.