Make benchmarks measure an actual computation #1549
8338da6 to 9d1c4ef
This looks good to me overall, thanks for taking care of this. I noted some questions in the code about the choice of what to black-box, but I'll be the first to admit that I don't have a lot of experience with keeping the compiler from optimizing certain things out. I'd just feel better if you explained some of the rationale behind what exactly you black-boxed.
benches/core/matrix.rs (outdated)
- bench.bench_function("mat8_mul_mat8", move |bh| bh.iter(|| &a * &b));
+ bench.bench_function("mat8_mul_mat8", move |bh| {
+     bh.iter(|| black_box(&a) * black_box(&b))
Honest question: would the results have been different if you had black-boxed the product rather than the individual components?
Yes, there is a high chance that the benchmark result will be different.
bh.iter(|| &a * &b)
In this example, Criterion calls the `|| &a * &b` closure many times to measure how long it takes. But the compiler is smart enough to notice that `a` and `b` cannot change within `bh.iter(...)`, so it effectively rewrites the code like this:
let optimized = &a * &b;
bh.iter(|| optimized.clone())
What `black_box()` does here is make the compiler assume that `black_box(x)` produces an arbitrary valid value of the same type as `x`. Of course, in the compiled binary it is a no-op and always produces just the value of `x`.
Wrapping the product in `black_box()` changes nothing here: the compiler can still see that the arguments of `mul()` do not change, so it can still move `mul()` out of the loop:
let optimized = &a * &b;
bh.iter(|| black_box(optimized.clone()))
Also, the return value of the closure is already passed to `black_box()` inside Criterion's `bh.iter(...)` to ensure that the call to the closure is not removed during optimization. There, `black_box(x)` has a slightly different meaning: some unspecified computation that produces side effects based on the value of `x` (so the value of `x` matters and cannot be removed from the compiled code entirely).
Since we are interested in measuring the performance of `mul()`, we have two options:
- Generate proper random values for the arguments of `mul()` on each iteration of `bh.iter(...)`. This may be viable if `mul()` is slow enough to make the random argument generation insignificant in the total measured time. This option is not viable in general for nalgebra, as many benchmarks measure very fast operations that can be optimized down to just a few machine instructions (Vector3 x Scalar multiplication, for example).
- Disguise the unchanged arguments of `mul()` as random values on each iteration of `bh.iter(...)`. This is exactly what I did here using `black_box()`.
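The difference between black-boxing the product and black-boxing the operands can be sketched without Criterion. The following is only an illustration using `std::hint::black_box`; the `mul`, `sum_result_boxed`, and `sum_inputs_boxed` names are hypothetical and not part of nalgebra or this PR, and whether the optimizer actually hoists the call depends on the compiler version and flags.

```rust
use std::hint::black_box;

/// Stand-in for the operation under test (2x2 matrix times 2-vector).
fn mul(m: &[[f32; 2]; 2], v: &[f32; 2]) -> [f32; 2] {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Only the result is black-boxed. The operands are visibly loop-invariant,
/// so the optimizer is allowed to compute `mul(m, v)` once, outside the loop.
fn sum_result_boxed(m: &[[f32; 2]; 2], v: &[f32; 2], iters: u32) -> f32 {
    let mut acc = 0.0;
    for _ in 0..iters {
        acc += black_box(mul(m, v))[0];
    }
    acc
}

/// The operands are black-boxed. The optimizer has to assume they may differ
/// on every iteration, so `mul` stays inside the loop.
fn sum_inputs_boxed(m: &[[f32; 2]; 2], v: &[f32; 2], iters: u32) -> f32 {
    let mut acc = 0.0;
    for _ in 0..iters {
        acc += mul(black_box(m), black_box(v))[0];
    }
    acc
}
```

Both functions return the same value; they differ only in what the optimizer is permitted to assume, which is why comparing the generated assembly is the reliable way to see the effect.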
Great explanation, thank you. In principle I'd be fine merging this; I'm just wondering if you had considered refactoring to `iter_batched`. This is what I do in my projects, and the Criterion docs say:
> If your routine requires some per-iteration setup that shouldn’t be timed, use iter_batched or iter_batched_ref
That should be a way to supply new random matrices in every iteration of the benchmark. I don't think this will matter in the cases here, but in general it could help confuse the processor pipeline enough to get a more realistic measurement. However, I've also found the unresolved issue bheisler/criterion.rs#475 about measurement overhead in `iter_batched`, which I wasn't aware of before.
I don't want to make your life more complicated and I'm very grateful you're tackling this problem. I was just thinking we should really nail the benchmarks, since you also have some other cool things in the pipeline, which do depend on accurate benchmarks. What do you think?
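The batched idea from the quoted docs (untimed setup, timed routine) can be sketched with the standard library alone. This is a simplified illustration and not Criterion's implementation; `time_batched` is a hypothetical helper introduced here only to show which parts fall inside the timed region.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: builds `batch` inputs with `setup` (untimed), then
/// times `routine` over all of them. The outputs are handed back to the
/// caller so that dropping them happens after the clock has stopped.
fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> (Duration, Vec<O>) {
    // Untimed: create every input up front and reserve room for the outputs.
    let mut inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    // `drain` moves the inputs out but keeps the buffer allocated, so the
    // buffer of `inputs` is not freed while the clock is running.
    for input in inputs.drain(..) {
        outputs.push(routine(black_box(input)));
    }
    let elapsed = start.elapsed();

    (elapsed, outputs)
}
```

For very fast routines the loop overhead and the writes into `outputs` are still part of the measurement, which is one reason a batched measurement can read higher than a plain loop over black-boxed constants.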
To be honest, I wanted to keep this change minimal, so I hadn't considered using other Criterion functions.
I have now tried `iter_batched` for `mat2_mul_v` (the same benchmark I used for demonstration in the original issue):
diff --git a/benches/common/macros.rs b/benches/common/macros.rs
index c3e12aaaef55..3521ef7ddc8c 100644
--- a/benches/common/macros.rs
+++ b/benches/common/macros.rs
@@ -4,15 +4,15 @@ macro_rules! bench_binop(
($name: ident, $t1: ty, $t2: ty, $binop: ident) => {
fn $name(bh: &mut criterion::Criterion) {
use rand::SeedableRng;
- use std::hint::black_box;
let mut rng = IsaacRng::seed_from_u64(0);
- let a = rng.random::<$t1>();
- let b = rng.random::<$t2>();
- bh.bench_function(stringify!($name), move |bh| bh.iter(|| {
- black_box(&a).$binop(black_box(b))
- }));
+ bh.bench_function(stringify!($name), move |bh| bh.iter_batched(
+ || (rng.random::<$t1>(), rng.random::<$t2>()),
+ |args| {
+ args.0.$binop(args.1)
+ },
+ criterion::BatchSize::SmallInput));
}
}
);
This somewhat improved performance vs. my current changes, but it still regresses vs. current main:
I tried to check the generated assembly, but for `iter_batched()` it is much longer, and I am not that good at reading assembly:
click for details...
nalgebra_bench-ccbcfb07ef18979a`criterion::bencher::Bencher$LT$M$GT$::iter_batched::heec5543a72133bcc:
0x55555562d840 <+0>: pushq %rbp
0x55555562d841 <+1>: pushq %r15
0x55555562d843 <+3>: pushq %r14
0x55555562d845 <+5>: pushq %r13
0x55555562d847 <+7>: pushq %r12
0x55555562d849 <+9>: pushq %rbx
0x55555562d84a <+10>: subq $0xa8, %rsp
0x55555562d851 <+17>: movb $0x1, 0x30(%rdi)
0x55555562d855 <+21>: movq 0x28(%rdi), %r15
0x55555562d859 <+25>: leaq 0x9(%r15), %rcx
0x55555562d85d <+29>: movabsq $-0x3333333333333333, %rdx ; imm = 0xCCCCCCCCCCCCCCCD
0x55555562d867 <+39>: movq %rcx, %rax
0x55555562d86a <+42>: mulq %rdx
0x55555562d86d <+45>: movq %rdx, 0x78(%rsp)
0x55555562d872 <+50>: cmpq $0x9, %rcx
0x55555562d876 <+54>: jbe 0x55555562dee6 ; <+1702>
0x55555562d87c <+60>: movq %rsi, %r14
0x55555562d87f <+63>: movq %rdi, %rbx
0x55555562d882 <+66>: movl $0x1, %edi
0x55555562d887 <+71>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
-> 0x55555562d88c <+76>: movq %rax, 0x88(%rsp)
0x55555562d894 <+84>: movl %edx, 0x74(%rsp)
0x55555562d898 <+88>: movq $0x0, (%rbx)
0x55555562d89f <+95>: movl $0x0, 0x8(%rbx)
0x55555562d8a6 <+102>: leaq -0x1(%r15), %rax
0x55555562d8aa <+106>: cmpq $0xa, %rax
0x55555562d8ae <+110>: movq %rbx, 0x80(%rsp)
0x55555562d8b6 <+118>: jae 0x55555562da0f ; <+463>
0x55555562d8bc <+124>: xorl %r12d, %r12d
0x55555562d8bf <+127>: xorl %ebx, %ebx
0x55555562d8c1 <+129>: jmp 0x55555562d921 ; <+225>
0x55555562d8c3 <+131>: nopw %cs:(%rax,%rax)
0x55555562d8d0 <+144>: addl $0xc4653600, %r12d ; imm = 0xC4653600
0x55555562d8d7 <+151>: incq %rbx
0x55555562d8da <+154>: movss 0x40(%rsp), %xmm1
0x55555562d8e0 <+160>: addss 0x20(%rsp), %xmm1
0x55555562d8e6 <+166>: movss 0x48(%rsp), %xmm0
0x55555562d8ec <+172>: addss 0x18(%rsp), %xmm0
0x55555562d8f2 <+178>: movd %xmm1, %eax
0x55555562d8f6 <+182>: movd %xmm0, %ecx
0x55555562d8fa <+186>: shlq $0x20, %rcx
0x55555562d8fe <+190>: orq %rcx, %rax
0x55555562d901 <+193>: movq %rbx, (%r13)
0x55555562d905 <+197>: movl %r12d, 0x8(%r13)
0x55555562d909 <+201>: movq %rax, (%rsp)
0x55555562d90d <+205>: movss (%rsp), %xmm0
0x55555562d912 <+210>: movss 0x4(%rsp), %xmm0
0x55555562d918 <+216>: decq %r15
0x55555562d91b <+219>: je 0x55555562de65 ; <+1573>
0x55555562d921 <+225>: movq %rsp, %rdi
0x55555562d924 <+228>: movq %r14, %rsi
0x55555562d927 <+231>: callq 0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
0x55555562d92c <+236>: movss 0x10(%rsp), %xmm0
0x55555562d932 <+242>: movss 0x14(%rsp), %xmm2
0x55555562d938 <+248>: movss (%rsp), %xmm1
0x55555562d93d <+253>: mulss %xmm0, %xmm1
0x55555562d941 <+257>: movss %xmm1, 0x40(%rsp)
0x55555562d947 <+263>: mulss 0x4(%rsp), %xmm0
0x55555562d94d <+269>: movss %xmm0, 0x48(%rsp)
0x55555562d953 <+275>: movss 0x8(%rsp), %xmm0
0x55555562d959 <+281>: mulss %xmm2, %xmm0
0x55555562d95d <+285>: movss %xmm0, 0x20(%rsp)
0x55555562d963 <+291>: mulss 0xc(%rsp), %xmm2
0x55555562d969 <+297>: movss %xmm2, 0x18(%rsp)
0x55555562d96f <+303>: movl $0x1, %edi
0x55555562d974 <+308>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
0x55555562d979 <+313>: movq %rax, %r13
0x55555562d97c <+316>: movl %edx, %ebp
0x55555562d97e <+318>: movl $0x1, %edi
0x55555562d983 <+323>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
0x55555562d988 <+328>: movq %rax, 0x50(%rsp)
0x55555562d98d <+333>: movl %edx, 0x58(%rsp)
0x55555562d991 <+337>: movq %r13, 0x28(%rsp)
0x55555562d996 <+342>: movl %ebp, 0x30(%rsp)
0x55555562d99a <+346>: movq %rsp, %rdi
0x55555562d99d <+349>: leaq 0x50(%rsp), %rsi
0x55555562d9a2 <+354>: leaq 0x28(%rsp), %rdx
0x55555562d9a7 <+359>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
0x55555562d9ac <+364>: movzbl (%rsp), %eax
0x55555562d9b0 <+368>: testb %al, %al
0x55555562d9b2 <+370>: jne 0x55555562d9c0 ; <+384>
0x55555562d9b4 <+372>: movq 0x8(%rsp), %rcx
0x55555562d9b9 <+377>: jmp 0x55555562d9c2 ; <+386>
0x55555562d9bb <+379>: nopl (%rax,%rax)
0x55555562d9c0 <+384>: xorl %ecx, %ecx
0x55555562d9c2 <+386>: addq %rcx, %rbx
0x55555562d9c5 <+389>: movq 0x80(%rsp), %r13
0x55555562d9cd <+397>: jb 0x55555562d9f7 ; <+439>
0x55555562d9cf <+399>: testb $0x1, %al
0x55555562d9d1 <+401>: movl 0x10(%rsp), %eax
0x55555562d9d5 <+405>: movl $0x0, %ecx
0x55555562d9da <+410>: cmovnel %ecx, %eax
0x55555562d9dd <+413>: addl %eax, %r12d
0x55555562d9e0 <+416>: cmpl $0x3b9aca00, %r12d ; imm = 0x3B9ACA00
0x55555562d9e7 <+423>: jb 0x55555562d8da ; <+154>
0x55555562d9ed <+429>: cmpq $-0x1, %rbx
0x55555562d9f1 <+433>: jne 0x55555562d8d0 ; <+144>
0x55555562d9f7 <+439>: leaq 0x3867bc(%rip), %rdi
0x55555562d9fe <+446>: leaq 0x4456db(%rip), %rdx ; __dso_handle + 29352
0x55555562da05 <+453>: movl $0x1e, %esi
0x55555562da0a <+458>: callq 0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60
0x55555562da0f <+463>: shrq $0x3, 0x78(%rsp)
0x55555562da15 <+469>: movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE
0x55555562da1f <+479>: movq $0x0, 0x60(%rsp)
0x55555562da28 <+488>: incq %rax
0x55555562da2b <+491>: movq %rax, 0x90(%rsp)
0x55555562da33 <+499>: xorl %esi, %esi
0x55555562da35 <+501>: jmp 0x55555562da55 ; <+533>
0x55555562da37 <+503>: nopw (%rax,%rax)
0x55555562da40 <+512>: movq 0x40(%rsp), %rsi
0x55555562da45 <+517>: addq %rbx, %rsi
0x55555562da48 <+520>: movq 0x28(%r13), %r15
0x55555562da4c <+524>: cmpq %r15, %rsi
0x55555562da4f <+527>: jae 0x55555562de65 ; <+1573>
0x55555562da55 <+533>: movq %r15, %r13
0x55555562da58 <+536>: subq %rsi, %r13
0x55555562da5b <+539>: movq 0x78(%rsp), %rax
0x55555562da60 <+544>: cmpq %rax, %r13
0x55555562da63 <+547>: cmovaeq %rax, %r13
0x55555562da67 <+551>: movq %r13, %rax
0x55555562da6a <+554>: movl $0x18, %ecx
0x55555562da6f <+559>: mulq %rcx
0x55555562da72 <+562>: jo 0x55555562defe ; <+1726>
0x55555562da78 <+568>: movabsq $0x7ffffffffffffffd, %rcx ; imm = 0x7FFFFFFFFFFFFFFD
0x55555562da82 <+578>: cmpq %rcx, %rax
0x55555562da85 <+581>: jae 0x55555562defe ; <+1726>
0x55555562da8b <+587>: movq %rsi, 0x40(%rsp)
0x55555562da90 <+592>: testq %rax, %rax
0x55555562da93 <+595>: movq %r15, 0x68(%rsp)
0x55555562da98 <+600>: je 0x55555562dac0 ; <+640>
0x55555562da9a <+602>: movq %rax, %r12
0x55555562da9d <+605>: movq %rax, %rdi
0x55555562daa0 <+608>: callq *0x46d20a(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320
0x55555562daa6 <+614>: movq %rax, %rbx
0x55555562daa9 <+617>: movq %r13, %rbp
0x55555562daac <+620>: testq %rax, %rax
0x55555562daaf <+623>: jne 0x55555562dac7 ; <+647>
0x55555562dab1 <+625>: jmp 0x55555562df2a ; <+1770>
0x55555562dab6 <+630>: nopw %cs:(%rax,%rax)
0x55555562dac0 <+640>: movl $0x4, %ebx
0x55555562dac5 <+645>: xorl %ebp, %ebp
0x55555562dac7 <+647>: movq %rbx, %r12
0x55555562daca <+650>: movq %r13, 0x48(%rsp)
0x55555562dacf <+655>: movq %rsp, %r15
0x55555562dad2 <+658>: nopw %cs:(%rax,%rax)
0x55555562dae0 <+672>: movq %r15, %rdi
0x55555562dae3 <+675>: movq %r14, %rsi
0x55555562dae6 <+678>: callq 0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
0x55555562daeb <+683>: movq 0x10(%rsp), %rax
0x55555562daf0 <+688>: movq %rax, 0x10(%r12)
0x55555562daf5 <+693>: movups (%rsp), %xmm0
0x55555562daf9 <+697>: movups %xmm0, (%r12)
0x55555562dafe <+702>: addq $0x18, %r12
0x55555562db02 <+706>: decq %r13
0x55555562db05 <+709>: jne 0x55555562dae0 ; <+672>
0x55555562db07 <+711>: movq %rbp, (%rsp)
0x55555562db0b <+715>: movq %rbx, 0x8(%rsp)
0x55555562db10 <+720>: movq 0x48(%rsp), %r15
0x55555562db15 <+725>: movq %r15, 0x10(%rsp)
0x55555562db1a <+730>: movq (%rsp), %rax
0x55555562db1e <+734>: movq %rax, 0x20(%rsp)
0x55555562db23 <+739>: movq 0x8(%rsp), %rax
0x55555562db28 <+744>: movq %rax, 0x18(%rsp)
0x55555562db2d <+749>: movq 0x10(%rsp), %rbx
0x55555562db32 <+754>: leaq (,%r15,8), %r12
0x55555562db3a <+762>: cmpq 0x90(%rsp), %r15
0x55555562db42 <+770>: ja 0x55555562df14 ; <+1748>
0x55555562db48 <+776>: movq 0x68(%rsp), %rax
0x55555562db4d <+781>: cmpq 0x40(%rsp), %rax
0x55555562db52 <+786>: jne 0x55555562db60 ; <+800>
0x55555562db54 <+788>: movl $0x4, %ebp
0x55555562db59 <+793>: xorl %r15d, %r15d
0x55555562db5c <+796>: jmp 0x55555562dbb0 ; <+880>
0x55555562db5e <+798>: nop
0x55555562db60 <+800>: testq %r15, %r15
0x55555562db63 <+803>: je 0x55555562db7b ; <+827>
0x55555562db65 <+805>: movq %r12, %rdi
0x55555562db68 <+808>: callq *0x46d142(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320
0x55555562db6e <+814>: movq %rax, %rbp
0x55555562db71 <+817>: testq %rbp, %rbp
0x55555562db74 <+820>: jne 0x55555562dbb0 ; <+880>
0x55555562db76 <+822>: jmp 0x55555562df0a ; <+1738>
0x55555562db7b <+827>: movq $0x0, (%rsp)
0x55555562db83 <+835>: movl $0x8, %esi
0x55555562db88 <+840>: movq %rsp, %rdi
0x55555562db8b <+843>: movq %r12, %rdx
0x55555562db8e <+846>: callq *0x46d064(%rip) ; _GLOBAL_OFFSET_TABLE_ + 136
0x55555562db94 <+852>: testl %eax, %eax
0x55555562db96 <+854>: jne 0x55555562df0a ; <+1738>
0x55555562db9c <+860>: movq (%rsp), %rbp
0x55555562dba0 <+864>: testq %rbp, %rbp
0x55555562dba3 <+867>: je 0x55555562df0a ; <+1738>
0x55555562dba9 <+873>: nopl (%rax)
0x55555562dbb0 <+880>: movq %r15, 0x28(%rsp)
0x55555562dbb5 <+885>: movq %rbp, 0x30(%rsp)
0x55555562dbba <+890>: movq $0x0, 0x38(%rsp)
0x55555562dbc3 <+899>: movl $0x1, %edi
0x55555562dbc8 <+904>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
0x55555562dbcd <+909>: movl %edx, 0x68(%rsp)
0x55555562dbd1 <+913>: movq %rax, %r12
0x55555562dbd4 <+916>: cmpq %r15, %rbx
0x55555562dbd7 <+919>: ja 0x55555562de37 ; <+1527>
0x55555562dbdd <+925>: movl $0x0, %esi
0x55555562dbe2 <+930>: testq %rbx, %rbx
0x55555562dbe5 <+933>: movq 0x18(%rsp), %r15
0x55555562dbea <+938>: je 0x55555562dd4d ; <+1293>
0x55555562dbf0 <+944>: leaq (%rbx,%rbx,2), %rdi
0x55555562dbf4 <+948>: leaq -0x18(,%rdi,8), %rcx
0x55555562dbfc <+956>: movq %rcx, %rax
0x55555562dbff <+959>: movabsq $-0x5555555555555555, %rdx ; imm = 0xAAAAAAAAAAAAAAAB
0x55555562dc09 <+969>: mulq %rdx
0x55555562dc0c <+972>: cmpq $0x5f, %rcx
0x55555562dc10 <+976>: jbe 0x55555562dc46 ; <+1030>
0x55555562dc12 <+978>: shrq $0x4, %rdx
0x55555562dc16 <+982>: leaq (,%rsi,8), %rcx
0x55555562dc1e <+990>: addq %rbp, %rcx
0x55555562dc21 <+993>: leaq (%rdx,%rdx,2), %rax
0x55555562dc25 <+997>: leaq (%r15,%rax,8), %rax
0x55555562dc29 <+1001>: addq $0x18, %rax
0x55555562dc2d <+1005>: cmpq %rax, %rcx
0x55555562dc30 <+1008>: jae 0x55555562dc4e ; <+1038>
0x55555562dc32 <+1010>: leaq (%rsi,%rdx), %rax
0x55555562dc36 <+1014>: leaq 0x8(,%rax,8), %rax
0x55555562dc3e <+1022>: addq %rbp, %rax
0x55555562dc41 <+1025>: cmpq %rax, %r15
0x55555562dc44 <+1028>: jae 0x55555562dc4e ; <+1038>
0x55555562dc46 <+1030>: movq %r15, %rax
0x55555562dc49 <+1033>: jmp 0x55555562dcf0 ; <+1200>
0x55555562dc4e <+1038>: movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE
0x55555562dc58 <+1048>: andq %rax, %rdx
0x55555562dc5b <+1051>: leaq (,%rdx,8), %rax
0x55555562dc63 <+1059>: leaq (%rax,%rax,2), %rax
0x55555562dc67 <+1063>: movq %r15, %r8
0x55555562dc6a <+1066>: xorl %r9d, %r9d
0x55555562dc6d <+1069>: nopl (%rax)
0x55555562dc70 <+1072>: movupd (%r8), %xmm1
0x55555562dc75 <+1077>: movupd 0x10(%r8), %xmm2
0x55555562dc7b <+1083>: movupd 0x20(%r8), %xmm3
0x55555562dc81 <+1089>: movapd %xmm2, %xmm4
0x55555562dc85 <+1093>: movapd %xmm1, %xmm0
0x55555562dc89 <+1097>: movsd %xmm3, %xmm0 ; xmm0 = xmm3[0],xmm0[1]
0x55555562dc8d <+1101>: movapd %xmm3, %xmm5
0x55555562dc91 <+1105>: movsd %xmm2, %xmm3 ; xmm3 = xmm2[0],xmm3[1]
0x55555562dc95 <+1109>: shufps $0x2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[0,0]
0x55555562dc99 <+1113>: shufps $0xe2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[2,3]
0x55555562dc9d <+1117>: shufps $0x13, %xmm1, %xmm4 ; xmm4 = xmm4[3,0],xmm1[1,0]
0x55555562dca1 <+1121>: shufps $0xe2, %xmm1, %xmm4 ; xmm4 = xmm4[2,0],xmm1[2,3]
0x55555562dca5 <+1125>: shufps $0xe2, %xmm1, %xmm0 ; xmm0 = xmm0[2,0],xmm1[2,3]
0x55555562dca9 <+1129>: shufps $0x31, %xmm1, %xmm5 ; xmm5 = xmm5[1,0],xmm1[3,0]
0x55555562dcad <+1133>: shufps $0xe2, %xmm1, %xmm5 ; xmm5 = xmm5[2,0],xmm1[2,3]
0x55555562dcb1 <+1137>: movapd %xmm3, %xmm1
0x55555562dcb5 <+1141>: shufps $0xe8, %xmm3, %xmm1 ; xmm1 = xmm1[0,2],xmm3[2,3]
0x55555562dcb9 <+1145>: psrlq $0x20, %xmm3
0x55555562dcbe <+1150>: pshufd $0xe8, %xmm3, %xmm3 ; xmm3 = xmm3[0,2,2,3]
0x55555562dcc3 <+1155>: mulps %xmm1, %xmm2
0x55555562dcc6 <+1158>: mulps %xmm4, %xmm1
0x55555562dcc9 <+1161>: mulps %xmm3, %xmm0
0x55555562dccc <+1164>: addps %xmm2, %xmm0
0x55555562dccf <+1167>: mulps %xmm3, %xmm5
0x55555562dcd2 <+1170>: addps %xmm1, %xmm5
0x55555562dcd5 <+1173>: unpcklps %xmm5, %xmm0 ; xmm0 = xmm0[0],xmm5[0],xmm0[1],xmm5[1]
0x55555562dcd8 <+1176>: movups %xmm0, (%rcx,%r9,8)
0x55555562dcdd <+1181>: addq $0x2, %r9
0x55555562dce1 <+1185>: addq $0x30, %r8
0x55555562dce5 <+1189>: cmpq %r9, %rdx
0x55555562dce8 <+1192>: jne 0x55555562dc70 ; <+1072>
0x55555562dcea <+1194>: addq %rdx, %rsi
0x55555562dced <+1197>: addq %r15, %rax
0x55555562dcf0 <+1200>: leaq (%r15,%rdi,8), %rcx
0x55555562dcf4 <+1204>: nopw %cs:(%rax,%rax)
0x55555562dd00 <+1216>: movss 0x10(%rax), %xmm0
0x55555562dd05 <+1221>: movss 0x14(%rax), %xmm1
0x55555562dd0a <+1226>: movss (%rax), %xmm2
0x55555562dd0e <+1230>: mulss %xmm0, %xmm2
0x55555562dd12 <+1234>: mulss 0x4(%rax), %xmm0
0x55555562dd17 <+1239>: movss 0x8(%rax), %xmm3
0x55555562dd1c <+1244>: mulss %xmm1, %xmm3
0x55555562dd20 <+1248>: addss %xmm2, %xmm3
0x55555562dd24 <+1252>: mulss 0xc(%rax), %xmm1
0x55555562dd29 <+1257>: addss %xmm0, %xmm1
0x55555562dd2d <+1261>: movd %xmm3, %edx
0x55555562dd31 <+1265>: movd %xmm1, %edi
0x55555562dd35 <+1269>: shlq $0x20, %rdi
0x55555562dd39 <+1273>: orq %rdi, %rdx
0x55555562dd3c <+1276>: movq %rdx, (%rbp,%rsi,8)
0x55555562dd41 <+1281>: incq %rsi
0x55555562dd44 <+1284>: addq $0x18, %rax
0x55555562dd48 <+1288>: cmpq %rcx, %rax
0x55555562dd4b <+1291>: jne 0x55555562dd00 ; <+1216>
0x55555562dd4d <+1293>: movq %rsi, 0x38(%rsp)
0x55555562dd52 <+1298>: cmpq $0x0, 0x20(%rsp)
0x55555562dd58 <+1304>: je 0x55555562dd63 ; <+1315>
0x55555562dd5a <+1306>: movq %r15, %rdi
0x55555562dd5d <+1309>: callq *0x46cfdd(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
0x55555562dd63 <+1315>: movl $0x1, %edi
0x55555562dd68 <+1320>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
0x55555562dd6d <+1325>: movq 0x80(%rsp), %r13
0x55555562dd75 <+1333>: movq %rsp, %rdi
0x55555562dd78 <+1336>: movq %rax, 0x98(%rsp)
0x55555562dd80 <+1344>: movl %edx, 0xa0(%rsp)
0x55555562dd87 <+1351>: movq %r12, 0x50(%rsp)
0x55555562dd8c <+1356>: movl 0x68(%rsp), %eax
0x55555562dd90 <+1360>: movl %eax, 0x58(%rsp)
0x55555562dd94 <+1364>: leaq 0x98(%rsp), %rsi
0x55555562dd9c <+1372>: leaq 0x50(%rsp), %rdx
0x55555562dda1 <+1377>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
0x55555562dda6 <+1382>: movzbl (%rsp), %ecx
0x55555562ddaa <+1386>: testb %cl, %cl
0x55555562ddac <+1388>: movq 0x48(%rsp), %rbx
0x55555562ddb1 <+1393>: jne 0x55555562ddc0 ; <+1408>
0x55555562ddb3 <+1395>: movq 0x8(%rsp), %rax
0x55555562ddb8 <+1400>: jmp 0x55555562ddc2 ; <+1410>
0x55555562ddba <+1402>: nopw (%rax,%rax)
0x55555562ddc0 <+1408>: xorl %eax, %eax
0x55555562ddc2 <+1410>: addq (%r13), %rax
0x55555562ddc6 <+1414>: jb 0x55555562decc ; <+1676>
0x55555562ddcc <+1420>: testb $0x1, %cl
0x55555562ddcf <+1423>: movl 0x10(%rsp), %ecx
0x55555562ddd3 <+1427>: movl $0x0, %edx
0x55555562ddd8 <+1432>: cmovnel %edx, %ecx
0x55555562dddb <+1435>: addl 0x8(%r13), %ecx
0x55555562dddf <+1439>: cmpl $0x3b9aca00, %ecx ; imm = 0x3B9ACA00
0x55555562dde5 <+1445>: jb 0x55555562ddfa ; <+1466>
0x55555562dde7 <+1447>: cmpq $-0x1, %rax
0x55555562ddeb <+1451>: je 0x55555562decc ; <+1676>
0x55555562ddf1 <+1457>: addl $0xc4653600, %ecx ; imm = 0xC4653600
0x55555562ddf7 <+1463>: incq %rax
0x55555562ddfa <+1466>: movq %rax, (%r13)
0x55555562ddfe <+1470>: movl %ecx, 0x8(%r13)
0x55555562de02 <+1474>: movq 0x38(%rsp), %rax
0x55555562de07 <+1479>: movq %rax, 0x10(%rsp)
0x55555562de0c <+1484>: movups 0x28(%rsp), %xmm0
0x55555562de11 <+1489>: movaps %xmm0, (%rsp)
0x55555562de15 <+1493>: movq 0x10(%rsp), %rax
0x55555562de1a <+1498>: movq (%rsp), %rax
0x55555562de1e <+1502>: movq 0x8(%rsp), %rdi
0x55555562de23 <+1507>: testq %rax, %rax
0x55555562de26 <+1510>: je 0x55555562da40 ; <+512>
0x55555562de2c <+1516>: callq *0x46cf0e(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
0x55555562de32 <+1522>: jmp 0x55555562da40 ; <+512>
0x55555562de37 <+1527>: movl $0x4, %ecx
0x55555562de3c <+1532>: movl $0x8, %r8d
0x55555562de42 <+1538>: leaq 0x28(%rsp), %rdi
0x55555562de47 <+1543>: xorl %esi, %esi
0x55555562de49 <+1545>: movq %rbx, %rdx
0x55555562de4c <+1548>: movq 0x18(%rsp), %r15
0x55555562de51 <+1553>: callq 0x555555555550 ; alloc::raw_vec::RawVecInner$LT$A$GT$::reserve::do_reserve_and_handle::h6eaae75860de7206
0x55555562de56 <+1558>: movq 0x30(%rsp), %rbp
0x55555562de5b <+1563>: movq 0x38(%rsp), %rsi
0x55555562de60 <+1568>: jmp 0x55555562dbf0 ; <+944>
0x55555562de65 <+1573>: movl $0x1, %edi
0x55555562de6a <+1578>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
0x55555562de6f <+1583>: movq %rax, 0x50(%rsp)
0x55555562de74 <+1588>: movl %edx, 0x58(%rsp)
0x55555562de78 <+1592>: movq 0x88(%rsp), %rax
0x55555562de80 <+1600>: movq %rax, 0x28(%rsp)
0x55555562de85 <+1605>: movl 0x74(%rsp), %eax
0x55555562de89 <+1609>: movl %eax, 0x30(%rsp)
0x55555562de8d <+1613>: movq %rsp, %rdi
0x55555562de90 <+1616>: leaq 0x50(%rsp), %rsi
0x55555562de95 <+1621>: leaq 0x28(%rsp), %rdx
0x55555562de9a <+1626>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
0x55555562de9f <+1631>: xorl %eax, %eax
0x55555562dea1 <+1633>: cmpb $0x0, (%rsp)
0x55555562dea5 <+1637>: movl 0x10(%rsp), %ecx
0x55555562dea9 <+1641>: cmovnel %eax, %ecx
0x55555562deac <+1644>: cmoveq 0x8(%rsp), %rax
0x55555562deb2 <+1650>: movq %rax, 0x10(%r13)
0x55555562deb6 <+1654>: movl %ecx, 0x18(%r13)
0x55555562deba <+1658>: addq $0xa8, %rsp
0x55555562dec1 <+1665>: popq %rbx
0x55555562dec2 <+1666>: popq %r12
0x55555562dec4 <+1668>: popq %r13
0x55555562dec6 <+1670>: popq %r14
0x55555562dec8 <+1672>: popq %r15
0x55555562deca <+1674>: popq %rbp
0x55555562decb <+1675>: retq
0x55555562decc <+1676>: leaq 0x3862e7(%rip), %rdi
0x55555562ded3 <+1683>: leaq 0x445206(%rip), %rdx ; __dso_handle + 29352
0x55555562deda <+1690>: movl $0x1e, %esi
0x55555562dedf <+1695>: callq 0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60
0x55555562dee4 <+1700>: jmp 0x55555562df28 ; <+1768>
0x55555562dee6 <+1702>: leaq 0x379c5e(%rip), %rdi
0x55555562deed <+1709>: leaq 0x43f824(%rip), %rdx ; __dso_handle + 6368
0x55555562def4 <+1716>: movl $0x1c, %esi
0x55555562def9 <+1721>: callq 0x5555555c491e ; std::panicking::begin_panic::h4f2cc586c820a72c
0x55555562defe <+1726>: leaq 0x46c7fb(%rip), %rdi ; __dso_handle + 190664
0x55555562df05 <+1733>: callq 0x5555555aedb0 ; alloc::raw_vec::capacity_overflow::h46cadc9fcf0d8ebe
0x55555562df0a <+1738>: movl $0x4, %eax
0x55555562df0f <+1743>: movq %rax, 0x60(%rsp)
0x55555562df14 <+1748>: leaq 0x43f815(%rip), %rdx ; __dso_handle + 6392
0x55555562df1b <+1755>: movq 0x60(%rsp), %rdi
0x55555562df20 <+1760>: movq %r12, %rsi
0x55555562df23 <+1763>: callq 0x5555555aed83 ; alloc::raw_vec::handle_error::hc389833aee8d6f48
0x55555562df28 <+1768>: ud2
0x55555562df2a <+1770>: movl $0x4, %edi
0x55555562df2f <+1775>: movq %r12, %rsi
0x55555562df32 <+1778>: callq 0x5555555aed99 ; alloc::alloc::handle_alloc_error::h9164725ce4591dac
0x55555562df37 <+1783>: movq %rax, %rbx
0x55555562df3a <+1786>: cmpq $0x0, 0x20(%rsp)
0x55555562df40 <+1792>: jne 0x55555562df4d ; <+1805>
0x55555562df42 <+1794>: movq $0x0, 0x20(%rsp)
0x55555562df4b <+1803>: jmp 0x55555562df71 ; <+1841>
0x55555562df4d <+1805>: movq 0x18(%rsp), %rdi
0x55555562df52 <+1810>: callq *0x46cde8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
0x55555562df58 <+1816>: jmp 0x55555562df71 ; <+1841>
0x55555562df5a <+1818>: movq %rax, %rbx
0x55555562df5d <+1821>: movb $0x1, %bpl
0x55555562df60 <+1824>: jmp 0x55555562df78 ; <+1848>
0x55555562df62 <+1826>: movq %rax, %rbx
0x55555562df65 <+1829>: movq 0x18(%rsp), %rdi
0x55555562df6a <+1834>: jmp 0x55555562df92 ; <+1874>
0x55555562df6c <+1836>: jmp 0x55555562df6e ; <+1838>
0x55555562df6e <+1838>: movq %rax, %rbx
0x55555562df71 <+1841>: movq 0x28(%rsp), %r15
0x55555562df76 <+1846>: xorl %ebp, %ebp
0x55555562df78 <+1848>: testq %r15, %r15
0x55555562df7b <+1851>: je 0x55555562df88 ; <+1864>
0x55555562df7d <+1853>: movq 0x30(%rsp), %rdi
0x55555562df82 <+1858>: callq *0x46cdb8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
0x55555562df88 <+1864>: testb %bpl, %bpl
0x55555562df8b <+1867>: movq 0x18(%rsp), %rdi
0x55555562df90 <+1872>: je 0x55555562df9a ; <+1882>
0x55555562df92 <+1874>: cmpq $0x0, 0x20(%rsp)
0x55555562df98 <+1880>: jne 0x55555562dfa2 ; <+1890>
0x55555562df9a <+1882>: movq %rbx, %rdi
0x55555562df9d <+1885>: callq 0x5555555543b0 ; symbol stub for: _Unwind_Resume
0x55555562dfa2 <+1890>: callq *0x46cd98(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
0x55555562dfa8 <+1896>: movq %rbx, %rdi
0x55555562dfab <+1899>: callq 0x5555555543b0 ; symbol stub for: _Unwind_Resume
I am not sure, but I think the compiler was able to autovectorize this to process two(?) `mul(a, b)` calls per iteration; see the code starting at 0x55555562dc70. I do not have time for this right now, but I will be able to return to it later today or at the beginning of the week.
In general, I think it should be easy to modify the existing macros to use `iter_batched()` and `iter_batched_ref()`. I really do not want to do this manually for the rest of the code, but I may try to task an LLM with this 😁.
Ah, and this also raises another question: do we want to always generate random values for both arguments of binary operations to simulate a worst-case scenario? Or do we also need another macro that generates many random values for the second argument but uses a reference to a single first argument (`self`)? In practice you sometimes need to multiply a ton of vectors by a single matrix, and there might be a difference in performance due to cache misses, for example.
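The two benchmark shapes in question can be written down as plain functions. This is a minimal sketch with hypothetical names (`Mat2`, `Vec2`, `mul_pairs`, `mul_single`); it shows only how the inputs are arranged, not how the PR's macros are implemented.

```rust
use std::hint::black_box;

type Mat2 = [[f32; 2]; 2];
type Vec2 = [f32; 2];

/// Stand-in for the operation under test (2x2 matrix times 2-vector).
fn mul(m: &Mat2, v: &Vec2) -> Vec2 {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Both operands differ on every call: each iteration loads a fresh matrix
/// and a fresh vector.
fn mul_pairs(inputs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    inputs
        .iter()
        .map(|(m, v)| mul(black_box(m), black_box(v)))
        .collect()
}

/// One matrix is reused for many vectors: the matrix is likely to stay in
/// registers or cache, and only the vector changes between calls.
fn mul_single(m: &Mat2, vs: &[Vec2]) -> Vec<Vec2> {
    vs.iter().map(|v| mul(black_box(m), black_box(v))).collect()
}
```

The arithmetic is identical in both shapes; any timing difference would come from memory traffic and from what the optimizer can keep live across iterations.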
I also checked the implementation of `iter_batched*()`, and it seems to do the right thing: the return value of the setup closure is wrapped in a `black_box()`.
Regarding the Criterion issue you mentioned: I am not sure, but I suspect the problem is that they are measuring the time it takes to deallocate a vector on `drop()`. The performance of `free()` may depend on the allocated size, because allocators sometimes use different algorithms for different allocation sizes.
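The suspected effect, a deallocation landing inside the timed region, can be illustrated with a small std-only sketch. The `timed`, `sum_dropping_scratch`, and `sum_keeping_scratch` names are hypothetical; the point is only where the `Vec` is dropped relative to the clock.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `f` and returns its output to the caller, so the output is dropped
/// only after the clock has stopped.
fn timed<T>(f: impl FnOnce() -> T) -> (Duration, T) {
    let start = Instant::now();
    let out = black_box(f());
    (start.elapsed(), out)
}

/// The scratch buffer is dropped inside the routine, so a timer around this
/// call also measures the deallocation.
fn sum_dropping_scratch(n: usize) -> u64 {
    let scratch: Vec<u64> = (0..n as u64).collect();
    scratch.iter().sum()
}

/// The scratch buffer is returned next to the result, so the deallocation is
/// deferred until the caller drops the tuple.
fn sum_keeping_scratch(n: usize) -> (u64, Vec<u64>) {
    let scratch: Vec<u64> = (0..n as u64).collect();
    let total: u64 = scratch.iter().sum();
    (total, scratch)
}
```

In this sketch the allocation itself is still timed in both variants; moving it into an untimed setup step would be a separate change.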
And a last thought for now: can we bump `criterion` to the latest version as part of this PR?
> And a last thought for now: can we bump criterion to a latest version as a part of this PR?
I think bumping to the latest criterion version is a very good idea, provided everything else still works.
> Ah, and also this poses another question: do we want to always generate random values each time for both arguments of binary operations to simulate a worst case scenario? Or we also need to add another macro that generates a lot of random values for a second argument but uses a reference to single first argument (self)? In some cases in practice you need to multiply like a ton of vectors by a single matrix. And there might be a difference in performance due to a cache misses, for example.
Very good question, and I don't think I have a great answer. Maybe the implementation you provided is better after all? It corrects the original code but keeps the same spirit, i.e. if we have sufficiently small pieces of data, we'll take advantage of caching... I don't know... microbenchmarks sure are great, aren't they 😆
> In general, I think that it should be easy to modify existing macros to use iter_batched() and iter_batched_ref(). I really do not want to do this manually for the rest of the code, but may try to task LLM with this 😁.
Given the discussion above, I don't know if you would want to try the refactor at all. If you end up attempting it and it is too tedious for you to refactor (or you don't have access to one of our future AI overlords), let me know.
Redid almost everything in this PR. Here is a summary of all changes relative to current main:
- Set `codegen-units = 1` for benchmarks. I found that `codegen-units` with its default value leads to inconsistent results across recompilations (clean vs. incremental). It also sometimes leads to a significant performance degradation of benchmarks unrelated to the code changes. See rust-lang/rust#146497 (4000% performance regression with "-C target-cpu=x86-64-v3" and fat LTO) for details.
- `criterion` updated to version 0.7.
- Unused macros removed (I found another unused macro!)
- Remaining macros changed to use `iter_batched()` and `iter_batched_ref()`.
- Added macros to benchmark Single x N Values binary operations. This simulates real-world use cases like multiplication of many vectors by a single matrix.
  There is a ~2x performance difference between the case where both arguments are random on each iteration and the case where one argument is static and the second is random on each iteration:
click for details...
mat2_mul_v time: [778.33 ps 785.41 ps 797.70 ps]
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  4 (4.00%) high mild
  5 (5.00%) high severe
mat3_mul_v time: [1.7001 ns 1.7051 ns 1.7111 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe
mat4_mul_v time: [2.6101 ns 2.6223 ns 2.6374 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
single_mat2_mul_v time: [402.65 ps 403.62 ps 404.75 ps]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe
single_mat3_mul_v time: [651.30 ps 654.06 ps 657.15 ps]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  8 (8.00%) high mild
  4 (4.00%) high severe
single_mat4_mul_v time: [1.0628 ns 1.0645 ns 1.0666 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe
mat2_tr_mul_v time: [719.81 ps 721.99 ps 724.59 ps]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
mat3_tr_mul_v time: [1.6685 ns 1.6758 ns 1.6841 ns]
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
mat4_tr_mul_v time: [2.6739 ns 2.6897 ns 2.7080 ns]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe
single_mat2_tr_mul_v time: [353.36 ps 354.56 ps 356.03 ps]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
single_mat3_tr_mul_v time: [779.82 ps 782.84 ps 786.37 ps]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
single_mat4_tr_mul_v time: [1.1918 ns 1.1946 ns 1.1977 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
unit_quaternion_mul_v time: [1.5002 ns 1.5088 ns 1.5183 ns]
  change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
  No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
single_unit_quaternion_mul_v time: [1.0489 ns 1.0531 ns 1.0584 ns]
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
-
Uncommented some quaternion benchmarks. I do not know why those benchmarks were commented out in the first place.
-
Remaining non-macro benchmarks changed to use iter_batched() and iter_batched_ref().
The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved DVector allocations outside of the benchmarks and added anything allocated but not consumed to the return tuple of the benchmark closure, so that the implicit drop/free is not included in the measured time.
-
Added reproducible_smatrix(). Some algorithms may not converge when used on completely random values with the default value of epsilon and unlimited iterations. reproducible_dmatrix() already exists to circumvent this for DMatrix, so I implemented the same for SMatrix.
In my tests this problem manifested itself only in schur_decompose_4x4, but I decided to apply a similar fix to all benchmarks that also use reproducible_dmatrix() for DMatrix.
-
Cholesky decomposition benchmarks changed to use reproducible_dmatrix().
Random matrices may not be positive-definite, and the Cholesky decomposition benchmarks panic because of that:
Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
called `Option::unwrap()` on a `None` value
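To illustrate why returning allocated values from the benchmark closure keeps drop/free out of the measurement, here is a minimal std-only sketch of the batched-timing idea. `time_batched` is a hypothetical helper written for this example; it is not criterion's implementation of iter_batched(), only an approximation of the same principle: inputs are built before the clock starts, and outputs are kept alive until after it stops.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of criterion): runs `routine` on `n` inputs
/// produced by `setup`, timing only the routine calls themselves.
fn time_batched<I, O>(
    n: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> (Duration, usize) {
    // Build every input before the clock starts, so allocation is not timed.
    let mut inputs: Vec<I> = (0..n).map(|_| setup()).collect();
    let mut outputs: Vec<O> = Vec::with_capacity(n);

    let start = Instant::now();
    for input in inputs.drain(..) {
        // Outputs are stored rather than dropped, so their destructors
        // do not run inside the timed region.
        outputs.push(routine(input));
    }
    let elapsed = start.elapsed();

    let produced = outputs.len();
    drop(outputs); // deallocation happens only after the clock has stopped
    (elapsed, produced)
}

fn main() {
    let (elapsed, produced) = time_batched(
        1_000,
        || vec![1.0_f64; 10_000],
        |mut v: Vec<f64>| {
            for x in v.iter_mut() {
                *x *= 2.0;
            }
            v // returning the buffer defers its drop until after timing
        },
    );
    println!("{} runs in {:?}", produced, elapsed);
}
```

If the routine dropped `v` instead of returning it, every iteration would also pay for freeing a 10,000-element buffer inside the timed loop.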
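The panic above comes from unwrapping a failed factorization. As a rough std-only illustration (not nalgebra's code), the sketch below shows a textbook Cholesky routine returning None for a symmetric matrix that is not positive-definite, and succeeding on a matrix built as M * M^T + n * I, which is one common way to guarantee positive-definiteness. The names `cholesky` and `make_spd` are hypothetical, and nothing here is meant to describe what reproducible_dmatrix() does internally.

```rust
/// Textbook Cholesky factorization of a row-major n x n symmetric matrix.
/// Returns None when a pivot is not strictly positive, i.e. the matrix
/// is not positive-definite. Hypothetical helper, not nalgebra's API.
fn cholesky(a: &[f64], n: usize) -> Option<Vec<f64>> {
    let mut l = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = a[i * n + j];
            for k in 0..j {
                sum -= l[i * n + k] * l[j * n + k];
            }
            if i == j {
                if sum <= 0.0 {
                    return None; // not positive-definite
                }
                l[i * n + j] = sum.sqrt();
            } else {
                l[i * n + j] = sum / l[j * n + j];
            }
        }
    }
    Some(l)
}

/// Builds M * M^T + n * I, which is symmetric positive-definite for any real M.
fn make_spd(m: &[f64], n: usize) -> Vec<f64> {
    let mut a = vec![0.0; n * n];
    for i in 0..n {
        for j in 0..n {
            let mut dot = 0.0;
            for k in 0..n {
                dot += m[i * n + k] * m[j * n + k];
            }
            a[i * n + j] = dot;
        }
        a[i * n + i] += n as f64; // shift the diagonal to make it strictly PD
    }
    a
}

fn main() {
    // Symmetric but indefinite: the factorization fails, and calling
    // unwrap() on this result would panic.
    let indefinite = [1.0, 2.0, 2.0, 1.0];
    println!("indefinite ok: {}", cholesky(&indefinite, 2).is_some());

    // Positive-definite by construction: the factorization succeeds.
    let spd = make_spd(&[1.0, 2.0, 3.0, 4.0], 2);
    println!("spd ok: {}", cholesky(&spd, 2).is_some());
}
```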
Total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes. Here are the results, with the difference from current main:
click for details...
mat2_mul_m time: [1.1043 ns 1.1058 ns 1.1077 ns]
change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) low severe
2 (2.00%) high mild
6 (6.00%) high severe
mat3_mul_m time: [3.1885 ns 3.1945 ns 3.2038 ns]
change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
mat4_mul_m time: [6.7759 ns 6.7840 ns 6.7929 ns]
change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) low severe
3 (3.00%) high mild
4 (4.00%) high severe
mat2_tr_mul_m time: [1.2882 ns 1.2901 ns 1.2926 ns]
change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
1 (1.00%) high mild
3 (3.00%) high severe
mat3_tr_mul_m time: [3.1688 ns 3.1725 ns 3.1770 ns]
change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
4 (4.00%) high mild
4 (4.00%) high severe
mat4_tr_mul_m time: [6.5406 ns 6.5453 ns 6.5508 ns]
change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
6 (6.00%) high severe
mat2_add_m time: [644.68 ps 645.88 ps 647.24 ps]
change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
mat3_add_m time: [1.3543 ns 1.3572 ns 1.3607 ns]
change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
6 (6.00%) low severe
5 (5.00%) high mild
4 (4.00%) high severe
mat4_add_m time: [2.3987 ns 2.4015 ns 2.4044 ns]
change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) low severe
5 (5.00%) high mild
3 (3.00%) high severe
mat2_sub_m time: [637.47 ps 638.88 ps 640.62 ps]
change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
mat3_sub_m time: [1.3531 ns 1.3546 ns 1.3562 ns]
change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
4 (4.00%) high severe
mat4_sub_m time: [2.3972 ns 2.3996 ns 2.4021 ns]
change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) low severe
1 (1.00%) high mild
3 (3.00%) high severe
mat2_mul_v time: [774.43 ps 775.48 ps 776.73 ps]
change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
5 (5.00%) high mild
3 (3.00%) high severe
mat3_mul_v time: [1.6843 ns 1.6858 ns 1.6874 ns]
change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low severe
1 (1.00%) high mild
3 (3.00%) high severe
mat4_mul_v time: [2.6029 ns 2.6196 ns 2.6485 ns]
change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
single_mat2_mul_v time: [392.29 ps 393.45 ps 394.87 ps]
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
single_mat3_mul_v time: [650.16 ps 651.47 ps 653.07 ps]
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
3 (3.00%) high mild
4 (4.00%) high severe
single_mat4_mul_v time: [1.0665 ns 1.0690 ns 1.0722 ns]
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
mat2_tr_mul_v time: [719.95 ps 720.92 ps 722.16 ps]
change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
2 (2.00%) low mild
7 (7.00%) high mild
4 (4.00%) high severe
mat3_tr_mul_v time: [1.6551 ns 1.6564 ns 1.6577 ns]
change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
mat4_tr_mul_v time: [2.6477 ns 2.6546 ns 2.6666 ns]
change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
3 (3.00%) high mild
3 (3.00%) high severe
single_mat2_tr_mul_v time: [353.60 ps 355.50 ps 358.48 ps]
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
single_mat3_tr_mul_v time: [778.13 ps 779.43 ps 781.25 ps]
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
3 (3.00%) high mild
5 (5.00%) high severe
single_mat4_tr_mul_v time: [1.1887 ns 1.1906 ns 1.1930 ns]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
mat2_mul_s time: [774.44 ps 775.33 ps 776.37 ps]
change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
mat3_mul_s time: [962.59 ps 964.98 ps 967.43 ps]
change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
mat4_mul_s time: [1.6589 ns 1.6640 ns 1.6684 ns]
change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
8 (8.00%) low severe
3 (3.00%) low mild
1 (1.00%) high mild
6 (6.00%) high severe
mat2_div_s time: [803.09 ps 804.70 ps 806.56 ps]
change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
mat3_div_s time: [2.4929 ns 2.4947 ns 2.4967 ns]
change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low severe
5 (5.00%) high mild
4 (4.00%) high severe
mat4_div_s time: [5.1650 ns 5.1688 ns 5.1735 ns]
change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
mat2_inv time: [1.1514 ns 1.1523 ns 1.1533 ns]
change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
mat3_inv time: [3.3641 ns 3.3707 ns 3.3826 ns]
change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe
mat4_inv time: [25.970 ns 26.006 ns 26.062 ns]
change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe
mat2_transpose time: [409.94 ps 410.77 ps 411.75 ps]
change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe
mat3_transpose time: [947.42 ps 953.20 ps 961.97 ps]
change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
mat4_transpose time: [1.6510 ns 1.6551 ns 1.6612 ns]
change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
mat_div_scalar time: [480.25 µs 480.55 µs 480.99 µs]
change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
mat100_add_mat100 time: [3.0426 µs 3.0910 µs 3.1351 µs]
change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
3 (3.00%) low mild
7 (7.00%) high mild
1 (1.00%) high severe
mat4_mul_mat4 time: [36.836 ns 36.859 ns 36.886 ns]
change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) low severe
4 (4.00%) high mild
2 (2.00%) high severe
mat5_mul_mat5 time: [56.715 ns 56.876 ns 57.015 ns]
change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
mat6_mul_mat6 time: [83.817 ns 83.999 ns 84.156 ns]
change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
mat7_mul_mat7 time: [93.211 ns 93.386 ns 93.534 ns]
change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low severe
2 (2.00%) low mild
mat8_mul_mat8 time: [88.919 ns 89.410 ns 89.884 ns]
change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
mat9_mul_mat9 time: [207.12 ns 209.04 ns 211.17 ns]
change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
9 (9.00%) low mild
1 (1.00%) high mild
mat10_mul_mat10 time: [236.75 ns 237.11 ns 237.47 ns]
change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) low severe
7 (7.00%) low mild
1 (1.00%) high mild
mat10_mul_mat10_static time: [116.68 ns 117.15 ns 117.62 ns]
change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05)
Performance has regressed.
mat100_mul_mat100 time: [40.188 µs 40.327 µs 40.459 µs]
change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
7 (7.00%) high mild
8 (8.00%) high severe
mat500_mul_mat500 time: [4.3909 ms 4.3944 ms 4.3978 ms]
change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe
iter time: [840.01 µs 840.39 µs 840.81 µs]
change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) high mild
11 (11.00%) high severe
iter_rev time: [210.14 µs 211.10 µs 212.84 µs]
change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
copy_from time: [199.77 µs 200.80 µs 202.55 µs]
change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
8 (8.00%) low mild
1 (1.00%) high severe
axpy time: [31.301 µs 33.301 µs 34.957 µs]
change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05)
Performance has regressed.
tr_mul_to time: [126.46 µs 127.12 µs 128.09 µs]
change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
mat_mul_mat time: [39.252 µs 39.443 µs 39.626 µs]
change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
8 (8.00%) high mild
2 (2.00%) high severe
mat100_from_fn time: [6.8398 µs 6.8418 µs 6.8446 µs]
change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
mat500_from_fn time: [172.11 µs 172.14 µs 172.18 µs]
change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
5 (5.00%) high mild
7 (7.00%) high severe
vec2_add_v_f32 time: [303.98 ps 304.76 ps 305.65 ps]
change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low severe
5 (5.00%) high mild
6 (6.00%) high severe
vec3_add_v_f32 time: [586.36 ps 587.93 ps 589.92 ps]
change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
5 (5.00%) high mild
6 (6.00%) high severe
vec4_add_v_f32 time: [603.45 ps 604.44 ps 605.59 ps]
change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
vec2_add_v_f64 time: [602.08 ps 602.83 ps 603.64 ps]
change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
vec3_add_v_f64 time: [910.94 ps 912.60 ps 914.56 ps]
change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low severe
6 (6.00%) high mild
3 (3.00%) high severe
vec4_add_v_f64 time: [1.1894 ns 1.1933 ns 1.1963 ns]
change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
9 (9.00%) low severe
2 (2.00%) low mild
2 (2.00%) high severe
vec2_sub_v time: [303.45 ps 304.42 ps 305.37 ps]
change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
8 (8.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
vec3_sub_v time: [672.95 ps 674.82 ps 676.51 ps]
change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
vec4_sub_v time: [602.84 ps 604.65 ps 607.70 ps]
change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
6 (6.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
vec2_mul_s time: [666.49 ps 667.29 ps 668.31 ps]
change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
4 (4.00%) low severe
6 (6.00%) high mild
6 (6.00%) high severe
vec3_mul_s time: [511.42 ps 513.44 ps 515.86 ps]
change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
vec4_mul_s time: [774.13 ps 775.22 ps 776.52 ps]
change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe
vec2_div_s time: [1.3658 ns 1.3694 ns 1.3726 ns]
change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
vec3_div_s time: [607.73 ps 608.63 ps 609.66 ps]
change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
8 (8.00%) high mild
6 (6.00%) high severe
vec4_div_s time: [802.59 ps 803.62 ps 804.82 ps]
change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low severe
6 (6.00%) high mild
2 (2.00%) high severe
vec2_dot_f32 time: [461.20 ps 461.73 ps 462.30 ps]
change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe
vec3_dot_f32 time: [688.24 ps 689.05 ps 689.95 ps]
change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
vec4_dot_f32 time: [917.20 ps 921.23 ps 928.57 ps]
change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
8 (8.00%) high mild
5 (5.00%) high severe
vec2_dot_f64 time: [596.11 ps 597.51 ps 598.79 ps]
change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
vec3_dot_f64 time: [749.32 ps 751.02 ps 752.81 ps]
change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) high mild
7 (7.00%) high severe
vec4_dot_f64 time: [1.0145 ns 1.0185 ns 1.0230 ns]
change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
vec3_cross time: [971.01 ps 971.87 ps 972.73 ps]
change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
vec2_norm time: [1.0612 ns 1.0623 ns 1.0637 ns]
change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) low mild
2 (2.00%) high severe
vec3_norm time: [1.0649 ns 1.0665 ns 1.0694 ns]
change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
vec4_norm time: [1.0733 ns 1.0739 ns 1.0746 ns]
change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
2 (2.00%) low severe
7 (7.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe
vec2_normalize time: [2.5310 ns 2.5326 ns 2.5345 ns]
change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
vec3_normalize time: [2.5389 ns 2.5409 ns 2.5424 ns]
change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
vec4_normalize time: [1.8154 ns 1.8164 ns 1.8173 ns]
change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low severe
1 (1.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
vec10000_dot_f64 time: [2.0296 µs 2.0337 µs 2.0383 µs]
change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) low severe
3 (3.00%) high mild
4 (4.00%) high severe
vec10000_dot_f32 time: [1.1891 µs 1.1926 µs 1.1962 µs]
change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
vec10000_axpy_f64 time: [2.0702 µs 2.0739 µs 2.0777 µs]
change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
vec10000_axpy_beta_f64 time: [2.0914 µs 2.0962 µs 2.1012 µs]
change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) low severe
5 (5.00%) high mild
2 (2.00%) high severe
vec10000_axpy_f64_slice time: [2.0272 µs 2.0303 µs 2.0335 µs]
change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) low severe
2 (2.00%) high mild
1 (1.00%) high severe
vec10000_axpy_f64_static
time: [13.917 µs 13.965 µs 14.005 µs]
change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low severe
3 (3.00%) high mild
2 (2.00%) high severe
vec10000_axpy_f32 time: [1.0402 µs 1.0421 µs 1.0437 µs]
change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
vec10000_axpy_beta_f32 time: [1.0329 µs 1.0346 µs 1.0364 µs]
change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
quaternion_add_q time: [642.58 ps 650.39 ps 662.45 ps]
change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
quaternion_sub_q time: [641.16 ps 643.22 ps 645.88 ps]
change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
4 (4.00%) high severe
quaternion_mul_q time: [1.4252 ns 1.4271 ns 1.4294 ns]
change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
unit_quaternion_mul_v time: [1.4859 ns 1.4874 ns 1.4890 ns]
change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
single_unit_quaternion_mul_v
time: [1.0422 ns 1.0457 ns 1.0504 ns]
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
4 (4.00%) high mild
4 (4.00%) high severe
quaternion_mul_s time: [771.17 ps 772.18 ps 773.37 ps]
change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05)
Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
quaternion_div_s time: [798.54 ps 799.82 ps 801.43 ps]
change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
quaternion_inv time: [1.2401 ns 1.2408 ns 1.2417 ns]
change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
5 (5.00%) high mild
6 (6.00%) high severe
unit_quaternion_inv time: [596.01 ps 598.93 ps 602.66 ps]
change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
6 (6.00%) high mild
9 (9.00%) high severe
quaternion_conjugate time: [604.36 ps 608.60 ps 613.48 ps]
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
quaternion_normalize time: [1.8268 ns 1.8274 ns 1.8281 ns]
Found 18 outliers among 100 measurements (18.00%)
4 (4.00%) low severe
4 (4.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
bidiagonalize_100x100 time: [265.91 µs 266.00 µs 266.11 µs]
change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
bidiagonalize_100x500 time: [2.0053 ms 2.0060 ms 2.0065 ms]
change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) low severe
2 (2.00%) high mild
5 (5.00%) high severe
bidiagonalize_4x4 time: [266.92 ns 267.24 ns 267.62 ns]
change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
1 (1.00%) low severe
5 (5.00%) low mild
13 (13.00%) high mild
4 (4.00%) high severe
Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_500x100 time: [1.6781 ms 1.6793 ms 1.6804 ms]
change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05)
Performance has regressed.
bidiagonalize_unpack_100x100
time: [522.13 µs 522.36 µs 522.63 µs]
change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe
bidiagonalize_unpack_100x500
time: [2.9858 ms 2.9916 ms 2.9976 ms]
change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05)
Change within noise threshold.
bidiagonalize_unpack_500x100
time: [2.5884 ms 2.5896 ms 2.5910 ms]
change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05)
Change within noise threshold.
cholesky_100x100 time: [31.084 µs 31.101 µs 31.122 µs]
change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05)
Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
9 (9.00%) high severe
cholesky_500x500 time: [4.4799 ms 4.4849 ms 4.4903 ms]
change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
cholesky_decompose_unpack_100x100
time: [31.659 µs 31.685 µs 31.727 µs]
change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low severe
4 (4.00%) low mild
2 (2.00%) high mild
5 (5.00%) high severe
cholesky_decompose_unpack_500x500
time: [4.4795 ms 4.4845 ms 4.4910 ms]
change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05)
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe
cholesky_solve_10x10 time: [170.70 ns 170.76 ns 170.82 ns]
change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
cholesky_solve_100x100 time: [2.9071 µs 2.9117 µs 2.9174 µs]
change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
cholesky_solve_500x500 time: [54.193 µs 54.303 µs 54.417 µs]
change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
cholesky_inverse_10x10 time: [1.3189 µs 1.3195 µs 1.3201 µs]
change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
cholesky_inverse_100x100
time: [270.85 µs 270.88 µs 270.92 µs]
change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low severe
4 (4.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
cholesky_inverse_500x500
time: [26.673 ms 26.694 ms 26.714 ms]
change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
19 (19.00%) low severe
2 (2.00%) low mild
2 (2.00%) high severe
full_piv_lu_decompose_10x10
time: [582.31 ns 582.48 ns 582.67 ns]
change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
6 (6.00%) high mild
2 (2.00%) high severe
full_piv_lu_decompose_100x100
time: [218.73 µs 218.78 µs 218.84 µs]
change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low severe
5 (5.00%) low mild
1 (1.00%) high severe
full_piv_lu_solve_10x10 time: [124.88 ns 124.94 ns 125.02 ns]
change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) low severe
6 (6.00%) high mild
4 (4.00%) high severe
full_piv_lu_solve_100x100
time: [2.5202 µs 2.5244 µs 2.5289 µs]
change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
14 (14.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
full_piv_lu_inverse_10x10
time: [869.61 ns 870.27 ns 871.19 ns]
change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low severe
1 (1.00%) high mild
4 (4.00%) high severe
full_piv_lu_inverse_100x100
time: [212.68 µs 212.83 µs 213.05 µs]
change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05)
No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
4 (4.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
full_piv_lu_determinant_10x10
time: [15.320 ns 15.338 ns 15.357 ns]
change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
9 (9.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
full_piv_lu_determinant_100x100
time: [137.44 ns 139.37 ns 141.00 ns]
change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05)
Performance has regressed.
hessenberg_decompose_4x4
time: [82.510 ns 82.538 ns 82.564 ns]
change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
hessenberg_decompose_100x100
time: [295.98 µs 296.16 µs 296.44 µs]
change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
hessenberg_decompose_200x200
time: [2.2647 ms 2.2681 ms 2.2714 ms]
change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05)
Performance has regressed.
hessenberg_decompose_unpack_100x100
time: [435.30 µs 435.75 µs 436.12 µs]
change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
hessenberg_decompose_unpack_200x200
time: [3.2667 ms 3.2678 ms 3.2690 ms]
change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05)
Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
13 (13.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
lu_decompose_10x10 time: [353.04 ns 353.16 ns 353.31 ns]
change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
4 (4.00%) low severe
4 (4.00%) low mild
6 (6.00%) high mild
5 (5.00%) high severe
lu_decompose_100x100 time: [71.544 µs 71.560 µs 71.579 µs]
change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
lu_solve_10x10 time: [115.42 ns 115.52 ns 115.61 ns]
change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
4 (4.00%) low severe
8 (8.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
lu_solve_100x100 time: [2.5152 µs 2.5190 µs 2.5225 µs]
change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
lu_inverse_10x10 time: [902.55 ns 903.32 ns 903.97 ns]
change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
lu_inverse_100x100 time: [216.21 µs 216.47 µs 216.80 µs]
change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
2 (2.00%) low severe
4 (4.00%) low mild
5 (5.00%) high mild
7 (7.00%) high severe
lu_determinant_10x10 time: [13.394 ns 13.481 ns 13.665 ns]
change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) low severe
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
lu_determinant_100x100 time: [149.12 ns 150.16 ns 151.08 ns]
change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
10 (10.00%) low severe
4 (4.00%) low mild
qr_decompose_100x100 time: [141.62 µs 141.65 µs 141.69 µs]
change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500 time: [1.0071 ms 1.0082 ms 1.0097 ms]
change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
12 (12.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
qr_decompose_4x4 time: [100.40 ns 100.43 ns 100.45 ns]
change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
1 (1.00%) high mild
4 (4.00%) high severe
Benchmarking qr_decompose_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
qr_decompose_500x100 time: [847.17 µs 847.68 µs 848.21 µs]
change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) high mild
3 (3.00%) high severe
qr_decompose_unpack_100x100
time: [283.22 µs 283.26 µs 283.30 µs]
change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 23 outliers among 100 measurements (23.00%)
21 (21.00%) low severe
1 (1.00%) low mild
1 (1.00%) high severe
Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500
time: [1.1399 ms 1.1429 ms 1.1457 ms]
change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100
time: [1.6633 ms 1.6640 ms 1.6648 ms]
change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low severe
5 (5.00%) low mild
4 (4.00%) high severe
qr_solve_10x10 time: [156.51 ns 156.56 ns 156.61 ns]
change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) low severe
5 (5.00%) low mild
1 (1.00%) high mild
qr_solve_100x100 time: [3.5393 µs 3.5454 µs 3.5511 µs]
change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low mild
qr_inverse_10x10 time: [806.75 ns 807.99 ns 809.61 ns]
change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
qr_inverse_100x100 time: [330.65 µs 330.74 µs 330.85 µs]
change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low mild
4 (4.00%) high mild
5 (5.00%) high severe
schur_decompose_4x4 time: [969.14 ns 969.71 ns 970.18 ns]
change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
3 (3.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
schur_decompose_10x10 time: [7.3226 µs 7.3237 µs 7.3247 µs]
change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
2 (2.00%) low mild
4 (4.00%) high mild
3 (3.00%) high severe
schur_decompose_100x100 time: [2.5760 ms 2.5763 ms 2.5768 ms]
change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
schur_decompose_200x200 time: [18.285 ms 18.296 ms 18.308 ms]
change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
eigenvalues_4x4 time: [937.94 ns 938.15 ns 938.38 ns]
change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
eigenvalues_10x10 time: [5.9066 µs 5.9088 µs 5.9117 µs]
change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100 time: [1.5870 ms 1.5873 ms 1.5876 ms]
change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
eigenvalues_200x200 time: [11.081 ms 11.088 ms 11.102 ms]
change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
solve_l_triangular_100x100
time: [1.3250 µs 1.3651 µs 1.4012 µs]
change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) high mild
2 (2.00%) high severe
solve_l_triangular_1000x1000
time: [101.52 µs 102.04 µs 102.85 µs]
change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
9 (9.00%) high mild
6 (6.00%) high severe
tr_solve_l_triangular_100x100
time: [2.0144 µs 2.0537 µs 2.0902 µs]
change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
5 (5.00%) high mild
11 (11.00%) high severe
tr_solve_l_triangular_1000x1000
time: [93.569 µs 94.056 µs 94.857 µs]
change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
solve_u_triangular_100x100
time: [1.5878 µs 1.6615 µs 1.7405 µs]
change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
10 (10.00%) high mild
3 (3.00%) high severe
solve_u_triangular_1000x1000
time: [105.07 µs 105.46 µs 106.12 µs]
change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
tr_solve_u_triangular_100x100
time: [1.4369 µs 1.4697 µs 1.4986 µs]
change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05)
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
11 (11.00%) high mild
2 (2.00%) high severe
tr_solve_u_triangular_1000x1000
time: [88.868 µs 89.303 µs 90.014 µs]
change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
4 (4.00%) high mild
7 (7.00%) high severe
svd_decompose_2x2 time: [22.913 ns 22.958 ns 23.017 ns]
change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
svd_decompose_3x3 time: [359.30 ns 359.72 ns 360.20 ns]
change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
svd_decompose_4x4 time: [896.28 ns 896.55 ns 896.85 ns]
change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low severe
3 (3.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
svd_decompose_10x10 time: [5.7680 µs 5.7708 µs 5.7739 µs]
change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
Benchmarking svd_decompose_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
svd_decompose_100x100 time: [1.5704 ms 1.5709 ms 1.5715 ms]
change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
svd_decompose_200x200 time: [11.845 ms 11.847 ms 11.850 ms]
change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
rank_4x4 time: [716.49 ns 716.62 ns 716.74 ns]
change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
rank_10x10 time: [4.2304 µs 4.2341 µs 4.2377 µs]
change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
rank_100x100 time: [522.74 µs 522.85 µs 522.97 µs]
change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high severe
rank_200x200 time: [3.0167 ms 3.0217 ms 3.0267 ms]
change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05)
Change within noise threshold.
singular_values_4x4 time: [735.97 ns 736.08 ns 736.21 ns]
change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
2 (2.00%) low mild
2 (2.00%) high severe
singular_values_10x10 time: [4.2987 µs 4.2997 µs 4.3010 µs]
change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
singular_values_100x100 time: [525.20 µs 525.36 µs 525.54 µs]
change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
singular_values_200x200 time: [3.0712 ms 3.0729 ms 3.0750 ms]
change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
pseudo_inverse_4x4 time: [877.64 ns 878.38 ns 879.12 ns]
change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low severe
3 (3.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe
pseudo_inverse_10x10 time: [6.0008 µs 6.0034 µs 6.0064 µs]
change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100 time: [1.6088 ms 1.6091 ms 1.6094 ms]
change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) high mild
10 (10.00%) high severe
pseudo_inverse_200x200 time: [12.038 ms 12.042 ms 12.047 ms]
change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 22 outliers among 100 measurements (22.00%)
16 (16.00%) low severe
2 (2.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
symmetric_eigen_decompose_4x4
time: [518.00 ns 518.07 ns 518.15 ns]
change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
symmetric_eigen_decompose_10x10
time: [3.6417 µs 3.6428 µs 3.6440 µs]
change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
symmetric_eigen_decompose_100x100
time: [761.64 µs 762.66 µs 763.80 µs]
change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
9 (9.00%) low severe
9 (9.00%) low mild
1 (1.00%) high severe
symmetric_eigen_decompose_200x200
time: [5.1304 ms 5.1337 ms 5.1372 ms]
change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05)
Performance has improved.
During benchmarking I found that the default value of `codegen-units` leads to inconsistent results across recompilations (clean vs. incremental). It also sometimes causes a significant performance degradation in benchmarks unrelated to the code changes. See also rust-lang/rust#146497.
Criterion generates a `Vec` of arguments and passes them through `black_box()` to guarantee that the benchmark closure is never optimized out of the benchmarking loop. This fixes dimforge#1547 for benchmarks that use the `bench_*!()` macros.
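For readers unfamiliar with the batching idea, here is a minimal std-only sketch of it. It is not Criterion's implementation: `time_batched` is a hypothetical helper, and the inputs are a deterministic stand-in for the random arguments the real benchmarks use.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Criterion): runs `routine` once per
/// pre-generated input and returns the elapsed time plus all outputs.
fn time_batched<I, O>(inputs: Vec<I>, mut routine: impl FnMut(I) -> O) -> (Duration, Vec<O>) {
    // Hide the inputs from the optimizer so the routine cannot be
    // constant-folded or hoisted out of the loop.
    let inputs = black_box(inputs);
    let mut outputs = Vec::with_capacity(inputs.len());

    let start = Instant::now();
    for input in inputs {
        outputs.push(routine(input));
    }
    let elapsed = start.elapsed();

    // Returning the outputs (through black_box) keeps the results "used"
    // and leaves dropping them to the caller, after the timer has stopped.
    (elapsed, black_box(outputs))
}

fn main() {
    // Deterministic stand-in for randomly generated arguments.
    let inputs: Vec<(f64, f64)> = (0..1000).map(|i| (i as f64, (i + 1) as f64)).collect();
    let (elapsed, outputs) = time_batched(inputs, |(a, b)| a * b);
    println!("{} results in {:?}", outputs.len(), elapsed);
}
```

Because every call sees a different, opaque input, the compiler has no constant operands to exploit inside the loop.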
This simulates real-world use cases such as multiplying many vectors by a single matrix. There is a ~2x performance difference between the case where both arguments are random on each iteration and the case where one argument is static and the second is random on each iteration:
mat2_mul_v time: [778.33 ps 785.41 ps 797.70 ps]
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) low severe
4 (4.00%) high mild
5 (5.00%) high severe
mat3_mul_v time: [1.7001 ns 1.7051 ns 1.7111 ns]
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
1 (1.00%) low mild
8 (8.00%) high mild
1 (1.00%) high severe
mat4_mul_v time: [2.6101 ns 2.6223 ns 2.6374 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
single_mat2_mul_v time: [402.65 ps 403.62 ps 404.75 ps]
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low mild
5 (5.00%) high mild
3 (3.00%) high severe
single_mat3_mul_v time: [651.30 ps 654.06 ps 657.15 ps]
Found 15 outliers among 100 measurements (15.00%)
3 (3.00%) low mild
8 (8.00%) high mild
4 (4.00%) high severe
single_mat4_mul_v time: [1.0628 ns 1.0645 ns 1.0666 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
mat2_tr_mul_v time: [719.81 ps 721.99 ps 724.59 ps]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low mild
5 (5.00%) high mild
mat3_tr_mul_v time: [1.6685 ns 1.6758 ns 1.6841 ns]
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
mat4_tr_mul_v time: [2.6739 ns 2.6897 ns 2.7080 ns]
Found 16 outliers among 100 measurements (16.00%)
2 (2.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
8 (8.00%) high severe
single_mat2_tr_mul_v time: [353.36 ps 354.56 ps 356.03 ps]
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
single_mat3_tr_mul_v time: [779.82 ps 782.84 ps 786.37 ps]
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
2 (2.00%) high severe
single_mat4_tr_mul_v time: [1.1918 ns 1.1946 ns 1.1977 ns]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
unit_quaternion_mul_v time: [1.5002 ns 1.5088 ns 1.5183 ns]
change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
single_unit_quaternion_mul_v
time: [1.0489 ns 1.0531 ns 1.0584 ns]
Found 14 outliers among 100 measurements (14.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe
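To make the two measured shapes concrete, here is a small std-only sketch. It uses plain arrays instead of nalgebra types, and the names `mul_pairs` and `mul_single` are hypothetical, chosen only to mirror the `matN_mul_v` and `single_matN_mul_v` benchmark names. It illustrates the difference in inputs and makes no attempt to reproduce the timings above.

```rust
use std::hint::black_box;

type Mat2 = [[f64; 2]; 2];
type Vec2 = [f64; 2];

/// Plain 2x2 matrix-vector product, standing in for nalgebra's `&m * &v`.
fn mat2_mul_v(m: &Mat2, v: &Vec2) -> Vec2 {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Shape of `mat2_mul_v`: both the matrix and the vector change on every iteration.
fn mul_pairs(pairs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    pairs
        .iter()
        .map(|(m, v)| mat2_mul_v(black_box(m), black_box(v)))
        .collect()
}

/// Shape of `single_mat2_mul_v`: one matrix is reused, only the vector changes.
fn mul_single(m: &Mat2, vs: &[Vec2]) -> Vec<Vec2> {
    // The matrix is made opaque once, outside the loop.
    let m = black_box(m);
    vs.iter().map(|v| mat2_mul_v(m, black_box(v))).collect()
}

fn main() {
    let m: Mat2 = [[1.0, 2.0], [3.0, 4.0]];
    let vs: Vec<Vec2> = (0..4).map(|i| [i as f64, 1.0]).collect();
    let pairs: Vec<(Mat2, Vec2)> = vs.iter().map(|v| (m, *v)).collect();

    // Both shapes compute the same products; only the input pattern differs.
    assert_eq!(mul_pairs(&pairs), mul_single(&m, &vs));
    println!("{:?}", mul_single(&m, &vs));
}
```

In the single-matrix shape the matrix can stay in registers across iterations, which is one plausible reason for the gap seen in the numbers above.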
I do not know why those benchmarks were commented out.
…hmarks
The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved the `DVector` allocations outside of the benchmark and returned anything allocated but not consumed as part of the benchmark closure's return tuple, so that the implicit drop/free is not included in the measured time. This fixes https://github.com/dimforge/nalgebra/issues/1547 for the remaining benchmarks.
Benchmark results before vs. after all changes:
mat2_mul_m time: [1.1043 ns 1.1058 ns 1.1077 ns] change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 2 (2.00%) high mild 6 (6.00%) high severe mat3_mul_m time: [3.1885 ns 3.1945 ns 3.2038 ns] change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat4_mul_m time: [6.7759 ns 6.7840 ns 6.7929 ns] change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe mat2_tr_mul_m time: [1.2882 ns 1.2901 ns 1.2926 ns] change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat3_tr_mul_m time: [3.1688 ns 3.1725 ns 3.1770 ns] change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe mat4_tr_mul_m time: [6.5406 ns 6.5453 ns 6.5508 ns] change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05) Performance has regressed. 
Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe mat2_add_m time: [644.68 ps 645.88 ps 647.24 ps] change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe mat3_add_m time: [1.3543 ns 1.3572 ns 1.3607 ns] change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_add_m time: [2.3987 ns 2.4015 ns 2.4044 ns] change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat2_sub_m time: [637.47 ps 638.88 ps 640.62 ps] change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat3_sub_m time: [1.3531 ns 1.3546 ns 1.3562 ns] change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild 4 (4.00%) high severe mat4_sub_m time: [2.3972 ns 2.3996 ns 2.4021 ns] change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat2_mul_v time: [774.43 ps 775.48 ps 776.73 ps] change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat3_mul_v time: [1.6843 ns 1.6858 ns 1.6874 ns] change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat4_mul_v time: [2.6029 ns 2.6196 ns 2.6485 ns] change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe single_mat2_mul_v time: [392.29 ps 393.45 ps 394.87 ps] Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe single_mat3_mul_v time: [650.16 ps 651.47 ps 653.07 ps] Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe single_mat4_mul_v time: [1.0665 ns 1.0690 ns 1.0722 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat2_tr_mul_v time: [719.95 ps 720.92 ps 722.16 ps] change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low severe 2 (2.00%) low mild 7 (7.00%) high mild 4 (4.00%) high severe mat3_tr_mul_v time: [1.6551 ns 1.6564 ns 1.6577 ns] change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat4_tr_mul_v time: [2.6477 ns 2.6546 ns 2.6666 ns] change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05) Performance has regressed. 
Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low severe 3 (3.00%) high mild 3 (3.00%) high severe single_mat2_tr_mul_v time: [353.60 ps 355.50 ps 358.48 ps] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe single_mat3_tr_mul_v time: [778.13 ps 779.43 ps 781.25 ps] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) high mild 5 (5.00%) high severe single_mat4_tr_mul_v time: [1.1887 ns 1.1906 ns 1.1930 ns] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat2_mul_s time: [774.44 ps 775.33 ps 776.37 ps] change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat3_mul_s time: [962.59 ps 964.98 ps 967.43 ps] change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe mat4_mul_s time: [1.6589 ns 1.6640 ns 1.6684 ns] change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 8 (8.00%) low severe 3 (3.00%) low mild 1 (1.00%) high mild 6 (6.00%) high severe mat2_div_s time: [803.09 ps 804.70 ps 806.56 ps] change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe mat3_div_s time: [2.4929 ns 2.4947 ns 2.4967 ns] change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_div_s time: [5.1650 ns 5.1688 ns 5.1735 ns] change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe mat2_inv time: [1.1514 ns 1.1523 ns 1.1533 ns] change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat3_inv time: [3.3641 ns 3.3707 ns 3.3826 ns] change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe mat4_inv time: [25.970 ns 26.006 ns 26.062 ns] change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe mat2_transpose time: [409.94 ps 410.77 ps 411.75 ps] change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe mat3_transpose time: [947.42 ps 953.20 ps 961.97 ps] change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe mat4_transpose time: [1.6510 ns 1.6551 ns 1.6612 ns] change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat_div_scalar time: [480.25 µs 480.55 µs 480.99 µs] change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe mat100_add_mat100 time: [3.0426 µs 3.0910 µs 3.1351 µs] change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 3 (3.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe mat4_mul_mat4 time: [36.836 ns 36.859 ns 36.886 ns] change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) low severe 4 (4.00%) high mild 2 (2.00%) high severe mat5_mul_mat5 time: [56.715 ns 56.876 ns 57.015 ns] change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild mat6_mul_mat6 time: [83.817 ns 83.999 ns 84.156 ns] change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild mat7_mul_mat7 time: [93.211 ns 93.386 ns 93.534 ns] change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 2 (2.00%) low mild mat8_mul_mat8 time: [88.919 ns 89.410 ns 89.884 ns] change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild mat9_mul_mat9 time: [207.12 ns 209.04 ns 211.17 ns] change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) low mild 1 (1.00%) high mild mat10_mul_mat10 time: [236.75 ns 237.11 ns 237.47 ns] change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 7 (7.00%) low mild 1 (1.00%) high mild mat10_mul_mat10_static time: [116.68 ns 117.15 ns 117.62 ns] change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05) Performance has regressed. mat100_mul_mat100 time: [40.188 µs 40.327 µs 40.459 µs] change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe mat500_mul_mat500 time: [4.3909 ms 4.3944 ms 4.3978 ms] change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe iter time: [840.01 µs 840.39 µs 840.81 µs] change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) high mild 11 (11.00%) high severe iter_rev time: [210.14 µs 211.10 µs 212.84 µs] change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe copy_from time: [199.77 µs 200.80 µs 202.55 µs] change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) low mild 1 (1.00%) high severe axpy time: [31.301 µs 33.301 µs 34.957 µs] change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05) Performance has regressed. tr_mul_to time: [126.46 µs 127.12 µs 128.09 µs] change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05) Performance has improved. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe mat_mul_mat time: [39.252 µs 39.443 µs 39.626 µs] change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 8 (8.00%) high mild 2 (2.00%) high severe mat100_from_fn time: [6.8398 µs 6.8418 µs 6.8446 µs] change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe mat500_from_fn time: [172.11 µs 172.14 µs 172.18 µs] change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe vec2_add_v_f32 time: [303.98 ps 304.76 ps 305.65 ps] change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe vec3_add_v_f32 time: [586.36 ps 587.93 ps 589.92 ps] change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe vec4_add_v_f32 time: [603.45 ps 604.44 ps 605.59 ps] change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe vec2_add_v_f64 time: [602.08 ps 602.83 ps 603.64 ps] change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe vec3_add_v_f64 time: [910.94 ps 912.60 ps 914.56 ps] change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 6 (6.00%) high mild 3 (3.00%) high severe vec4_add_v_f64 time: [1.1894 ns 1.1933 ns 1.1963 ns] change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe vec2_sub_v time: [303.45 ps 304.42 ps 305.37 ps] change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 8 (8.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe vec3_sub_v time: [672.95 ps 674.82 ps 676.51 ps] change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec4_sub_v time: [602.84 ps 604.65 ps 607.70 ps] change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe vec2_mul_s time: [666.49 ps 667.29 ps 668.31 ps] change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 4 (4.00%) low severe 6 (6.00%) high mild 6 (6.00%) high severe vec3_mul_s time: [511.42 ps 513.44 ps 515.86 ps] change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe vec4_mul_s time: [774.13 ps 775.22 ps 776.52 ps] change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe vec2_div_s time: [1.3658 ns 1.3694 ns 1.3726 ns] change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05) Performance has regressed. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe vec3_div_s time: [607.73 ps 608.63 ps 609.66 ps] change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 8 (8.00%) high mild 6 (6.00%) high severe vec4_div_s time: [802.59 ps 803.62 ps 804.82 ps] change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe vec2_dot_f32 time: [461.20 ps 461.73 ps 462.30 ps] change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 9 (9.00%) high severe vec3_dot_f32 time: [688.24 ps 689.05 ps 689.95 ps] change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe vec4_dot_f32 time: [917.20 ps 921.23 ps 928.57 ps] change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 8 (8.00%) high mild 5 (5.00%) high severe vec2_dot_f64 time: [596.11 ps 597.51 ps 598.79 ps] change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe vec3_dot_f64 time: [749.32 ps 751.02 ps 752.81 ps] change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe vec4_dot_f64 time: [1.0145 ns 1.0185 ns 1.0230 ns] change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05) Performance has regressed. 
Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe vec3_cross time: [971.01 ps 971.87 ps 972.73 ps] change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe vec2_norm time: [1.0612 ns 1.0623 ns 1.0637 ns] change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low mild 2 (2.00%) high severe vec3_norm time: [1.0649 ns 1.0665 ns 1.0694 ns] change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe vec4_norm time: [1.0733 ns 1.0739 ns 1.0746 ns] change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 2 (2.00%) low severe 7 (7.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe vec2_normalize time: [2.5310 ns 2.5326 ns 2.5345 ns] change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec3_normalize time: [2.5389 ns 2.5409 ns 2.5424 ns] change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec4_normalize time: [1.8154 ns 1.8164 ns 1.8173 ns] change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe vec10000_dot_f64 time: [2.0296 µs 2.0337 µs 2.0383 µs] change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05) Performance has regressed. 
Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe vec10000_dot_f32 time: [1.1891 µs 1.1926 µs 1.1962 µs] change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe vec10000_axpy_f64 time: [2.0702 µs 2.0739 µs 2.0777 µs] change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe vec10000_axpy_beta_f64 time: [2.0914 µs 2.0962 µs 2.1012 µs] change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 5 (5.00%) high mild 2 (2.00%) high severe vec10000_axpy_f64_slice time: [2.0272 µs 2.0303 µs 2.0335 µs] change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_f64_static time: [13.917 µs 13.965 µs 14.005 µs] change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe vec10000_axpy_f32 time: [1.0402 µs 1.0421 µs 1.0437 µs] change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_beta_f32 time: [1.0329 µs 1.0346 µs 1.0364 µs] change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05) Performance has regressed. 
Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe quaternion_add_q time: [642.58 ps 650.39 ps 662.45 ps] change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe quaternion_sub_q time: [641.16 ps 643.22 ps 645.88 ps] change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 4 (4.00%) high severe quaternion_mul_q time: [1.4252 ns 1.4271 ns 1.4294 ns] change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe unit_quaternion_mul_v time: [1.4859 ns 1.4874 ns 1.4890 ns] change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild single_unit_quaternion_mul_v time: [1.0422 ns 1.0457 ns 1.0504 ns] Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe quaternion_mul_s time: [771.17 ps 772.18 ps 773.37 ps] change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe quaternion_div_s time: [798.54 ps 799.82 ps 801.43 ps] change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe quaternion_inv time: [1.2401 ns 1.2408 ns 1.2417 ns] change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe unit_quaternion_inv time: [596.01 ps 598.93 ps 602.66 ps] change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) high mild 9 (9.00%) high severe quaternion_conjugate time: [604.36 ps 608.60 ps 613.48 ps] Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe quaternion_normalize time: [1.8268 ns 1.8274 ns 1.8281 ns] Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low severe 4 (4.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe bidiagonalize_100x100 time: [265.91 µs 266.00 µs 266.11 µs] change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe bidiagonalize_100x500 time: [2.0053 ms 2.0060 ms 2.0065 ms] change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) low severe 2 (2.00%) high mild 5 (5.00%) high severe bidiagonalize_4x4 time: [266.92 ns 267.24 ns 267.62 ns] change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 1 (1.00%) low severe 5 (5.00%) low mild 13 (13.00%) high mild 4 (4.00%) high severe Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50. bidiagonalize_500x100 time: [1.6781 ms 1.6793 ms 1.6804 ms] change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05) Performance has regressed. bidiagonalize_unpack_100x100 time: [522.13 µs 522.36 µs 522.63 µs] change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe bidiagonalize_unpack_100x500 time: [2.9858 ms 2.9916 ms 2.9976 ms] change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05) Change within noise threshold. bidiagonalize_unpack_500x100 time: [2.5884 ms 2.5896 ms 2.5910 ms] change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05) Change within noise threshold. cholesky_100x100 time: [31.084 µs 31.101 µs 31.122 µs] change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 9 (9.00%) high severe cholesky_500x500 time: [4.4799 ms 4.4849 ms 4.4903 ms] change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cholesky_decompose_unpack_100x100 time: [31.659 µs 31.685 µs 31.727 µs] change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe cholesky_decompose_unpack_500x500 time: [4.4795 ms 4.4845 ms 4.4910 ms] change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe cholesky_solve_10x10 time: [170.70 ns 170.76 ns 170.82 ns] change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe cholesky_solve_100x100 time: [2.9071 µs 2.9117 µs 2.9174 µs] change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05) Performance has regressed. 
Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe cholesky_solve_500x500 time: [54.193 µs 54.303 µs 54.417 µs] change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cholesky_inverse_10x10 time: [1.3189 µs 1.3195 µs 1.3201 µs] change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe cholesky_inverse_100x100 time: [270.85 µs 270.88 µs 270.92 µs] change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe cholesky_inverse_500x500 time: [26.673 ms 26.694 ms 26.714 ms] change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 19 (19.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe full_piv_lu_decompose_10x10 time: [582.31 ns 582.48 ns 582.67 ns] change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe full_piv_lu_decompose_100x100 time: [218.73 µs 218.78 µs 218.84 µs] change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low severe 5 (5.00%) low mild 1 (1.00%) high severe full_piv_lu_solve_10x10 time: [124.88 ns 124.94 ns 125.02 ns] change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05) Performance has regressed. 
Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 6 (6.00%) high mild 4 (4.00%) high severe full_piv_lu_solve_100x100 time: [2.5202 µs 2.5244 µs 2.5289 µs] change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 14 (14.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild full_piv_lu_inverse_10x10 time: [869.61 ns 870.27 ns 871.19 ns] change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low severe 1 (1.00%) high mild 4 (4.00%) high severe full_piv_lu_inverse_100x100 time: [212.68 µs 212.83 µs 213.05 µs] change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe full_piv_lu_determinant_10x10 time: [15.320 ns 15.338 ns 15.357 ns] change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild full_piv_lu_determinant_100x100 time: [137.44 ns 139.37 ns 141.00 ns] change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05) Performance has regressed. hessenberg_decompose_4x4 time: [82.510 ns 82.538 ns 82.564 ns] change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild hessenberg_decompose_100x100 time: [295.98 µs 296.16 µs 296.44 µs] change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe hessenberg_decompose_200x200 time: [2.2647 ms 2.2681 ms 2.2714 ms] change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05) Performance has regressed. 
hessenberg_decompose_unpack_100x100 time: [435.30 µs 435.75 µs 436.12 µs] change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe hessenberg_decompose_unpack_200x200 time: [3.2667 ms 3.2678 ms 3.2690 ms] change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05) Performance has regressed. Found 22 outliers among 100 measurements (22.00%) 13 (13.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe lu_decompose_10x10 time: [353.04 ns 353.16 ns 353.31 ns] change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 4 (4.00%) low severe 4 (4.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe lu_decompose_100x100 time: [71.544 µs 71.560 µs 71.579 µs] change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe lu_solve_10x10 time: [115.42 ns 115.52 ns 115.61 ns] change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 8 (8.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe lu_solve_100x100 time: [2.5152 µs 2.5190 µs 2.5225 µs] change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild lu_inverse_10x10 time: [902.55 ns 903.32 ns 903.97 ns] change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe lu_inverse_100x100 time: [216.21 µs 216.47 µs 216.80 µs] change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 18 outliers among 100 measurements (18.00%) 2 (2.00%) low severe 4 (4.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe lu_determinant_10x10 time: [13.394 ns 13.481 ns 13.665 ns] change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe lu_determinant_100x100 time: [149.12 ns 150.16 ns 151.08 ns] change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 10 (10.00%) low severe 4 (4.00%) low mild qr_decompose_100x100 time: [141.62 µs 141.65 µs 141.69 µs] change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe Benchmarking qr_decompose_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60. qr_decompose_100x500 time: [1.0071 ms 1.0082 ms 1.0097 ms] change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 12 (12.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe qr_decompose_4x4 time: [100.40 ns 100.43 ns 100.45 ns] change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe Benchmarking qr_decompose_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. qr_decompose_500x100 time: [847.17 µs 847.68 µs 848.21 µs] change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05) Performance has regressed. 
Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe qr_decompose_unpack_100x100 time: [283.22 µs 283.26 µs 283.30 µs] change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05) Change within noise threshold. Found 23 outliers among 100 measurements (23.00%) 21 (21.00%) low severe 1 (1.00%) low mild 1 (1.00%) high severe Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60. qr_decompose_unpack_100x500 time: [1.1399 ms 1.1429 ms 1.1457 ms] change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50. qr_decompose_unpack_500x100 time: [1.6633 ms 1.6640 ms 1.6648 ms] change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 5 (5.00%) low mild 4 (4.00%) high severe qr_solve_10x10 time: [156.51 ns 156.56 ns 156.61 ns] change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) low severe 5 (5.00%) low mild 1 (1.00%) high mild qr_solve_100x100 time: [3.5393 µs 3.5454 µs 3.5511 µs] change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) low mild qr_inverse_10x10 time: [806.75 ns 807.99 ns 809.61 ns] change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe qr_inverse_100x100 time: [330.65 µs 330.74 µs 330.85 µs] change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe schur_decompose_4x4 time: [969.14 ns 969.71 ns 970.18 ns] change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe schur_decompose_10x10 time: [7.3226 µs 7.3237 µs 7.3247 µs] change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe schur_decompose_100x100 time: [2.5760 ms 2.5763 ms 2.5768 ms] change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe schur_decompose_200x200 time: [18.285 ms 18.296 ms 18.308 ms] change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe eigenvalues_4x4 time: [937.94 ns 938.15 ns 938.38 ns] change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild eigenvalues_10x10 time: [5.9066 µs 5.9088 µs 5.9117 µs] change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe Benchmarking eigenvalues_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. eigenvalues_100x100 time: [1.5870 ms 1.5873 ms 1.5876 ms] change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe eigenvalues_200x200 time: [11.081 ms 11.088 ms 11.102 ms] change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe solve_l_triangular_100x100 time: [1.3250 µs 1.3651 µs 1.4012 µs] change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 10 (10.00%) high mild 2 (2.00%) high severe solve_l_triangular_1000x1000 time: [101.52 µs 102.04 µs 102.85 µs] change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 9 (9.00%) high mild 6 (6.00%) high severe tr_solve_l_triangular_100x100 time: [2.0144 µs 2.0537 µs 2.0902 µs] change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe tr_solve_l_triangular_1000x1000 time: [93.569 µs 94.056 µs 94.857 µs] change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe solve_u_triangular_100x100 time: [1.5878 µs 1.6615 µs 1.7405 µs] change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 10 (10.00%) high mild 3 (3.00%) high severe solve_u_triangular_1000x1000 time: [105.07 µs 105.46 µs 106.12 µs] change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05) Performance has regressed. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe tr_solve_u_triangular_100x100 time: [1.4369 µs 1.4697 µs 1.4986 µs] change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 11 (11.00%) high mild 2 (2.00%) high severe tr_solve_u_triangular_1000x1000 time: [88.868 µs 89.303 µs 90.014 µs] change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe svd_decompose_2x2 time: [22.913 ns 22.958 ns 23.017 ns] change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe svd_decompose_3x3 time: [359.30 ns 359.72 ns 360.20 ns] change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild svd_decompose_4x4 time: [896.28 ns 896.55 ns 896.85 ns] change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe svd_decompose_10x10 time: [5.7680 µs 5.7708 µs 5.7739 µs] change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe Benchmarking svd_decompose_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. svd_decompose_100x100 time: [1.5704 ms 1.5709 ms 1.5715 ms] change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe svd_decompose_200x200 time: [11.845 ms 11.847 ms 11.850 ms] change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe rank_4x4 time: [716.49 ns 716.62 ns 716.74 ns] change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild rank_10x10 time: [4.2304 µs 4.2341 µs 4.2377 µs] change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild rank_100x100 time: [522.74 µs 522.85 µs 522.97 µs] change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high severe rank_200x200 time: [3.0167 ms 3.0217 ms 3.0267 ms] change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05) Change within noise threshold. singular_values_4x4 time: [735.97 ns 736.08 ns 736.21 ns] change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe singular_values_10x10 time: [4.2987 µs 4.2997 µs 4.3010 µs] change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe singular_values_100x100 time: [525.20 µs 525.36 µs 525.54 µs] change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe singular_values_200x200 time: [3.0712 ms 3.0729 ms 3.0750 ms] change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
pseudo_inverse_4x4 time: [877.64 ns 878.38 ns 879.12 ns]
  change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05)
  Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe
pseudo_inverse_10x10 time: [6.0008 µs 6.0034 µs 6.0064 µs]
  change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05)
  Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100 time: [1.6088 ms 1.6091 ms 1.6094 ms]
  change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05)
  Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
pseudo_inverse_200x200 time: [12.038 ms 12.042 ms 12.047 ms]
  change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05)
  Change within noise threshold.
Found 22 outliers among 100 measurements (22.00%)
  16 (16.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe
symmetric_eigen_decompose_4x4 time: [518.00 ns 518.07 ns 518.15 ns]
  change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05)
  Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
symmetric_eigen_decompose_10x10 time: [3.6417 µs 3.6428 µs 3.6440 µs]
  change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05)
  Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
symmetric_eigen_decompose_100x100 time: [761.64 µs 762.66 µs 763.80 µs]
  change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05)
  Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  9 (9.00%) low severe
  9 (9.00%) low mild
  1 (1.00%) high severe
symmetric_eigen_decompose_200x200 time: [5.1304 ms 5.1337 ms 5.1372 ms]
  change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05)
  Performance has improved.

The total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes.
Some algorithms may not converge when used on completely random values with the default value of epsilon and unlimited iterations. `reproducible_dmatrix()` already exists to circumvent this for `DMatrix`, so I implemented the same for `SMatrix`. In my tests this problem manifested itself only on `schur_decompose_4x4`, but I decided to apply a similar fix to all benchmarks that also use `reproducible_dmatrix()` for `DMatrix`.
Random matrices may not be positive definite, and the Cholesky decomposition benchmarks panic because of that:
Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
called `Option::unwrap()` on a `None` value
9d1c4ef
to
cc7f108
Compare
Hey @im-0, sorry GitHub mobile is not letting me provide an actual review. Thanks for the effort, that's very exhaustive benchmarking now. The one thing I stumbled over is that sometimes matrices with constant values are generated ('from_slice', 'from_element') rather than random values. This seems to be a bit inconsistent and I think I'd prefer consistent random value generation. Other than that this looks great to me. I've also asked the faer maintainer Sarah-ek to have a look at this. They might have some valuable input as well. But to me everything looks good, except the inconsistency with the constant values.
Overall fantastic. I just had some questions on the use of the reproducible matrix and some leftover constant vectors, plus one remark on the Cholesky test.
benches/linalg/cholesky.rs
Outdated
bh.bench_function("cholesky_100x100", |bh| {
    bh.iter_batched(
        || {
            let m = crate::reproducible_dmatrix(100, 100);
I have a suspicion why this calls the reproducible matrix and I think there's a problem. Let me explain. For a Cholesky decomposition of a matrix `A` to be defined, we need the matrix to be symmetric positive definite. That's actually why the line `let m = &m * m.transpose()` exists in the old test, but it's still wrong. To create a symmetric positive semidefinite matrix, it's okay to calculate `A A^T`, but this might still be singular. A numerically stable way to create an actually positive definite matrix from that is to calculate `A A^T + alpha * Id` with `Id` the identity matrix and `alpha` chosen for numerical stability. An `alpha` that works is e.g. `f64::EPSILON * A.norm_squared()`. I know this because I had to fix that exact problem in the `nalgebra-lapack` proptests recently, see https://github.com/dimforge/nalgebra/blob/main/nalgebra-lapack/tests/linalg/cholesky.rs, specifically the `positive_definite_dmatrix` function.
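The construction described in this comment can be sketched in plain `std` Rust, without nalgebra, so that it runs standalone. This is only an illustration under stated assumptions: the helper names `lcg`, `positive_definite` and `cholesky` are hypothetical, the random source is a toy generator, and the sum of squared entries is assumed to play the role of `A.norm_squared()`. It is not the `positive_definite_dmatrix` function from the linked test file.

```rust
/// Toy linear congruential generator (hypothetical helper, not a nalgebra API),
/// returning a value in [0, 1) so the sketch needs no external crates.
fn lcg(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Keep the top 53 bits and scale them into [0, 1).
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

/// Builds `A * A^T + alpha * Id` from a pseudo-random n x n matrix `A`,
/// with `alpha = f64::EPSILON * (sum of squared entries of A)`.
/// Assumption: that sum stands in for `A.norm_squared()` from the comment.
fn positive_definite(n: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut state = seed;
    let a: Vec<Vec<f64>> = (0..n)
        .map(|_| (0..n).map(|_| lcg(&mut state)).collect())
        .collect();
    let norm_squared: f64 = a.iter().flatten().map(|x| x * x).sum();
    let alpha = f64::EPSILON * norm_squared;

    let mut m = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..n {
            // Entry (i, j) of A * A^T is the dot product of rows i and j of A.
            m[i][j] = (0..n).map(|k| a[i][k] * a[j][k]).sum::<f64>();
        }
        // Shift the diagonal so the result is positive definite, not just semidefinite.
        m[i][i] += alpha;
    }
    m
}

/// Naive Cholesky factorisation; returns `None` on a non-positive pivot,
/// which is the situation that makes `unwrap()` panic in a benchmark.
fn cholesky(m: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = m.len();
    let mut l = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let s: f64 = (0..j).map(|k| l[i][k] * l[j][k]).sum();
            if i == j {
                let d = m[i][i] - s;
                if d <= 0.0 {
                    return None;
                }
                l[i][j] = d.sqrt();
            } else {
                l[i][j] = (m[i][j] - s) / l[j][j];
            }
        }
    }
    Some(l)
}
```

Calling `cholesky(&positive_definite(8, 42))` is expected to return `Some(..)`, whereas feeding an arbitrary random matrix straight into `cholesky` may return `None`.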
benches/linalg/cholesky.rs
Outdated
bh.bench_function("cholesky_500x500", |bh| {
    bh.iter_batched(
        || {
            let m = crate::reproducible_dmatrix(500, 500);
see the 100x100 test
benches/linalg/cholesky.rs
Outdated
bh.bench_function("cholesky_decompose_unpack_100x100", |bh| {
    bh.iter_batched(
        || {
            let m = crate::reproducible_dmatrix(100, 100);
see the 100x100 test
benches/linalg/cholesky.rs
Outdated
bh.bench_function("cholesky_decompose_unpack_500x500", |bh| {
    bh.iter_batched(
        || {
            let m = crate::reproducible_dmatrix(500, 500);
see the 100x100 test
benches/linalg/cholesky.rs
Outdated
bh.bench_function("cholesky_solve_10x10", |bh| {
    bh.iter_batched_ref(
        || {
            let m = crate::reproducible_dmatrix(10, 10);
see the 100x100 test
benches/linalg/qr.rs
Outdated
bh.iter_batched(
    || {
        let m = DMatrix::<f64>::new_random(10, 10);
        (QR::new(m), DVector::<f64>::from_element(10, 1.0))
non-random-vector
benches/linalg/qr.rs
Outdated
bh.iter_batched(
    || {
        let m = DMatrix::<f64>::new_random(100, 100);
        (QR::new(m), DVector::<f64>::from_element(100, 1.0))
non-random-vector
benches/linalg/schur.rs
Outdated
bh.iter(|| std::hint::black_box(Schur::new(m.clone())))
bh.bench_function("schur_decompose_4x4", |bh| {
    bh.iter_batched(
        || crate::reproducible_smatrix::<f64, 4, 4>(),
Why is the reproducible matrix called here? I'm not so familiar with the Schur decomposition, but from a cursory glance at Wikipedia, any square real matrix should have one. Same question for the other instances of the test below.
benches/linalg/schur.rs
Outdated
bh.iter(|| std::hint::black_box(m.complex_eigenvalues()))
bh.bench_function("eigenvalues_4x4", |bh| {
    bh.iter_batched_ref(
        || crate::reproducible_smatrix::<f64, 4, 4>(),
Same question as above: why call the reproducible matrix here instead of a random one?
benches/linalg/svd.rs
Outdated
bh.iter(|| std::hint::black_box(SVD::new_unordered(m.clone(), true, true)))
bh.bench_function("svd_decompose_2x2", |bh| {
    bh.iter_batched(
        || crate::reproducible_smatrix::<f32, 2, 2>(),
why use the reproducible matrix here? Same for the instances below
…ly generated positive definite matrix
…ls and replace with random
Hey @im-0, I've implemented the changes myself, because I felt I was bothering you unduly. Please let me know if you agree with those and then I think we can get this merged.
Was busy with other things. I will check this later today or tomorrow. I think that at least for some algorithms it will be better to use a predictable sequence of random matrices instead of completely random values on each benchmark run. But I am not completely sure about this and need to check the actual implementation...
@im-0 please feel free to implement changes as you see fit. I think this will be the last iteration. The one thing I'm wondering is whether the 'reproducible_matrix' actually produces a random sequence of matrices or whether it seeds the rng on each call. I'm on mobile right now, so I don't have the code at hand.
@im-0 UPDATE: I've looked at the code and each call to That means the sequence of random numbers will always be the same for each call. So two matrices of the same size created with |
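The difference between seeding the generator on each call and drawing from one shared generator can be sketched in plain `std` Rust. Everything here is hypothetical and for illustration only: `Lcg`, `reseeded_values` and `shared_values` are made-up names with a toy generator and an arbitrary seed, and the sketch does not reproduce the actual benchmark helpers.

```rust
/// Toy linear congruential generator standing in for a seeded RNG
/// (hypothetical; the real helper may use a different generator).
struct Lcg(u64);

impl Lcg {
    /// Advances the state and returns a value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Seeds a fresh generator on every call, so every call with the same `n`
/// returns exactly the same values.
fn reseeded_values(n: usize) -> Vec<f64> {
    let mut rng = Lcg(42); // fixed seed, chosen arbitrarily for the sketch
    (0..n).map(|_| rng.next_f64()).collect()
}

/// Draws from a caller-owned generator, so successive calls continue the
/// sequence: reproducible across runs, but different from call to call.
fn shared_values(rng: &mut Lcg, n: usize) -> Vec<f64> {
    (0..n).map(|_| rng.next_f64()).collect()
}
```

With the first variant a benchmark sees the same input in every batch; with the second, the inputs vary between batches while the whole run stays deterministic for a given seed.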
@im-0, not trying to rush you. I just know you had some thoughts about whether you are happy with this PR to get merged. Let me know if you are fine to proceed.
For details see: #1547
Before/after comparison on AMD Ryzen 9 5950X:
click for details...
Significant regression means that the computation of a resulting value was optimized out of the benchmarking loop previously.
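To illustrate what "optimized out of the benchmarking loop" refers to, here is a minimal `std`-only sketch of a timing loop that uses `std::hint::black_box` on both the inputs and the result. The functions `mul2x2` and `time_mul` are hypothetical helpers for this sketch, not part of the benchmark suite, and a hand-rolled loop like this is far cruder than what criterion does.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Plain 2x2 matrix product, used as the work being measured.
fn mul2x2(a: &[[f64; 2]; 2], b: &[[f64; 2]; 2]) -> [[f64; 2]; 2] {
    let mut out = [[0.0_f64; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
        }
    }
    out
}

/// Runs the product `iters` times and returns the elapsed time and last result.
/// `black_box` on the inputs hints to the compiler not to assume they are
/// loop-invariant constants; `black_box` on the output hints that the result
/// is used, so the computation should not be discarded.
fn time_mul(iters: u32) -> (Duration, [[f64; 2]; 2]) {
    let a = [[1.0, 2.0], [3.0, 4.0]];
    let b = [[5.0, 6.0], [7.0, 8.0]];
    let mut last = [[0.0_f64; 2]; 2];
    let start = Instant::now();
    for _ in 0..iters {
        last = black_box(mul2x2(black_box(&a), black_box(&b)));
    }
    (start.elapsed(), last)
}
```

Without the `black_box` calls the compiler is free to compute the product once, or not at all, which would make such a loop report the cost of almost nothing.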