Make benchmarks measure an actual computation #1549

im-0 · 2025-09-24T13:45:00Z

For details, see #1547.
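To illustrate the kind of problem the title refers to, here is a minimal std-only sketch. It is not the diff in this PR. It assumes the issue is the optimizer folding away work whose inputs are constants and whose result is unused, and that a guard such as `std::hint::black_box` is one way to prevent that. `mat2_mul` and `time_ns` are hypothetical names made up for this example.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a small fixed-size kernel (a 2x2 matrix product).
fn mat2_mul(a: [[f64; 2]; 2], b: [[f64; 2]; 2]) -> [[f64; 2]; 2] {
    let mut c = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            for k in 0..2 {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    c
}

/// Average wall-clock nanoseconds per call of `f` over `iters` calls.
fn time_ns(iters: u32, mut f: impl FnMut()) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0]];
    let b = [[5.0, 6.0], [7.0, 8.0]];
    let iters = 1_000_000;

    // Inputs are known at compile time and the result is discarded, so an
    // optimized build is free to remove the multiplication entirely.
    let naive = time_ns(iters, || {
        let _ = mat2_mul(a, b);
    });

    // black_box hides the inputs from the optimizer and marks the result as
    // used, so the product has to be computed on every iteration.
    let guarded = time_ns(iters, || {
        black_box(mat2_mul(black_box(a), black_box(b)));
    });

    println!("naive:   {naive:.3} ns/iter");
    println!("guarded: {guarded:.3} ns/iter");
}
```

In an optimized build the unguarded loop may report a much smaller time than the guarded one. Exact numbers depend on the compiler version and the machine, so none are quoted here.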

Before/after comparison on an AMD Ryzen 9 5950X (Criterion output; `change` is relative to the "before" baseline run):


mat2_mul_m              time:   [1.8211 ns 1.8224 ns 1.8239 ns]
                        change: [+146.50% +146.66% +146.86%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  9 (9.00%) high severe

mat3_mul_m              time:   [10.085 ns 10.090 ns 10.096 ns]
                        change: [+531.16% +531.50% +531.97%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

mat4_mul_m              time:   [11.219 ns 11.234 ns 11.250 ns]
                        change: [+277.89% +278.45% +278.98%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

mat2_tr_mul_m           time:   [1.7146 ns 1.7154 ns 1.7161 ns]
                        change: [+131.55% +131.93% +132.26%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

mat3_tr_mul_m           time:   [9.7604 ns 9.7655 ns 9.7713 ns]
                        change: [+513.89% +514.65% +515.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) high mild
  17 (17.00%) high severe

mat4_tr_mul_m           time:   [9.0668 ns 9.0724 ns 9.0786 ns]
                        change: [+206.70% +206.94% +207.18%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  3 (3.00%) high severe

mat2_add_m              time:   [1.7747 ns 1.7768 ns 1.7794 ns]
                        change: [+139.74% +140.01% +140.28%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

mat3_add_m              time:   [4.1347 ns 4.1397 ns 4.1450 ns]
                        change: [+158.49% +158.79% +159.11%] (p = 0.00 < 0.05)
                        Performance has regressed.

mat4_add_m              time:   [6.3138 ns 6.3202 ns 6.3277 ns]
                        change: [+113.14% +113.45% +113.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

mat2_sub_m              time:   [1.7622 ns 1.7636 ns 1.7653 ns]
                        change: [+138.09% +138.34% +138.58%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

mat3_sub_m              time:   [4.1355 ns 4.1407 ns 4.1472 ns]
                        change: [+159.26% +159.58% +159.93%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

mat4_sub_m              time:   [6.3649 ns 6.3712 ns 6.3777 ns]
                        change: [+113.88% +114.05% +114.23%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat2_mul_v              time:   [1.8745 ns 1.8809 ns 1.8880 ns]
                        change: [+490.83% +492.86% +494.55%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

mat3_mul_v              time:   [8.7801 ns 8.7907 ns 8.8027 ns]
                        change: [+1890.3% +1894.0% +1898.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

mat4_mul_v              time:   [2.7012 ns 2.7086 ns 2.7170 ns]
                        change: [+264.61% +265.53% +266.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

mat2_tr_mul_v           time:   [1.3577 ns 1.3579 ns 1.3582 ns]
                        change: [+325.94% +326.43% +326.83%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

mat3_tr_mul_v           time:   [2.3408 ns 2.3449 ns 2.3491 ns]
                        change: [+419.84% +420.66% +421.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat4_tr_mul_v           time:   [3.1961 ns 3.2026 ns 3.2100 ns]
                        change: [+329.36% +330.71% +332.28%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

mat2_mul_s              time:   [1.5770 ns 1.5804 ns 1.5846 ns]
                        change: [+112.05% +112.90% +113.84%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

mat3_mul_s              time:   [3.2606 ns 3.2749 ns 3.2909 ns]
                        change: [+105.41% +106.09% +106.85%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) high mild
  11 (11.00%) high severe

mat4_mul_s              time:   [5.3422 ns 5.3465 ns 5.3512 ns]
                        change: [+80.678% +80.813% +80.952%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

mat2_div_s              time:   [1.6070 ns 1.6156 ns 1.6256 ns]
                        change: [+117.73% +118.67% +119.89%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) high mild
  20 (20.00%) high severe

mat3_div_s              time:   [3.3834 ns 3.3934 ns 3.4053 ns]
                        change: [+112.87% +113.44% +114.19%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

mat4_div_s              time:   [5.6942 ns 5.6986 ns 5.7034 ns]
                        change: [+91.417% +91.588% +91.762%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

mat2_inv                time:   [1.8721 ns 1.8725 ns 1.8731 ns]
                        change: [+8.9164% +9.1056% +9.2598%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

mat3_inv                time:   [5.3171 ns 5.3242 ns 5.3315 ns]
                        change: [+2.0336% +2.1086% +2.1810%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

mat4_inv                time:   [27.564 ns 27.591 ns 27.632 ns]
                        change: [-5.5949% -5.4858% -5.3888%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

mat2_transpose          time:   [1.2243 ns 1.2248 ns 1.2258 ns]
                        change: [+9.6634% +9.7213% +9.7847%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

mat3_transpose          time:   [2.6247 ns 2.6261 ns 2.6276 ns]
                        change: [+3.2032% +3.2351% +3.2676%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

mat4_transpose          time:   [5.0910 ns 5.0925 ns 5.0938 ns]
                        change: [+3.4672% +3.5265% +3.5802%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

mat_div_scalar          time:   [621.16 µs 621.39 µs 621.68 µs]
                        change: [-0.9877% -0.8953% -0.8031%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

mat100_add_mat100       time:   [1.7506 µs 1.7515 µs 1.7526 µs]
                        change: [+0.7028% +0.7809% +0.8593%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat4_mul_mat4           time:   [33.744 ns 33.794 ns 33.898 ns]
                        change: [+17.549% +17.748% +18.055%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

mat5_mul_mat5           time:   [48.309 ns 48.319 ns 48.331 ns]
                        change: [-44.665% -44.512% -44.362%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

mat6_mul_mat6           time:   [77.366 ns 77.384 ns 77.405 ns]
                        change: [+0.4793% +0.5202% +0.5684%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

mat7_mul_mat7           time:   [83.373 ns 83.398 ns 83.430 ns]
                        change: [-0.3133% -0.1903% -0.0321%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

mat8_mul_mat8           time:   [69.337 ns 69.426 ns 69.532 ns]
                        change: [-0.6629% -0.2159% +0.0430%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

mat9_mul_mat9           time:   [187.68 ns 187.79 ns 187.89 ns]
                        change: [+0.3227% +0.3859% +0.4503%] (p = 0.00 < 0.05)
                        Change within noise threshold.

mat10_mul_mat10         time:   [199.87 ns 200.02 ns 200.18 ns]
                        change: [+2.5979% +2.6928% +2.7811%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

mat10_mul_mat10_static  time:   [123.43 ns 123.48 ns 123.56 ns]
                        change: [+17.238% +17.404% +17.662%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

mat100_mul_mat100       time:   [39.498 µs 39.600 µs 39.717 µs]
                        change: [+0.5117% +0.6856% +0.8528%] (p = 0.00 < 0.05)
                        Change within noise threshold.

mat500_mul_mat500       time:   [4.3615 ms 4.3638 ms 4.3667 ms]
                        change: [-1.5042% -1.4307% -1.3514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high severe

iter                    time:   [851.64 µs 851.73 µs 851.84 µs]
                        change: [+10.980% +11.254% +11.506%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

iter_rev                time:   [212.95 µs 212.98 µs 213.02 µs]
                        change: [+0.2664% +0.5462% +0.7106%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

copy_from               time:   [141.14 µs 141.31 µs 141.53 µs]
                        change: [-0.8809% -0.5821% -0.3187%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

axpy                    time:   [17.648 µs 17.693 µs 17.741 µs]
                        change: [-1.2897% -1.0169% -0.7049%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

tr_mul_to               time:   [136.20 µs 136.28 µs 136.36 µs]
                        change: [+0.5699% +1.3218% +1.7817%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) high mild
  8 (8.00%) high severe

mat_mul_mat             time:   [39.931 µs 39.970 µs 40.007 µs]
                        change: [+2.4660% +2.5525% +2.6388%] (p = 0.00 < 0.05)
                        Performance has regressed.

mat100_from_fn          time:   [6.8839 µs 6.8872 µs 6.8901 µs]
                        change: [+514.04% +514.93% +515.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

mat500_from_fn          time:   [173.41 µs 173.45 µs 173.49 µs]
                        change: [+500.13% +501.19% +502.54%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high severe

vec2_add_v_f32          time:   [1.1836 ns 1.1840 ns 1.1843 ns]
                        change: [+273.42% +273.97% +274.44%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

vec3_add_v_f32          time:   [1.7895 ns 1.7906 ns 1.7918 ns]
                        change: [+304.64% +305.11% +305.60%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec4_add_v_f32          time:   [1.7916 ns 1.7944 ns 1.7979 ns]
                        change: [+143.09% +143.48% +143.88%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

vec2_add_v_f64          time:   [1.1722 ns 1.1734 ns 1.1748 ns]
                        change: [+269.25% +270.01% +270.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

vec3_add_v_f64          time:   [1.9394 ns 1.9423 ns 1.9449 ns]
                        change: [+331.48% +332.20% +332.90%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec4_add_v_f64          time:   [2.2729 ns 2.2761 ns 2.2795 ns]
                        change: [+252.84% +253.55% +254.30%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

vec2_sub_v              time:   [1.2017 ns 1.2029 ns 1.2044 ns]
                        change: [+274.58% +275.29% +276.09%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

vec3_sub_v              time:   [1.7818 ns 1.7838 ns 1.7861 ns]
                        change: [+303.94% +304.55% +305.21%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

vec4_sub_v              time:   [1.7921 ns 1.7936 ns 1.7951 ns]
                        change: [+140.58% +141.03% +141.57%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

vec2_mul_s              time:   [985.48 ps 985.59 ps 985.78 ps]
                        change: [+210.12% +210.25% +210.38%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

vec3_mul_s              time:   [1.4056 ns 1.4059 ns 1.4062 ns]
                        change: [+217.88% +218.05% +218.20%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

vec4_mul_s              time:   [1.5828 ns 1.5839 ns 1.5850 ns]
                        change: [+114.04% +114.28% +114.53%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

vec2_div_s              time:   [1.4854 ns 1.4860 ns 1.4867 ns]
                        change: [+366.97% +367.15% +367.33%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

vec3_div_s              time:   [1.4821 ns 1.4832 ns 1.4848 ns]
                        change: [+229.77% +230.14% +230.45%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec4_div_s              time:   [1.6161 ns 1.6175 ns 1.6189 ns]
                        change: [+116.74% +117.02% +117.30%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

vec2_dot_f32            time:   [715.01 ps 717.55 ps 721.65 ps]
                        change: [+231.29% +232.12% +233.08%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

vec3_dot_f32            time:   [7.5379 ns 7.5393 ns 7.5412 ns]
                        change: [+3441.1% +3445.4% +3449.7%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

vec4_dot_f32            time:   [1.1914 ns 1.1941 ns 1.1968 ns]
                        change: [+455.67% +456.70% +457.85%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

vec2_dot_f64            time:   [833.73 ps 834.75 ps 835.77 ps]
                        change: [+286.15% +286.81% +287.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec3_dot_f64            time:   [7.5031 ns 7.5143 ns 7.5302 ns]
                        change: [+3387.4% +3390.9% +3395.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

vec4_dot_f64            time:   [1.2572 ns 1.2582 ns 1.2593 ns]
                        change: [+481.66% +482.79% +483.69%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

vec3_cross              time:   [7.6360 ns 7.6371 ns 7.6383 ns]
                        change: [+1603.2% +1604.4% +1605.3%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  8 (8.00%) high severe

vec2_norm               time:   [1.0934 ns 1.0936 ns 1.0939 ns]
                        change: [+0.4240% +0.4477% +0.4713%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec3_norm               time:   [1.1115 ns 1.1117 ns 1.1120 ns]
                        change: [-1.2543% -1.1951% -1.1429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

vec4_norm               time:   [1.1119 ns 1.1121 ns 1.1124 ns]
                        change: [-2.7306% -2.5540% -2.4351%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

vec2_normalize          time:   [2.4860 ns 2.4869 ns 2.4880 ns]
                        change: [+0.8836% +0.9584% +1.0264%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

vec3_normalize          time:   [2.5838 ns 2.5843 ns 2.5850 ns]
                        change: [+2.4280% +2.5396% +2.6267%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

vec4_normalize          time:   [1.9319 ns 1.9321 ns 1.9323 ns]
                        change: [+3.0547% +3.1098% +3.1644%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

vec10000_dot_f64        time:   [2.4662 µs 2.4669 µs 2.4677 µs]
                        change: [+103.74% +103.80% +103.89%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

vec10000_dot_f32        time:   [1.7355 µs 1.7368 µs 1.7386 µs]
                        change: [+56.145% +56.539% +56.985%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64       time:   [1.5285 µs 1.5289 µs 1.5293 µs]
                        change: [+1.5407% +1.5954% +1.6519%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f64  time:   [1.6062 µs 1.6083 µs 1.6123 µs]
                        change: [-1.2658% -1.1519% -1.0143%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_slice time:   [1.4899 µs 1.4900 µs 1.4902 µs]
                        change: [+1.0994% +1.1302% +1.1582%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_static
                        time:   [1.4381 µs 1.4384 µs 1.4386 µs]
                        change: [-1.4760% -1.3183% -1.2239%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_f32       time:   [758.37 ns 758.47 ns 758.59 ns]
                        change: [+0.7210% +1.1900% +1.4537%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f32  time:   [859.70 ns 859.90 ns 860.17 ns]
                        change: [+7.1477% +7.2858% +7.4170%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

quaternion_add_q        time:   [1.7877 ns 1.7890 ns 1.7903 ns]
                        change: [+140.68% +140.91% +141.14%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

quaternion_sub_q        time:   [1.7894 ns 1.7907 ns 1.7920 ns]
                        change: [+140.86% +141.23% +141.61%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

quaternion_mul_q        time:   [3.2688 ns 3.2697 ns 3.2705 ns]
                        change: [+342.95% +343.27% +343.63%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

unit_quaternion_mul_v   time:   [11.541 ns 11.549 ns 11.563 ns]
                        change: [+2500.6% +2504.0% +2506.8%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
  5 (5.00%) low severe
  6 (6.00%) low mild
  6 (6.00%) high mild
  7 (7.00%) high severe

quaternion_mul_s        time:   [1.5707 ns 1.5711 ns 1.5715 ns]
                        change: [+112.20% +112.40% +112.57%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

quaternion_div_s        time:   [1.5778 ns 1.5785 ns 1.5794 ns]
                        change: [+112.39% +112.52% +112.66%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

quaternion_inv          time:   [1.9206 ns 1.9213 ns 1.9220 ns]
                        change: [+5.0362% +5.0895% +5.1421%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

unit_quaternion_inv     time:   [1.3271 ns 1.3285 ns 1.3295 ns]
                        change: [+9.1358% +9.2383% +9.3384%] (p = 0.00 < 0.05)
                        Performance has regressed.

bidiagonalize_100x100   time:   [265.82 µs 266.42 µs 267.32 µs]
                        change: [-0.4676% -0.2799% -0.0688%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

Benchmarking bidiagonalize_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.8s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_100x500   time:   [1.9467 ms 1.9530 ms 1.9592 ms]
                        change: [-4.9336% -4.6776% -4.4570%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 32 outliers among 100 measurements (32.00%)
  8 (8.00%) low mild
  2 (2.00%) high mild
  22 (22.00%) high severe

bidiagonalize_4x4       time:   [248.64 ns 248.71 ns 248.81 ns]
                        change: [-5.2386% -5.1754% -5.1180%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_500x100   time:   [1.6480 ms 1.6490 ms 1.6504 ms]
                        change: [-1.6630% -1.4692% -1.2645%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

bidiagonalize_unpack_100x100
                        time:   [523.30 µs 523.38 µs 523.46 µs]
                        change: [-0.2366% -0.1909% -0.1462%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

bidiagonalize_unpack_100x500
                        time:   [3.0536 ms 3.0596 ms 3.0656 ms]
                        change: [+0.4142% +0.6144% +0.8242%] (p = 0.00 < 0.05)
                        Change within noise threshold.

bidiagonalize_unpack_500x100
                        time:   [2.6027 ms 2.6039 ms 2.6052 ms]
                        change: [-0.3104% -0.1818% -0.0919%] (p = 0.00 < 0.05)
                        Change within noise threshold.

cholesky_100x100        time:   [37.281 µs 37.289 µs 37.296 µs]
                        change: [+15.445% +15.524% +15.590%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

cholesky_500x500        time:   [4.8081 ms 4.8162 ms 4.8260 ms]
                        change: [+5.9663% +6.2001% +6.4592%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 30 outliers among 100 measurements (30.00%)
  19 (19.00%) low severe
  11 (11.00%) high severe

cholesky_decompose_unpack_100x100
                        time:   [37.755 µs 37.763 µs 37.773 µs]
                        change: [+14.066% +14.305% +14.477%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

cholesky_decompose_unpack_500x500
                        time:   [4.6743 ms 4.6891 ms 4.7052 ms]
                        change: [+2.1368% +2.4522% +2.8032%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  19 (19.00%) high severe

cholesky_solve_10x10    time:   [160.86 ns 160.97 ns 161.17 ns]
                        change: [+0.2713% +0.3412% +0.4236%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

cholesky_solve_100x100  time:   [2.7392 µs 2.7399 µs 2.7407 µs]
                        change: [-0.2820% -0.2443% -0.2066%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

cholesky_solve_500x500  time:   [52.883 µs 52.896 µs 52.917 µs]
                        change: [+1.9926% +2.2238% +2.5966%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cholesky_inverse_10x10  time:   [1.3102 µs 1.3110 µs 1.3119 µs]
                        change: [+1.5928% +1.6789% +1.7629%] (p = 0.00 < 0.05)
                        Performance has regressed.

cholesky_inverse_100x100
                        time:   [276.96 µs 276.98 µs 277.01 µs]
                        change: [+0.3069% +0.3685% +0.4182%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

cholesky_inverse_500x500
                        time:   [27.078 ms 27.084 ms 27.090 ms]
                        change: [+2.3068% +2.3369% +2.3710%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

full_piv_lu_decompose_10x10
                        time:   [560.68 ns 561.00 ns 561.33 ns]
                        change: [+0.6362% +0.7103% +0.7954%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

full_piv_lu_decompose_100x100
                        time:   [207.09 µs 207.12 µs 207.15 µs]
                        change: [-0.3356% -0.2879% -0.2475%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

full_piv_lu_solve_10x10 time:   [117.33 ns 117.39 ns 117.46 ns]
                        change: [-1.0837% -1.0232% -0.9624%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

full_piv_lu_solve_100x100
                        time:   [2.1694 µs 2.1707 µs 2.1729 µs]
                        change: [-1.7197% -1.6315% -1.5183%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

full_piv_lu_inverse_10x10
                        time:   [857.13 ns 857.32 ns 857.52 ns]
                        change: [-0.4396% -0.3489% -0.2379%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

full_piv_lu_inverse_100x100
                        time:   [211.92 µs 212.00 µs 212.10 µs]
                        change: [-2.0475% -1.9749% -1.9135%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

full_piv_lu_determinant_10x10
                        time:   [3.4777 ns 3.4794 ns 3.4814 ns]
                        change: [+17.827% +17.938% +18.036%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

full_piv_lu_determinant_100x100
                        time:   [38.435 ns 38.454 ns 38.475 ns]
                        change: [+3.5887% +3.6755% +3.7612%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

hessenberg_decompose_4x4
                        time:   [114.52 ns 114.54 ns 114.57 ns]
                        change: [-0.6226% -0.5236% -0.4055%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

hessenberg_decompose_100x100
                        time:   [289.44 µs 289.48 µs 289.54 µs]
                        change: [-0.1901% -0.1355% -0.0850%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

hessenberg_decompose_200x200
                        time:   [2.2102 ms 2.2147 ms 2.2212 ms]
                        change: [+0.8072% +1.0155% +1.2723%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  12 (12.00%) low severe
  4 (4.00%) high severe

hessenberg_decompose_unpack_100x100
                        time:   [428.66 µs 428.77 µs 428.91 µs]
                        change: [-0.2769% -0.2390% -0.1885%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

hessenberg_decompose_unpack_200x200
                        time:   [3.2263 ms 3.2288 ms 3.2314 ms]
                        change: [+0.8029% +1.0059% +1.1576%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

lu_decompose_10x10      time:   [361.75 ns 362.08 ns 362.41 ns]
                        change: [+13.439% +13.613% +13.784%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild

lu_decompose_100x100    time:   [73.649 µs 73.662 µs 73.687 µs]
                        change: [+0.9113% +0.9610% +1.0216%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

lu_solve_10x10          time:   [110.70 ns 110.74 ns 110.78 ns]
                        change: [-0.6342% -0.5758% -0.5265%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

lu_solve_100x100        time:   [2.1038 µs 2.1047 µs 2.1059 µs]
                        change: [-2.7288% -2.6424% -2.5009%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

lu_inverse_10x10        time:   [887.44 ns 887.64 ns 887.88 ns]
                        change: [-2.8481% -2.7682% -2.6966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

lu_inverse_100x100      time:   [215.47 µs 215.67 µs 215.90 µs]
                        change: [-1.0637% -0.9343% -0.7643%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
  8 (8.00%) high mild
  9 (9.00%) high severe

lu_determinant_10x10    time:   [2.5924 ns 2.5970 ns 2.6015 ns]
                        change: [+20.878% +21.192% +21.451%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

lu_determinant_100x100  time:   [35.934 ns 35.983 ns 36.032 ns]
                        change: [-1.7500% -1.6698% -1.5889%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  15 (15.00%) high severe

qr_decompose_100x100    time:   [143.15 µs 143.24 µs 143.40 µs]
                        change: [+0.7070% +0.8463% +1.0082%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500    time:   [1.0041 ms 1.0064 ms 1.0112 ms]
                        change: [-1.0163% -0.8784% -0.6928%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

qr_decompose_4x4        time:   [125.09 ns 125.10 ns 125.12 ns]
                        change: [-0.2898% -0.0935% +0.1416%] (p = 0.49 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

qr_decompose_500x100    time:   [834.17 µs 835.03 µs 836.03 µs]
                        change: [-0.3712% -0.1635% +0.0648%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

qr_decompose_unpack_100x100
                        time:   [283.60 µs 283.90 µs 284.16 µs]
                        change: [-0.3949% -0.2993% -0.1910%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500
                        time:   [1.1475 ms 1.1491 ms 1.1521 ms]
                        change: [-0.8690% -0.7318% -0.5277%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100
                        time:   [1.6793 ms 1.6797 ms 1.6801 ms]
                        change: [+2.4782% +2.5491% +2.6214%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

qr_solve_10x10          time:   [152.57 ns 152.63 ns 152.73 ns]
                        change: [-0.4288% -0.3814% -0.3276%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

qr_solve_100x100        time:   [3.3232 µs 3.3254 µs 3.3285 µs]
                        change: [-0.0158% +0.2076% +0.5519%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

qr_inverse_10x10        time:   [805.76 ns 806.05 ns 806.44 ns]
                        change: [-0.9942% -0.7502% -0.6015%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

qr_inverse_100x100      time:   [330.09 µs 330.39 µs 330.67 µs]
                        change: [+0.4902% +0.5806% +0.6779%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

schur_decompose_4x4     time:   [926.95 ns 928.24 ns 929.26 ns]
                        change: [-13.166% -13.030% -12.884%] (p = 0.00 < 0.05)
                        Performance has improved.

schur_decompose_10x10   time:   [7.4409 µs 7.4453 µs 7.4492 µs]
                        change: [+1.5363% +1.6395% +1.7360%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe

schur_decompose_100x100 time:   [2.6115 ms 2.6172 ms 2.6243 ms]
                        change: [+1.6088% +1.8440% +2.1414%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

schur_decompose_200x200 time:   [18.406 ms 18.418 ms 18.432 ms]
                        change: [+0.9610% +1.1237% +1.2824%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

eigenvalues_4x4         time:   [852.29 ns 855.47 ns 858.45 ns]
                        change: [-33.645% -33.514% -33.373%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  18 (18.00%) high severe

eigenvalues_10x10       time:   [5.9802 µs 5.9817 µs 5.9835 µs]
                        change: [+0.4433% +0.5280% +0.5939%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100     time:   [1.5907 ms 1.5927 ms 1.5955 ms]
                        change: [+0.4506% +0.5711% +0.6951%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

eigenvalues_200x200     time:   [11.141 ms 11.142 ms 11.144 ms]
                        change: [-0.1305% -0.0959% -0.0696%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

solve_l_triangular_100x100
                        time:   [1.0009 µs 1.0023 µs 1.0044 µs]
                        change: [-3.0545% -2.9536% -2.8531%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

solve_l_triangular_1000x1000
                        time:   [101.96 µs 102.10 µs 102.26 µs]
                        change: [+0.2874% +0.4251% +0.5544%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  16 (16.00%) high severe

tr_solve_l_triangular_100x100
                        time:   [1.7602 µs 1.7606 µs 1.7611 µs]
                        change: [+0.1457% +0.2378% +0.3238%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

tr_solve_l_triangular_1000x1000
                        time:   [95.588 µs 95.613 µs 95.638 µs]
                        change: [+0.6009% +0.8491% +1.0883%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

solve_u_triangular_100x100
                        time:   [1.1486 µs 1.1486 µs 1.1487 µs]
                        change: [-1.2041% -1.1651% -1.1380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

solve_u_triangular_1000x1000
                        time:   [96.805 µs 96.827 µs 96.850 µs]
                        change: [-2.4840% -2.4423% -2.4081%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

tr_solve_u_triangular_100x100
                        time:   [1.1943 µs 1.1947 µs 1.1951 µs]
                        change: [-0.9912% -0.6228% -0.3160%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

tr_solve_u_triangular_1000x1000
                        time:   [86.848 µs 86.858 µs 86.868 µs]
                        change: [-1.4656% -1.4306% -1.4030%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

svd_decompose_2x2       time:   [24.714 ns 24.731 ns 24.757 ns]
                        change: [+16.359% +16.416% +16.476%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

svd_decompose_3x3       time:   [356.14 ns 356.26 ns 356.41 ns]
                        change: [+7.1080% +7.1682% +7.2242%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

svd_decompose_4x4       time:   [973.20 ns 973.36 ns 973.52 ns]
                        change: [-0.1503% -0.1038% -0.0509%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

svd_decompose_10x10     time:   [5.7955 µs 5.7969 µs 5.7982 µs]
                        change: [-1.2212% -1.1496% -1.0648%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

Benchmarking svd_decompose_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.8s, enable flat sampling, or reduce sample count to 50.
svd_decompose_100x100   time:   [1.5457 ms 1.5461 ms 1.5466 ms]
                        change: [-1.1800% -1.1312% -1.0869%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

svd_decompose_200x200   time:   [11.688 ms 11.698 ms 11.708 ms]
                        change: [-1.5525% -1.4404% -1.3260%] (p = 0.00 < 0.05)
                        Performance has improved.

rank_4x4                time:   [687.08 ns 687.31 ns 687.60 ns]
                        change: [-4.3382% -4.2619% -4.1730%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

rank_10x10              time:   [4.2263 µs 4.2314 µs 4.2359 µs]
                        change: [-0.0796% +0.0376% +0.1498%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

rank_100x100            time:   [524.07 µs 525.08 µs 526.09 µs]
                        change: [+0.2055% +0.4232% +0.6185%] (p = 0.00 < 0.05)
                        Change within noise threshold.

rank_200x200            time:   [3.0034 ms 3.0049 ms 3.0066 ms]
                        change: [-0.0776% -0.0258% +0.0284%] (p = 0.40 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high severe

singular_values_4x4     time:   [711.28 ns 711.46 ns 711.69 ns]
                        change: [-10.996% -10.943% -10.895%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

singular_values_10x10   time:   [4.3082 µs 4.3088 µs 4.3098 µs]
                        change: [+0.2477% +0.2828% +0.3239%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

singular_values_100x100 time:   [520.96 µs 521.13 µs 521.29 µs]
                        change: [-1.3767% -1.2379% -1.0940%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

singular_values_200x200 time:   [3.0055 ms 3.0063 ms 3.0075 ms]
                        change: [-0.0668% -0.0002% +0.0545%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

pseudo_inverse_4x4      time:   [767.27 ns 767.63 ns 768.12 ns]
                        change: [-20.375% -20.322% -20.267%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

pseudo_inverse_10x10    time:   [6.1395 µs 6.1415 µs 6.1440 µs]
                        change: [+2.0662% +2.1284% +2.1866%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100  time:   [1.6018 ms 1.6035 ms 1.6050 ms]
                        change: [-0.0386% +0.0840% +0.2218%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 23 outliers among 100 measurements (23.00%)
  13 (13.00%) low severe
  3 (3.00%) high mild
  7 (7.00%) high severe

pseudo_inverse_200x200  time:   [11.989 ms 11.997 ms 12.006 ms]
                        change: [-0.7602% -0.5368% -0.3351%] (p = 0.00 < 0.05)
                        Change within noise threshold.

symmetric_eigen_decompose_4x4
                        time:   [453.98 ns 454.22 ns 454.53 ns]
                        change: [-10.475% -10.350% -10.207%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  11 (11.00%) high severe

symmetric_eigen_decompose_10x10
                        time:   [3.6767 µs 3.6782 µs 3.6800 µs]
                        change: [-1.7404% -1.6869% -1.6361%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

symmetric_eigen_decompose_100x100
                        time:   [767.36 µs 768.36 µs 769.56 µs]
                        change: [-7.0024% -6.9023% -6.7865%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

symmetric_eigen_decompose_200x200
                        time:   [5.2143 ms 5.2218 ms 5.2350 ms]
                        change: [-9.1662% -8.8429% -8.5229%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high severe

A significant regression means that the computation of the resulting value was previously optimized out of the benchmarking loop.

im-0 · 2025-09-24T15:17:52Z

Changed bench_binop!() and bench_binop_ref!() to pass self by reference into black_box() instead of by value. This removed unnecessary copies in some benchmarks, but the overall results are mostly the same.

geo-ant

This looks good to me overall, thanks for taking care of this. I just noted some questions in the code about the choice of what to black-box, but I'll be the first to admit that I don't have much experience with keeping the compiler from optimizing certain things out. I'd just feel better if you explained some of the rationale behind what exactly you black-boxed.

benches/common/macros.rs

benches/core/matrix.rs

geo-ant · 2025-09-27T19:31:14Z

benches/core/matrix.rs


-    bench.bench_function("mat8_mul_mat8", move |bh| bh.iter(|| &a * &b));
+    bench.bench_function("mat8_mul_mat8", move |bh| {
+        bh.iter(|| black_box(&a) * black_box(&b))


Honest question: would the results have been different if you had black-boxed the product rather than the individual components?

Yes, there is a high chance that the benchmark result will be different.

bh.iter(|| &a * &b)

In the example above, Criterion calls the || &a * &b closure multiple times to measure how long it takes. But the compiler is smart enough to notice that a and b cannot change within bh.iter(...), so it rewrites everything like this:

let optimized = &a * &b;
bh.iter(|| optimized.clone())

What black_box() does here is make the compiler think that black_box(x) produces a completely random valid value of the same type as x. Of course, in the compiled binary it is a no-op and always produces just the value of x.

Wrapping the product with black_box() changes nothing here, as the compiler can still see that the arguments of mul() are not changing, and thus it can still move mul() out of the loop:

let optimized = &a * &b;
bh.iter(|| black_box(optimized.clone()))

Also, the return value of the closure is already passed to black_box() inside Criterion's bh.iter(...) to ensure that the call to the closure is not removed during optimization. Here black_box(x) has a slightly different meaning: some unspecified computation that produces side effects based on the value of x (and thus the value of x is important and cannot be removed from the compiled code entirely).

And, as we are interested in measuring the performance of mul(), we have two options:

1. Generate proper random values for the arguments of mul() on each iteration of bh.iter(...). This may be viable if mul() is slow enough to make the random argument generation insignificant in the total measured time. This option is not viable in general for nalgebra, as a lot of benchmarks measure very fast operations that can be optimized down to just a few machine instructions (like Vector3 x Scalar multiplication etc.).

2. Disguise the unchanged arguments of mul() as random values on each iteration of bh.iter(...). This is exactly what I did here using black_box().
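To make the difference concrete, here is a minimal std-only sketch (not the actual nalgebra benchmark code). The names mul2, run_output_only and run_inputs_hidden are hypothetical and made up for illustration, and a plain loop stands in for Criterion's bh.iter(...). Both variants return the same product; the only difference is what the optimizer is allowed to assume about the inputs:

```rust
use std::hint::black_box;

/// Hypothetical stand-in for a small fixed-size matrix product.
fn mul2(a: &[[f32; 2]; 2], b: &[[f32; 2]; 2]) -> [[f32; 2]; 2] {
    let mut out = [[0.0f32; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            out[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j];
        }
    }
    out
}

/// Only the result is black-boxed. The inputs are visibly loop-invariant,
/// so the optimizer is free to compute the product once, outside the loop.
fn run_output_only(a: &[[f32; 2]; 2], b: &[[f32; 2]; 2], iters: u32) -> [[f32; 2]; 2] {
    let mut last = [[0.0; 2]; 2];
    for _ in 0..iters {
        last = black_box(mul2(a, b));
    }
    last
}

/// The inputs are black-boxed on every iteration, so the optimizer has to
/// treat them as potentially different each time and keep mul2 in the loop.
fn run_inputs_hidden(a: &[[f32; 2]; 2], b: &[[f32; 2]; 2], iters: u32) -> [[f32; 2]; 2] {
    let mut last = [[0.0; 2]; 2];
    for _ in 0..iters {
        last = black_box(mul2(black_box(a), black_box(b)));
    }
    last
}

fn main() {
    let a = [[1.0, 2.0], [3.0, 4.0]];
    let b = [[5.0, 6.0], [7.0, 8.0]];
    // Same numeric result either way; only the generated code may differ.
    assert_eq!(run_output_only(&a, &b, 1000), run_inputs_hidden(&a, &b, 1000));
    println!("{:?}", run_inputs_hidden(&a, &b, 1000));
}
```

Whether the first variant actually gets hoisted depends on the compiler version and optimization level, so checking the generated assembly is the only way to be sure for a given benchmark.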

Great explanation, thank you. In principle I'd be fine merging this; I'm just wondering whether you had considered refactoring to iter_batched. This is what I do in my projects, and the Criterion docs say:

If your routine requires some per-iteration setup that shouldn’t be timed, use iter_batched or iter_batched_ref

which should be a way to supply new random matrices in every iteration of the benchmark. I don't think this will matter in the cases here, but in general it could help confuse the processor pipeline enough to get a more realistic measurement. However, I've also found the unresolved issue bheisler/criterion.rs#475 about measurement overhead in iter_batched, which I wasn't aware of before.

I don't want to make your life more complicated and I'm very grateful you're tackling this problem. I was just thinking we should really nail the benchmarks, since you also have some other cool things in the pipeline, which do depend on accurate benchmarks. What do you think?
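For reference, the idea behind batched setup can be sketched without Criterion at all. This is only an illustration of the pattern (untimed setup, timed routine) and not how Criterion implements iter_batched; time_batched and XorShift are hypothetical helpers invented for this sketch:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: builds `batch` inputs with `setup` (untimed),
/// then times only the calls to `routine`.
fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> (Duration, Vec<O>) {
    // Setup phase: not included in the measurement.
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    let mut outputs = Vec::with_capacity(batch);

    // Timed phase: one routine call per prepared input.
    let start = Instant::now();
    for input in inputs {
        outputs.push(black_box(routine(input)));
    }
    let elapsed = start.elapsed();
    (elapsed, outputs)
}

/// Tiny xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    /// Returns a pseudo-random value in [0, 1).
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

fn main() {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let (elapsed, outputs) = time_batched(
        1000,
        // Fresh pseudo-random 2-vectors for every call of the routine.
        || ([rng.next_f32(), rng.next_f32()], [rng.next_f32(), rng.next_f32()]),
        // The routine under measurement: a 2-element dot product.
        |(a, b): ([f32; 2], [f32; 2])| a[0] * b[0] + a[1] * b[1],
    );
    println!("{} calls in {:?}", outputs.len(), elapsed);
}
```

Note that the timed loop here still pays for moving inputs out of one Vec and pushing results into another, so for routines that take only a few nanoseconds this bookkeeping can dominate the measurement.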

To be honest, I wanted to keep this change minimal, so I didn't consider using other Criterion functions.

I just tried using iter_batched for mat2_mul_v (the same benchmark that I used for demonstration in the original issue):

diff --git a/benches/common/macros.rs b/benches/common/macros.rs
index c3e12aaaef55..3521ef7ddc8c 100644
--- a/benches/common/macros.rs
+++ b/benches/common/macros.rs
@@ -4,15 +4,15 @@ macro_rules! bench_binop(
     ($name: ident, $t1: ty, $t2: ty, $binop: ident) => {
         fn $name(bh: &mut criterion::Criterion) {
             use rand::SeedableRng;
-            use std::hint::black_box;
             let mut rng = IsaacRng::seed_from_u64(0);
-            let a = rng.random::<$t1>();
-            let b = rng.random::<$t2>();
-            bh.bench_function(stringify!($name), move |bh| bh.iter(|| {
-                black_box(&a).$binop(black_box(b))
-            }));
+            bh.bench_function(stringify!($name), move |bh| bh.iter_batched(
+                || (rng.random::<$t1>(), rng.random::<$t2>()),
+                |args| {
+                    args.0.$binop(args.1)
+                },
+                criterion::BatchSize::SmallInput));
         }
     }
 );

This somewhat improved performance vs. my current changes, but it still regresses vs. current main:

I tried to check the generated assembly, but for iter_batched() it is much longer and I am not that good at reading assembly:

nalgebra_bench-ccbcfb07ef18979a`criterion::bencher::Bencher$LT$M$GT$::iter_batched::heec5543a72133bcc:
    0x55555562d840 <+0>: pushq %rbp
    0x55555562d841 <+1>: pushq %r15
    0x55555562d843 <+3>: pushq %r14
    0x55555562d845 <+5>: pushq %r13
    0x55555562d847 <+7>: pushq %r12
    0x55555562d849 <+9>: pushq %rbx
    0x55555562d84a <+10>: subq $0xa8, %rsp
    0x55555562d851 <+17>: movb $0x1, 0x30(%rdi)
    0x55555562d855 <+21>: movq 0x28(%rdi), %r15
    0x55555562d859 <+25>: leaq 0x9(%r15), %rcx
    0x55555562d85d <+29>: movabsq $-0x3333333333333333, %rdx ; imm = 0xCCCCCCCCCCCCCCCD
    0x55555562d867 <+39>: movq %rcx, %rax
    0x55555562d86a <+42>: mulq %rdx
    0x55555562d86d <+45>: movq %rdx, 0x78(%rsp)
    0x55555562d872 <+50>: cmpq $0x9, %rcx
    0x55555562d876 <+54>: jbe 0x55555562dee6 ; <+1702>
    0x55555562d87c <+60>: movq %rsi, %r14
    0x55555562d87f <+63>: movq %rdi, %rbx
    0x55555562d882 <+66>: movl $0x1, %edi
    0x55555562d887 <+71>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
->  0x55555562d88c <+76>: movq %rax, 0x88(%rsp)
    0x55555562d894 <+84>: movl %edx, 0x74(%rsp)
    0x55555562d898 <+88>: movq $0x0, (%rbx)
    0x55555562d89f <+95>: movl $0x0, 0x8(%rbx)
    0x55555562d8a6 <+102>: leaq -0x1(%r15), %rax
    0x55555562d8aa <+106>: cmpq $0xa, %rax
    0x55555562d8ae <+110>: movq %rbx, 0x80(%rsp)
    0x55555562d8b6 <+118>: jae 0x55555562da0f ; <+463>
    0x55555562d8bc <+124>: xorl %r12d, %r12d
    0x55555562d8bf <+127>: xorl %ebx, %ebx
    0x55555562d8c1 <+129>: jmp 0x55555562d921 ; <+225>
    0x55555562d8c3 <+131>: nopw %cs:(%rax,%rax)
    0x55555562d8d0 <+144>: addl $0xc4653600, %r12d ; imm = 0xC4653600
    0x55555562d8d7 <+151>: incq %rbx
    0x55555562d8da <+154>: movss 0x40(%rsp), %xmm1
    0x55555562d8e0 <+160>: addss 0x20(%rsp), %xmm1
    0x55555562d8e6 <+166>: movss 0x48(%rsp), %xmm0
    0x55555562d8ec <+172>: addss 0x18(%rsp), %xmm0
    0x55555562d8f2 <+178>: movd %xmm1, %eax
    0x55555562d8f6 <+182>: movd %xmm0, %ecx
    0x55555562d8fa <+186>: shlq $0x20, %rcx
    0x55555562d8fe <+190>: orq %rcx, %rax
    0x55555562d901 <+193>: movq %rbx, (%r13)
    0x55555562d905 <+197>: movl %r12d, 0x8(%r13)
    0x55555562d909 <+201>: movq %rax, (%rsp)
    0x55555562d90d <+205>: movss (%rsp), %xmm0
    0x55555562d912 <+210>: movss 0x4(%rsp), %xmm0
    0x55555562d918 <+216>: decq %r15
    0x55555562d91b <+219>: je 0x55555562de65 ; <+1573>
    0x55555562d921 <+225>: movq %rsp, %rdi
    0x55555562d924 <+228>: movq %r14, %rsi
    0x55555562d927 <+231>: callq 0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
    0x55555562d92c <+236>: movss 0x10(%rsp), %xmm0
    0x55555562d932 <+242>: movss 0x14(%rsp), %xmm2
    0x55555562d938 <+248>: movss (%rsp), %xmm1
    0x55555562d93d <+253>: mulss %xmm0, %xmm1
    0x55555562d941 <+257>: movss %xmm1, 0x40(%rsp)
    0x55555562d947 <+263>: mulss 0x4(%rsp), %xmm0
    0x55555562d94d <+269>: movss %xmm0, 0x48(%rsp)
    0x55555562d953 <+275>: movss 0x8(%rsp), %xmm0
    0x55555562d959 <+281>: mulss %xmm2, %xmm0
    0x55555562d95d <+285>: movss %xmm0, 0x20(%rsp)
    0x55555562d963 <+291>: mulss 0xc(%rsp), %xmm2
    0x55555562d969 <+297>: movss %xmm2, 0x18(%rsp)
    0x55555562d96f <+303>: movl $0x1, %edi
    0x55555562d974 <+308>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562d979 <+313>: movq %rax, %r13
    0x55555562d97c <+316>: movl %edx, %ebp
    0x55555562d97e <+318>: movl $0x1, %edi
    0x55555562d983 <+323>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562d988 <+328>: movq %rax, 0x50(%rsp)
    0x55555562d98d <+333>: movl %edx, 0x58(%rsp)
    0x55555562d991 <+337>: movq %r13, 0x28(%rsp)
    0x55555562d996 <+342>: movl %ebp, 0x30(%rsp)
    0x55555562d99a <+346>: movq %rsp, %rdi
    0x55555562d99d <+349>: leaq 0x50(%rsp), %rsi
    0x55555562d9a2 <+354>: leaq 0x28(%rsp), %rdx
    0x55555562d9a7 <+359>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
    0x55555562d9ac <+364>: movzbl (%rsp), %eax
    0x55555562d9b0 <+368>: testb %al, %al
    0x55555562d9b2 <+370>: jne 0x55555562d9c0 ; <+384>
    0x55555562d9b4 <+372>: movq 0x8(%rsp), %rcx
    0x55555562d9b9 <+377>: jmp 0x55555562d9c2 ; <+386>
    0x55555562d9bb <+379>: nopl (%rax,%rax)
    0x55555562d9c0 <+384>: xorl %ecx, %ecx
    0x55555562d9c2 <+386>: addq %rcx, %rbx
    0x55555562d9c5 <+389>: movq 0x80(%rsp), %r13
    0x55555562d9cd <+397>: jb 0x55555562d9f7 ; <+439>
    0x55555562d9cf <+399>: testb $0x1, %al
    0x55555562d9d1 <+401>: movl 0x10(%rsp), %eax
    0x55555562d9d5 <+405>: movl $0x0, %ecx
    0x55555562d9da <+410>: cmovnel %ecx, %eax
    0x55555562d9dd <+413>: addl %eax, %r12d
    0x55555562d9e0 <+416>: cmpl $0x3b9aca00, %r12d ; imm = 0x3B9ACA00
    0x55555562d9e7 <+423>: jb 0x55555562d8da ; <+154>
    0x55555562d9ed <+429>: cmpq $-0x1, %rbx
    0x55555562d9f1 <+433>: jne 0x55555562d8d0 ; <+144>
    0x55555562d9f7 <+439>: leaq 0x3867bc(%rip), %rdi
    0x55555562d9fe <+446>: leaq 0x4456db(%rip), %rdx ; __dso_handle + 29352
    0x55555562da05 <+453>: movl $0x1e, %esi
    0x55555562da0a <+458>: callq 0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60
    0x55555562da0f <+463>: shrq $0x3, 0x78(%rsp)
    0x55555562da15 <+469>: movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE
    0x55555562da1f <+479>: movq $0x0, 0x60(%rsp)
    0x55555562da28 <+488>: incq %rax
    0x55555562da2b <+491>: movq %rax, 0x90(%rsp)
    0x55555562da33 <+499>: xorl %esi, %esi
    0x55555562da35 <+501>: jmp 0x55555562da55 ; <+533>
    0x55555562da37 <+503>: nopw (%rax,%rax)
    0x55555562da40 <+512>: movq 0x40(%rsp), %rsi
    0x55555562da45 <+517>: addq %rbx, %rsi
    0x55555562da48 <+520>: movq 0x28(%r13), %r15
    0x55555562da4c <+524>: cmpq %r15, %rsi
    0x55555562da4f <+527>: jae 0x55555562de65 ; <+1573>
    0x55555562da55 <+533>: movq %r15, %r13
    0x55555562da58 <+536>: subq %rsi, %r13
    0x55555562da5b <+539>: movq 0x78(%rsp), %rax
    0x55555562da60 <+544>: cmpq %rax, %r13
    0x55555562da63 <+547>: cmovaeq %rax, %r13
    0x55555562da67 <+551>: movq %r13, %rax
    0x55555562da6a <+554>: movl $0x18, %ecx
    0x55555562da6f <+559>: mulq %rcx
    0x55555562da72 <+562>: jo 0x55555562defe ; <+1726>
    0x55555562da78 <+568>: movabsq $0x7ffffffffffffffd, %rcx ; imm = 0x7FFFFFFFFFFFFFFD
    0x55555562da82 <+578>: cmpq %rcx, %rax
    0x55555562da85 <+581>: jae 0x55555562defe ; <+1726>
    0x55555562da8b <+587>: movq %rsi, 0x40(%rsp)
    0x55555562da90 <+592>: testq %rax, %rax
    0x55555562da93 <+595>: movq %r15, 0x68(%rsp)
    0x55555562da98 <+600>: je 0x55555562dac0 ; <+640>
    0x55555562da9a <+602>: movq %rax, %r12
    0x55555562da9d <+605>: movq %rax, %rdi
    0x55555562daa0 <+608>: callq *0x46d20a(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320
    0x55555562daa6 <+614>: movq %rax, %rbx
    0x55555562daa9 <+617>: movq %r13, %rbp
    0x55555562daac <+620>: testq %rax, %rax
    0x55555562daaf <+623>: jne 0x55555562dac7 ; <+647>
    0x55555562dab1 <+625>: jmp 0x55555562df2a ; <+1770>
    0x55555562dab6 <+630>: nopw %cs:(%rax,%rax)
    0x55555562dac0 <+640>: movl $0x4, %ebx
    0x55555562dac5 <+645>: xorl %ebp, %ebp
    0x55555562dac7 <+647>: movq %rbx, %r12
    0x55555562daca <+650>: movq %r13, 0x48(%rsp)
    0x55555562dacf <+655>: movq %rsp, %r15
    0x55555562dad2 <+658>: nopw %cs:(%rax,%rax)
    0x55555562dae0 <+672>: movq %r15, %rdi
    0x55555562dae3 <+675>: movq %r14, %rsi
    0x55555562dae6 <+678>: callq 0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
    0x55555562daeb <+683>: movq 0x10(%rsp), %rax
    0x55555562daf0 <+688>: movq %rax, 0x10(%r12)
    0x55555562daf5 <+693>: movups (%rsp), %xmm0
    0x55555562daf9 <+697>: movups %xmm0, (%r12)
    0x55555562dafe <+702>: addq $0x18, %r12
    0x55555562db02 <+706>: decq %r13
    0x55555562db05 <+709>: jne 0x55555562dae0 ; <+672>
    0x55555562db07 <+711>: movq %rbp, (%rsp)
    0x55555562db0b <+715>: movq %rbx, 0x8(%rsp)
    0x55555562db10 <+720>: movq 0x48(%rsp), %r15
    0x55555562db15 <+725>: movq %r15, 0x10(%rsp)
    0x55555562db1a <+730>: movq (%rsp), %rax
    0x55555562db1e <+734>: movq %rax, 0x20(%rsp)
    0x55555562db23 <+739>: movq 0x8(%rsp), %rax
    0x55555562db28 <+744>: movq %rax, 0x18(%rsp)
    0x55555562db2d <+749>: movq 0x10(%rsp), %rbx
    0x55555562db32 <+754>: leaq (,%r15,8), %r12
    0x55555562db3a <+762>: cmpq 0x90(%rsp), %r15
    0x55555562db42 
<+770>: ja 0x55555562df14 ; <+1748> 0x55555562db48 <+776>: movq 0x68(%rsp), %rax 0x55555562db4d <+781>: cmpq 0x40(%rsp), %rax 0x55555562db52 <+786>: jne 0x55555562db60 ; <+800> 0x55555562db54 <+788>: movl $0x4, %ebp 0x55555562db59 <+793>: xorl %r15d, %r15d 0x55555562db5c <+796>: jmp 0x55555562dbb0 ; <+880> 0x55555562db5e <+798>: nop 0x55555562db60 <+800>: testq %r15, %r15 0x55555562db63 <+803>: je 0x55555562db7b ; <+827> 0x55555562db65 <+805>: movq %r12, %rdi 0x55555562db68 <+808>: callq *0x46d142(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320 0x55555562db6e <+814>: movq %rax, %rbp 0x55555562db71 <+817>: testq %rbp, %rbp 0x55555562db74 <+820>: jne 0x55555562dbb0 ; <+880> 0x55555562db76 <+822>: jmp 0x55555562df0a ; <+1738> 0x55555562db7b <+827>: movq $0x0, (%rsp) 0x55555562db83 <+835>: movl $0x8, %esi 0x55555562db88 <+840>: movq %rsp, %rdi 0x55555562db8b <+843>: movq %r12, %rdx 0x55555562db8e <+846>: callq *0x46d064(%rip) ; _GLOBAL_OFFSET_TABLE_ + 136 0x55555562db94 <+852>: testl %eax, %eax 0x55555562db96 <+854>: jne 0x55555562df0a ; <+1738> 0x55555562db9c <+860>: movq (%rsp), %rbp 0x55555562dba0 <+864>: testq %rbp, %rbp 0x55555562dba3 <+867>: je 0x55555562df0a ; <+1738> 0x55555562dba9 <+873>: nopl (%rax) 0x55555562dbb0 <+880>: movq %r15, 0x28(%rsp) 0x55555562dbb5 <+885>: movq %rbp, 0x30(%rsp) 0x55555562dbba <+890>: movq $0x0, 0x38(%rsp) 0x55555562dbc3 <+899>: movl $0x1, %edi 0x55555562dbc8 <+904>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb 0x55555562dbcd <+909>: movl %edx, 0x68(%rsp) 0x55555562dbd1 <+913>: movq %rax, %r12 0x55555562dbd4 <+916>: cmpq %r15, %rbx 0x55555562dbd7 <+919>: ja 0x55555562de37 ; <+1527> 0x55555562dbdd <+925>: movl $0x0, %esi 0x55555562dbe2 <+930>: testq %rbx, %rbx 0x55555562dbe5 <+933>: movq 0x18(%rsp), %r15 0x55555562dbea <+938>: je 0x55555562dd4d ; <+1293> 0x55555562dbf0 <+944>: leaq (%rbx,%rbx,2), %rdi 0x55555562dbf4 <+948>: leaq -0x18(,%rdi,8), %rcx 0x55555562dbfc <+956>: movq %rcx, %rax 0x55555562dbff 
<+959>: movabsq $-0x5555555555555555, %rdx ; imm = 0xAAAAAAAAAAAAAAAB 0x55555562dc09 <+969>: mulq %rdx 0x55555562dc0c <+972>: cmpq $0x5f, %rcx 0x55555562dc10 <+976>: jbe 0x55555562dc46 ; <+1030> 0x55555562dc12 <+978>: shrq $0x4, %rdx 0x55555562dc16 <+982>: leaq (,%rsi,8), %rcx 0x55555562dc1e <+990>: addq %rbp, %rcx 0x55555562dc21 <+993>: leaq (%rdx,%rdx,2), %rax 0x55555562dc25 <+997>: leaq (%r15,%rax,8), %rax 0x55555562dc29 <+1001>: addq $0x18, %rax 0x55555562dc2d <+1005>: cmpq %rax, %rcx 0x55555562dc30 <+1008>: jae 0x55555562dc4e ; <+1038> 0x55555562dc32 <+1010>: leaq (%rsi,%rdx), %rax 0x55555562dc36 <+1014>: leaq 0x8(,%rax,8), %rax 0x55555562dc3e <+1022>: addq %rbp, %rax 0x55555562dc41 <+1025>: cmpq %rax, %r15 0x55555562dc44 <+1028>: jae 0x55555562dc4e ; <+1038> 0x55555562dc46 <+1030>: movq %r15, %rax 0x55555562dc49 <+1033>: jmp 0x55555562dcf0 ; <+1200> 0x55555562dc4e <+1038>: movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE 0x55555562dc58 <+1048>: andq %rax, %rdx 0x55555562dc5b <+1051>: leaq (,%rdx,8), %rax 0x55555562dc63 <+1059>: leaq (%rax,%rax,2), %rax 0x55555562dc67 <+1063>: movq %r15, %r8 0x55555562dc6a <+1066>: xorl %r9d, %r9d 0x55555562dc6d <+1069>: nopl (%rax) 0x55555562dc70 <+1072>: movupd (%r8), %xmm1 0x55555562dc75 <+1077>: movupd 0x10(%r8), %xmm2 0x55555562dc7b <+1083>: movupd 0x20(%r8), %xmm3 0x55555562dc81 <+1089>: movapd %xmm2, %xmm4 0x55555562dc85 <+1093>: movapd %xmm1, %xmm0 0x55555562dc89 <+1097>: movsd %xmm3, %xmm0 ; xmm0 = xmm3[0],xmm0[1] 0x55555562dc8d <+1101>: movapd %xmm3, %xmm5 0x55555562dc91 <+1105>: movsd %xmm2, %xmm3 ; xmm3 = xmm2[0],xmm3[1] 0x55555562dc95 <+1109>: shufps $0x2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[0,0] 0x55555562dc99 <+1113>: shufps $0xe2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[2,3] 0x55555562dc9d <+1117>: shufps $0x13, %xmm1, %xmm4 ; xmm4 = xmm4[3,0],xmm1[1,0] 0x55555562dca1 <+1121>: shufps $0xe2, %xmm1, %xmm4 ; xmm4 = xmm4[2,0],xmm1[2,3] 0x55555562dca5 <+1125>: shufps $0xe2, %xmm1, %xmm0 ; xmm0 = 
xmm0[2,0],xmm1[2,3] 0x55555562dca9 <+1129>: shufps $0x31, %xmm1, %xmm5 ; xmm5 = xmm5[1,0],xmm1[3,0] 0x55555562dcad <+1133>: shufps $0xe2, %xmm1, %xmm5 ; xmm5 = xmm5[2,0],xmm1[2,3] 0x55555562dcb1 <+1137>: movapd %xmm3, %xmm1 0x55555562dcb5 <+1141>: shufps $0xe8, %xmm3, %xmm1 ; xmm1 = xmm1[0,2],xmm3[2,3] 0x55555562dcb9 <+1145>: psrlq $0x20, %xmm3 0x55555562dcbe <+1150>: pshufd $0xe8, %xmm3, %xmm3 ; xmm3 = xmm3[0,2,2,3] 0x55555562dcc3 <+1155>: mulps %xmm1, %xmm2 0x55555562dcc6 <+1158>: mulps %xmm4, %xmm1 0x55555562dcc9 <+1161>: mulps %xmm3, %xmm0 0x55555562dccc <+1164>: addps %xmm2, %xmm0 0x55555562dccf <+1167>: mulps %xmm3, %xmm5 0x55555562dcd2 <+1170>: addps %xmm1, %xmm5 0x55555562dcd5 <+1173>: unpcklps %xmm5, %xmm0 ; xmm0 = xmm0[0],xmm5[0],xmm0[1],xmm5[1] 0x55555562dcd8 <+1176>: movups %xmm0, (%rcx,%r9,8) 0x55555562dcdd <+1181>: addq $0x2, %r9 0x55555562dce1 <+1185>: addq $0x30, %r8 0x55555562dce5 <+1189>: cmpq %r9, %rdx 0x55555562dce8 <+1192>: jne 0x55555562dc70 ; <+1072> 0x55555562dcea <+1194>: addq %rdx, %rsi 0x55555562dced <+1197>: addq %r15, %rax 0x55555562dcf0 <+1200>: leaq (%r15,%rdi,8), %rcx 0x55555562dcf4 <+1204>: nopw %cs:(%rax,%rax) 0x55555562dd00 <+1216>: movss 0x10(%rax), %xmm0 0x55555562dd05 <+1221>: movss 0x14(%rax), %xmm1 0x55555562dd0a <+1226>: movss (%rax), %xmm2 0x55555562dd0e <+1230>: mulss %xmm0, %xmm2 0x55555562dd12 <+1234>: mulss 0x4(%rax), %xmm0 0x55555562dd17 <+1239>: movss 0x8(%rax), %xmm3 0x55555562dd1c <+1244>: mulss %xmm1, %xmm3 0x55555562dd20 <+1248>: addss %xmm2, %xmm3 0x55555562dd24 <+1252>: mulss 0xc(%rax), %xmm1 0x55555562dd29 <+1257>: addss %xmm0, %xmm1 0x55555562dd2d <+1261>: movd %xmm3, %edx 0x55555562dd31 <+1265>: movd %xmm1, %edi 0x55555562dd35 <+1269>: shlq $0x20, %rdi 0x55555562dd39 <+1273>: orq %rdi, %rdx 0x55555562dd3c <+1276>: movq %rdx, (%rbp,%rsi,8) 0x55555562dd41 <+1281>: incq %rsi 0x55555562dd44 <+1284>: addq $0x18, %rax 0x55555562dd48 <+1288>: cmpq %rcx, %rax 0x55555562dd4b <+1291>: jne 0x55555562dd00 ; <+1216> 
0x55555562dd4d <+1293>: movq %rsi, 0x38(%rsp) 0x55555562dd52 <+1298>: cmpq $0x0, 0x20(%rsp) 0x55555562dd58 <+1304>: je 0x55555562dd63 ; <+1315> 0x55555562dd5a <+1306>: movq %r15, %rdi 0x55555562dd5d <+1309>: callq *0x46cfdd(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464 0x55555562dd63 <+1315>: movl $0x1, %edi 0x55555562dd68 <+1320>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb 0x55555562dd6d <+1325>: movq 0x80(%rsp), %r13 0x55555562dd75 <+1333>: movq %rsp, %rdi 0x55555562dd78 <+1336>: movq %rax, 0x98(%rsp) 0x55555562dd80 <+1344>: movl %edx, 0xa0(%rsp) 0x55555562dd87 <+1351>: movq %r12, 0x50(%rsp) 0x55555562dd8c <+1356>: movl 0x68(%rsp), %eax 0x55555562dd90 <+1360>: movl %eax, 0x58(%rsp) 0x55555562dd94 <+1364>: leaq 0x98(%rsp), %rsi 0x55555562dd9c <+1372>: leaq 0x50(%rsp), %rdx 0x55555562dda1 <+1377>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5 0x55555562dda6 <+1382>: movzbl (%rsp), %ecx 0x55555562ddaa <+1386>: testb %cl, %cl 0x55555562ddac <+1388>: movq 0x48(%rsp), %rbx 0x55555562ddb1 <+1393>: jne 0x55555562ddc0 ; <+1408> 0x55555562ddb3 <+1395>: movq 0x8(%rsp), %rax 0x55555562ddb8 <+1400>: jmp 0x55555562ddc2 ; <+1410> 0x55555562ddba <+1402>: nopw (%rax,%rax) 0x55555562ddc0 <+1408>: xorl %eax, %eax 0x55555562ddc2 <+1410>: addq (%r13), %rax 0x55555562ddc6 <+1414>: jb 0x55555562decc ; <+1676> 0x55555562ddcc <+1420>: testb $0x1, %cl 0x55555562ddcf <+1423>: movl 0x10(%rsp), %ecx 0x55555562ddd3 <+1427>: movl $0x0, %edx 0x55555562ddd8 <+1432>: cmovnel %edx, %ecx 0x55555562dddb <+1435>: addl 0x8(%r13), %ecx 0x55555562dddf <+1439>: cmpl $0x3b9aca00, %ecx ; imm = 0x3B9ACA00 0x55555562dde5 <+1445>: jb 0x55555562ddfa ; <+1466> 0x55555562dde7 <+1447>: cmpq $-0x1, %rax 0x55555562ddeb <+1451>: je 0x55555562decc ; <+1676> 0x55555562ddf1 <+1457>: addl $0xc4653600, %ecx ; imm = 0xC4653600 0x55555562ddf7 <+1463>: incq %rax 0x55555562ddfa <+1466>: movq %rax, (%r13) 0x55555562ddfe <+1470>: movl %ecx, 0x8(%r13) 
0x55555562de02 <+1474>: movq 0x38(%rsp), %rax 0x55555562de07 <+1479>: movq %rax, 0x10(%rsp) 0x55555562de0c <+1484>: movups 0x28(%rsp), %xmm0 0x55555562de11 <+1489>: movaps %xmm0, (%rsp) 0x55555562de15 <+1493>: movq 0x10(%rsp), %rax 0x55555562de1a <+1498>: movq (%rsp), %rax 0x55555562de1e <+1502>: movq 0x8(%rsp), %rdi 0x55555562de23 <+1507>: testq %rax, %rax 0x55555562de26 <+1510>: je 0x55555562da40 ; <+512> 0x55555562de2c <+1516>: callq *0x46cf0e(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464 0x55555562de32 <+1522>: jmp 0x55555562da40 ; <+512> 0x55555562de37 <+1527>: movl $0x4, %ecx 0x55555562de3c <+1532>: movl $0x8, %r8d 0x55555562de42 <+1538>: leaq 0x28(%rsp), %rdi 0x55555562de47 <+1543>: xorl %esi, %esi 0x55555562de49 <+1545>: movq %rbx, %rdx 0x55555562de4c <+1548>: movq 0x18(%rsp), %r15 0x55555562de51 <+1553>: callq 0x555555555550 ; alloc::raw_vec::RawVecInner$LT$A$GT$::reserve::do_reserve_and_handle::h6eaae75860de7206 0x55555562de56 <+1558>: movq 0x30(%rsp), %rbp 0x55555562de5b <+1563>: movq 0x38(%rsp), %rsi 0x55555562de60 <+1568>: jmp 0x55555562dbf0 ; <+944> 0x55555562de65 <+1573>: movl $0x1, %edi 0x55555562de6a <+1578>: callq 0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb 0x55555562de6f <+1583>: movq %rax, 0x50(%rsp) 0x55555562de74 <+1588>: movl %edx, 0x58(%rsp) 0x55555562de78 <+1592>: movq 0x88(%rsp), %rax 0x55555562de80 <+1600>: movq %rax, 0x28(%rsp) 0x55555562de85 <+1605>: movl 0x74(%rsp), %eax 0x55555562de89 <+1609>: movl %eax, 0x30(%rsp) 0x55555562de8d <+1613>: movq %rsp, %rdi 0x55555562de90 <+1616>: leaq 0x50(%rsp), %rsi 0x55555562de95 <+1621>: leaq 0x28(%rsp), %rdx 0x55555562de9a <+1626>: callq 0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5 0x55555562de9f <+1631>: xorl %eax, %eax 0x55555562dea1 <+1633>: cmpb $0x0, (%rsp) 0x55555562dea5 <+1637>: movl 0x10(%rsp), %ecx 0x55555562dea9 <+1641>: cmovnel %eax, %ecx 0x55555562deac <+1644>: cmoveq 0x8(%rsp), %rax 0x55555562deb2 <+1650>: movq %rax, 
0x10(%r13) 0x55555562deb6 <+1654>: movl %ecx, 0x18(%r13) 0x55555562deba <+1658>: addq $0xa8, %rsp 0x55555562dec1 <+1665>: popq %rbx 0x55555562dec2 <+1666>: popq %r12 0x55555562dec4 <+1668>: popq %r13 0x55555562dec6 <+1670>: popq %r14 0x55555562dec8 <+1672>: popq %r15 0x55555562deca <+1674>: popq %rbp 0x55555562decb <+1675>: retq 0x55555562decc <+1676>: leaq 0x3862e7(%rip), %rdi 0x55555562ded3 <+1683>: leaq 0x445206(%rip), %rdx ; __dso_handle + 29352 0x55555562deda <+1690>: movl $0x1e, %esi 0x55555562dedf <+1695>: callq 0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60 0x55555562dee4 <+1700>: jmp 0x55555562df28 ; <+1768> 0x55555562dee6 <+1702>: leaq 0x379c5e(%rip), %rdi 0x55555562deed <+1709>: leaq 0x43f824(%rip), %rdx ; __dso_handle + 6368 0x55555562def4 <+1716>: movl $0x1c, %esi 0x55555562def9 <+1721>: callq 0x5555555c491e ; std::panicking::begin_panic::h4f2cc586c820a72c 0x55555562defe <+1726>: leaq 0x46c7fb(%rip), %rdi ; __dso_handle + 190664 0x55555562df05 <+1733>: callq 0x5555555aedb0 ; alloc::raw_vec::capacity_overflow::h46cadc9fcf0d8ebe 0x55555562df0a <+1738>: movl $0x4, %eax 0x55555562df0f <+1743>: movq %rax, 0x60(%rsp) 0x55555562df14 <+1748>: leaq 0x43f815(%rip), %rdx ; __dso_handle + 6392 0x55555562df1b <+1755>: movq 0x60(%rsp), %rdi 0x55555562df20 <+1760>: movq %r12, %rsi 0x55555562df23 <+1763>: callq 0x5555555aed83 ; alloc::raw_vec::handle_error::hc389833aee8d6f48 0x55555562df28 <+1768>: ud2 0x55555562df2a <+1770>: movl $0x4, %edi 0x55555562df2f <+1775>: movq %r12, %rsi 0x55555562df32 <+1778>: callq 0x5555555aed99 ; alloc::alloc::handle_alloc_error::h9164725ce4591dac 0x55555562df37 <+1783>: movq %rax, %rbx 0x55555562df3a <+1786>: cmpq $0x0, 0x20(%rsp) 0x55555562df40 <+1792>: jne 0x55555562df4d ; <+1805> 0x55555562df42 <+1794>: movq $0x0, 0x20(%rsp) 0x55555562df4b <+1803>: jmp 0x55555562df71 ; <+1841> 0x55555562df4d <+1805>: movq 0x18(%rsp), %rdi 0x55555562df52 <+1810>: callq *0x46cde8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464 0x55555562df58 
<+1816>: jmp 0x55555562df71 ; <+1841> 0x55555562df5a <+1818>: movq %rax, %rbx 0x55555562df5d <+1821>: movb $0x1, %bpl 0x55555562df60 <+1824>: jmp 0x55555562df78 ; <+1848> 0x55555562df62 <+1826>: movq %rax, %rbx 0x55555562df65 <+1829>: movq 0x18(%rsp), %rdi 0x55555562df6a <+1834>: jmp 0x55555562df92 ; <+1874> 0x55555562df6c <+1836>: jmp 0x55555562df6e ; <+1838> 0x55555562df6e <+1838>: movq %rax, %rbx 0x55555562df71 <+1841>: movq 0x28(%rsp), %r15 0x55555562df76 <+1846>: xorl %ebp, %ebp 0x55555562df78 <+1848>: testq %r15, %r15 0x55555562df7b <+1851>: je 0x55555562df88 ; <+1864> 0x55555562df7d <+1853>: movq 0x30(%rsp), %rdi 0x55555562df82 <+1858>: callq *0x46cdb8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464 0x55555562df88 <+1864>: testb %bpl, %bpl 0x55555562df8b <+1867>: movq 0x18(%rsp), %rdi 0x55555562df90 <+1872>: je 0x55555562df9a ; <+1882> 0x55555562df92 <+1874>: cmpq $0x0, 0x20(%rsp) 0x55555562df98 <+1880>: jne 0x55555562dfa2 ; <+1890> 0x55555562df9a <+1882>: movq %rbx, %rdi 0x55555562df9d <+1885>: callq 0x5555555543b0 ; symbol stub for: _Unwind_Resume 0x55555562dfa2 <+1890>: callq *0x46cd98(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464 0x55555562dfa8 <+1896>: movq %rbx, %rdi 0x55555562dfab <+1899>: callq 0x5555555543b0 ; symbol stub for: _Unwind_Resume

I am not sure, but I think the compiler was able to autovectorize this so that it processes two(?) mul(a, b) calls per iteration; see the code starting at 0x55555562dc70. I do not have time for this right now, but I will be able to return to it later today or at the beginning of the week.

In general, I think it should be easy to modify the existing macros to use iter_batched() and iter_batched_ref(). I really do not want to do this manually for the rest of the code, but I may try to task an LLM with this 😁.
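
To make the batched idea concrete, here is a minimal std-only sketch. It is a simplified stand-in, not Criterion's actual implementation: `time_batched`, `bench_binop!` and `f32_mul` are hypothetical names invented for illustration, and the real macros in the benches may look quite different.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Simplified stand-in for a batched timing loop (NOT Criterion's code).
/// Every input is produced by `setup` before the clock starts, so input
/// generation does not contribute to the returned duration.
fn time_batched<I, O>(
    batch_size: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    // Untimed: build all inputs up front and hide their values from the optimizer.
    let inputs: Vec<I> = (0..batch_size).map(|_| black_box(setup())).collect();
    let mut outputs: Vec<O> = Vec::with_capacity(batch_size);

    let start = Instant::now();
    for input in inputs {
        outputs.push(black_box(routine(input)));
    }
    // Note: the emptied input buffer is freed when the loop ends,
    // which is still inside the timed region.
    let elapsed = start.elapsed();

    // The outputs are dropped only after the clock has stopped.
    drop(outputs);
    elapsed
}

/// Hypothetical macro in the spirit of the bench macros: generates a function
/// that times a binary operator on operands produced per iteration.
macro_rules! bench_binop {
    ($name:ident, $t1:ty, $t2:ty, $op:tt, $gen1:expr, $gen2:expr) => {
        fn $name(batch_size: usize) -> Duration {
            time_batched(batch_size, || ($gen1, $gen2), |(a, b): ($t1, $t2)| a $op b)
        }
    };
}

// Example instantiation with constant operands (a real bench would draw random values).
bench_binop!(f32_mul, f32, f32, *, 1.5_f32, 2.0_f32);
```

Calling `f32_mul(1000)` returns the time spent in 1000 multiplications, with operand construction kept outside the measurement.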

Ah, and this also raises another question: do we want to always generate fresh random values for both arguments of binary operations to simulate a worst-case scenario? Or do we also need another macro that generates a lot of random values for the second argument but uses a reference to a single first argument (self)? In practice you sometimes need to multiply a ton of vectors by a single matrix, and there might be a performance difference due to cache misses, for example.

I also checked the implementation of iter_batched*() and it seems to do the right thing: the return value of the setup closure is wrapped in a black_box().

Regarding the Criterion issue you mentioned: I am not sure, but I suspect the problem is that they are measuring the time it takes to deallocate a vector on drop(). The performance of free() may depend on the allocated size because allocators sometimes use different algorithms for different allocation sizes.

And a last thought for now: can we bump criterion to the latest version as part of this PR?

And a last thought for now: can we bump criterion to the latest version as part of this PR?

I think bumping to the latest criterion version is a very good idea, provided everything else still works.

Ah, and this also raises another question: do we want to always generate fresh random values for both arguments of binary operations to simulate a worst-case scenario? Or do we also need another macro that generates a lot of random values for the second argument but uses a reference to a single first argument (self)? In practice you sometimes need to multiply a ton of vectors by a single matrix, and there might be a performance difference due to cache misses, for example.

Very good question and I don't think I have a great answer. Maybe your provided implementation is better after all? It corrects the original code but keeps the same spirit, i.e. if we have sufficiently small pieces of data, we'll take advantage of caching... I don't know... microbenchmarks sure are great aren't they 😆

In general, I think it should be easy to modify the existing macros to use iter_batched() and iter_batched_ref(). I really do not want to do this manually for the rest of the code, but I may try to task an LLM with this 😁.

Given the discussion above, I don't know if you would want to try the refactor at all. If you end up attempting it and it is too tedious for you to refactor (or you don't have access to one of our future AI overlords), let me know.

Redid almost everything in this PR. Here is the summary of all changes relative to current main:

Set codegen-units = 1 for benchmarks. I found that the default value of codegen-units leads to inconsistent results across recompilations (clean vs. incremental). It also sometimes leads to a significant performance degradation in benchmarks unrelated to the code changes. See the issue 4000% performance regression with "-C target-cpu=x86-64-v3" and fat LTO (rust-lang/rust#146497) for details.

criterion updated to version 0.7.

Unused macros removed (I found another unused macro!)

Remaining macros changed to use iter_batched() and iter_batched_ref().

Added macros to benchmark Single x N Values binary operations. This simulates real-world use cases like multiplication of many vectors by a single matrix.

There is a ~2x performance difference between the case where both arguments are random on each iteration and the case where one argument is static and the second is random on each iteration:
click for details...

mat2_mul_v              time:   [778.33 ps 785.41 ps 797.70 ps]
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  4 (4.00%) high mild
  5 (5.00%) high severe

mat3_mul_v              time:   [1.7001 ns 1.7051 ns 1.7111 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe

mat4_mul_v              time:   [2.6101 ns 2.6223 ns 2.6374 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

single_mat2_mul_v       time:   [402.65 ps 403.62 ps 404.75 ps]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

single_mat3_mul_v       time:   [651.30 ps 654.06 ps 657.15 ps]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  8 (8.00%) high mild
  4 (4.00%) high severe

single_mat4_mul_v       time:   [1.0628 ns 1.0645 ns 1.0666 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

mat2_tr_mul_v           time:   [719.81 ps 721.99 ps 724.59 ps]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild

mat3_tr_mul_v           time:   [1.6685 ns 1.6758 ns 1.6841 ns]
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

mat4_tr_mul_v           time:   [2.6739 ns 2.6897 ns 2.7080 ns]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

single_mat2_tr_mul_v    time:   [353.36 ps 354.56 ps 356.03 ps]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

single_mat3_tr_mul_v    time:   [779.82 ps 782.84 ps 786.37 ps]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

single_mat4_tr_mul_v    time:   [1.1918 ns 1.1946 ns 1.1977 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

unit_quaternion_mul_v   time:   [1.5002 ns 1.5088 ns 1.5183 ns]
                        change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

single_unit_quaternion_mul_v
                        time:   [1.0489 ns 1.0531 ns 1.0584 ns]
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
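
The difference between the two input strategies can be sketched without any dependencies. The types and function names below (`Mat2`, `Vec2`, `mul_pairs`, `mul_single`) are hypothetical stand-ins chosen for illustration and are not the nalgebra types or the macros added in this PR.

```rust
use std::hint::black_box;

type Mat2 = [[f32; 2]; 2]; // row-major stand-in for a 2x2 matrix
type Vec2 = [f32; 2];

/// Plain matrix-vector product for the stand-in types.
fn mat2_mul_vec2(m: &Mat2, v: &Vec2) -> Vec2 {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// "N x N": every multiplication loads its own matrix and its own vector.
fn mul_pairs(pairs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    pairs
        .iter()
        .map(|(m, v)| mat2_mul_vec2(black_box(m), black_box(v)))
        .collect()
}

/// "Single x N": one matrix is reused for every vector, so only the
/// vectors are fresh data on each multiplication.
fn mul_single(m: &Mat2, vs: &[Vec2]) -> Vec<Vec2> {
    vs.iter().map(|v| mat2_mul_vec2(m, black_box(v))).collect()
}
```

Both functions compute the same products when every pair holds the same matrix; what differs is how much fresh data each multiplication has to read, which is one plausible contributor to the gap measured above.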

Uncommented some quaternion benchmarks. I do not know why those benchmarks were commented out in the first place.

Remaining non-macro benchmarks changed to use iter_batched() and iter_batched_ref().

The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved DVector allocations outside of the benchmarks, and added anything allocated but not consumed to the return tuple of the benchmark closure to ensure that the implicit drop/free is not included in the measured time.
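
A small std-only sketch of the "return it instead of dropping it" idea. `time_without_drop` and `sum_owned` are hypothetical helpers written for this illustration; they are not part of Criterion or of the benches.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Times `routine` once and hands its output back to the caller, so the
/// output's destructor runs only after the clock has stopped.
fn time_without_drop<O>(routine: impl FnOnce() -> O) -> (Duration, O) {
    let start = Instant::now();
    let out = black_box(routine());
    let elapsed = start.elapsed();
    (elapsed, out)
}

/// Takes ownership of a buffer, computes a result, and returns the buffer
/// alongside it. If `v` were dropped inside this function, the free() call
/// would be part of whatever is being timed.
fn sum_owned(v: Vec<f64>) -> (f64, Vec<f64>) {
    let s: f64 = v.iter().sum();
    (s, v)
}
```

With `time_without_drop(|| sum_owned(data))`, the vector is deallocated by the caller after the measurement, which matches the intent described above.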

Added reproducible_smatrix(). Some algorithms may not converge when used on completely random values with the default value of epsilon and unlimited iterations. reproducible_dmatrix() already exists to circumvent this for DMatrix, so I implemented the same for SMatrix.

In my tests this problem manifested itself only on schur_decompose_4x4, but I decided to apply a similar fix to all benchmarks that also use reproducible_dmatrix() for DMatrix.
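
For illustration, one way to get reproducible inputs is to fill the matrix from a generator with a fixed seed. The sketch below assumes that approach and uses a tiny SplitMix64 generator over plain arrays; `SplitMix64` and `seeded_matrix` are hypothetical names, and this is not the code of reproducible_smatrix() or reproducible_dmatrix().

```rust
/// Minimal SplitMix64 generator, used here only to avoid external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Fills an R x C array-backed matrix from a fixed seed, so every run of a
/// benchmark sees exactly the same values.
fn seeded_matrix<const R: usize, const C: usize>(seed: u64) -> [[f64; C]; R] {
    let mut rng = SplitMix64(seed);
    let mut m = [[0.0; C]; R];
    for row in m.iter_mut() {
        for x in row.iter_mut() {
            *x = rng.next_f64();
        }
    }
    m
}
```

Because the seed is fixed, an input on which an iterative algorithm converges will keep converging on every run, instead of depending on the luck of the draw.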

Cholesky decomposition benchmarks changed to use reproducible_dmatrix().

Random matrices may not be positive-definite, and the Cholesky decomposition benchmarks panic because of that:

Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
called `Option::unwrap()` on a `None` value
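
To show why an arbitrary matrix can produce this `None`, here is a textbook Cholesky factorization over plain arrays together with one common way of building a positive-definite input. Both functions are hypothetical illustrations: this is not nalgebra's implementation, and `make_spd` is an assumption about how one could construct a valid input, not necessarily what the benchmarks do.

```rust
/// Textbook Cholesky factorization A = L * L^T, reading the lower triangle of `a`.
/// Returns None as soon as a non-positive pivot appears, i.e. when the
/// matrix is not positive-definite.
fn cholesky<const N: usize>(a: &[[f64; N]; N]) -> Option<[[f64; N]; N]> {
    let mut l = [[0.0; N]; N];
    for i in 0..N {
        for j in 0..=i {
            let mut sum = a[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum <= 0.0 {
                    return None;
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    Some(l)
}

/// Builds a symmetric positive-definite matrix from an arbitrary square one:
/// M^T * M is positive-semidefinite, and adding N to the diagonal makes it definite.
fn make_spd<const N: usize>(m: &[[f64; N]; N]) -> [[f64; N]; N] {
    let mut a = [[0.0; N]; N];
    for i in 0..N {
        for j in 0..N {
            for k in 0..N {
                a[i][j] += m[k][i] * m[k][j];
            }
        }
        a[i][i] += N as f64;
    }
    a
}
```

An indefinite input such as `[[1.0, 2.0], [2.0, 1.0]]` makes `cholesky` return `None`, while the same matrix passed through `make_spd` factors successfully.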

The total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes. Here are the results compared to current main:
click for details...

mat2_mul_m time: [1.1043 ns 1.1058 ns 1.1077 ns] change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 2 (2.00%) high mild 6 (6.00%) high severe mat3_mul_m time: [3.1885 ns 3.1945 ns 3.2038 ns] change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat4_mul_m time: [6.7759 ns 6.7840 ns 6.7929 ns] change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe mat2_tr_mul_m time: [1.2882 ns 1.2901 ns 1.2926 ns] change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat3_tr_mul_m time: [3.1688 ns 3.1725 ns 3.1770 ns] change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe mat4_tr_mul_m time: [6.5406 ns 6.5453 ns 6.5508 ns] change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe mat2_add_m time: [644.68 ps 645.88 ps 647.24 ps] change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe mat3_add_m time: [1.3543 ns 1.3572 ns 1.3607 ns] change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05) Performance has improved. 
Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_add_m time: [2.3987 ns 2.4015 ns 2.4044 ns] change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat2_sub_m time: [637.47 ps 638.88 ps 640.62 ps] change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat3_sub_m time: [1.3531 ns 1.3546 ns 1.3562 ns] change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild 4 (4.00%) high severe mat4_sub_m time: [2.3972 ns 2.3996 ns 2.4021 ns] change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat2_mul_v time: [774.43 ps 775.48 ps 776.73 ps] change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat3_mul_v time: [1.6843 ns 1.6858 ns 1.6874 ns] change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat4_mul_v time: [2.6029 ns 2.6196 ns 2.6485 ns] change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe single_mat2_mul_v time: [392.29 ps 393.45 ps 394.87 ps] Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe single_mat3_mul_v time: [650.16 ps 651.47 ps 653.07 ps] Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe single_mat4_mul_v time: [1.0665 ns 1.0690 ns 1.0722 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat2_tr_mul_v time: [719.95 ps 720.92 ps 722.16 ps] change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low severe 2 (2.00%) low mild 7 (7.00%) high mild 4 (4.00%) high severe mat3_tr_mul_v time: [1.6551 ns 1.6564 ns 1.6577 ns] change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat4_tr_mul_v time: [2.6477 ns 2.6546 ns 2.6666 ns] change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05) Performance has regressed. 
Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low severe 3 (3.00%) high mild 3 (3.00%) high severe single_mat2_tr_mul_v time: [353.60 ps 355.50 ps 358.48 ps] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe single_mat3_tr_mul_v time: [778.13 ps 779.43 ps 781.25 ps] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) high mild 5 (5.00%) high severe single_mat4_tr_mul_v time: [1.1887 ns 1.1906 ns 1.1930 ns] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat2_mul_s time: [774.44 ps 775.33 ps 776.37 ps] change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat3_mul_s time: [962.59 ps 964.98 ps 967.43 ps] change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe mat4_mul_s time: [1.6589 ns 1.6640 ns 1.6684 ns] change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 8 (8.00%) low severe 3 (3.00%) low mild 1 (1.00%) high mild 6 (6.00%) high severe mat2_div_s time: [803.09 ps 804.70 ps 806.56 ps] change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe mat3_div_s time: [2.4929 ns 2.4947 ns 2.4967 ns] change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_div_s time: [5.1650 ns 5.1688 ns 5.1735 ns] change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe mat2_inv time: [1.1514 ns 1.1523 ns 1.1533 ns] change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat3_inv time: [3.3641 ns 3.3707 ns 3.3826 ns] change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe mat4_inv time: [25.970 ns 26.006 ns 26.062 ns] change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe mat2_transpose time: [409.94 ps 410.77 ps 411.75 ps] change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe mat3_transpose time: [947.42 ps 953.20 ps 961.97 ps] change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe mat4_transpose time: [1.6510 ns 1.6551 ns 1.6612 ns] change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

mat_div_scalar          time:   [480.25 µs 480.55 µs 480.99 µs]
                        change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

mat100_add_mat100       time:   [3.0426 µs 3.0910 µs 3.1351 µs]
                        change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

mat4_mul_mat4           time:   [36.836 ns 36.859 ns 36.886 ns]
                        change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low severe
  4 (4.00%) high mild
  2 (2.00%) high severe

mat5_mul_mat5           time:   [56.715 ns 56.876 ns 57.015 ns]
                        change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild

mat6_mul_mat6           time:   [83.817 ns 83.999 ns 84.156 ns]
                        change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

mat7_mul_mat7           time:   [93.211 ns 93.386 ns 93.534 ns]
                        change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild

mat8_mul_mat8           time:   [88.919 ns 89.410 ns 89.884 ns]
                        change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

mat9_mul_mat9           time:   [207.12 ns 209.04 ns 211.17 ns]
                        change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) low mild
  1 (1.00%) high mild

mat10_mul_mat10         time:   [236.75 ns 237.11 ns 237.47 ns]
                        change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  7 (7.00%) low mild
  1 (1.00%) high mild

mat10_mul_mat10_static  time:   [116.68 ns 117.15 ns 117.62 ns]
                        change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05)
                        Performance has regressed.

mat100_mul_mat100       time:   [40.188 µs 40.327 µs 40.459 µs]
                        change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

mat500_mul_mat500       time:   [4.3909 ms 4.3944 ms 4.3978 ms]
                        change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) low severe
  2 (2.00%) high mild
  1 (1.00%) high severe

iter                    time:   [840.01 µs 840.39 µs 840.81 µs]
                        change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe

iter_rev                time:   [210.14 µs 211.10 µs 212.84 µs]
                        change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

copy_from               time:   [199.77 µs 200.80 µs 202.55 µs]
                        change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) low mild
  1 (1.00%) high severe

axpy                    time:   [31.301 µs 33.301 µs 34.957 µs]
                        change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05)
                        Performance has regressed.

tr_mul_to               time:   [126.46 µs 127.12 µs 128.09 µs]
                        change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

mat_mul_mat             time:   [39.252 µs 39.443 µs 39.626 µs]
                        change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  2 (2.00%) high severe

mat100_from_fn          time:   [6.8398 µs 6.8418 µs 6.8446 µs]
                        change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

mat500_from_fn          time:   [172.11 µs 172.14 µs 172.18 µs]
                        change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

vec2_add_v_f32          time:   [303.98 ps 304.76 ps 305.65 ps]
                        change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

vec3_add_v_f32          time:   [586.36 ps 587.93 ps 589.92 ps]
                        change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

vec4_add_v_f32          time:   [603.45 ps 604.44 ps 605.59 ps]
                        change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

vec2_add_v_f64          time:   [602.08 ps 602.83 ps 603.64 ps]
                        change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

vec3_add_v_f64          time:   [910.94 ps 912.60 ps 914.56 ps]
                        change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  3 (3.00%) high severe

vec4_add_v_f64          time:   [1.1894 ns 1.1933 ns 1.1963 ns]
                        change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

vec2_sub_v              time:   [303.45 ps 304.42 ps 305.37 ps]
                        change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

vec3_sub_v              time:   [672.95 ps 674.82 ps 676.51 ps]
                        change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

vec4_sub_v              time:   [602.84 ps 604.65 ps 607.70 ps]
                        change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

vec2_mul_s              time:   [666.49 ps 667.29 ps 668.31 ps]
                        change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) low severe
  6 (6.00%) high mild
  6 (6.00%) high severe

vec3_mul_s              time:   [511.42 ps 513.44 ps 515.86 ps]
                        change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

vec4_mul_s              time:   [774.13 ps 775.22 ps 776.52 ps]
                        change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

vec2_div_s              time:   [1.3658 ns 1.3694 ns 1.3726 ns]
                        change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

vec3_div_s              time:   [607.73 ps 608.63 ps 609.66 ps]
                        change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  8 (8.00%) high mild
  6 (6.00%) high severe

vec4_div_s              time:   [802.59 ps 803.62 ps 804.82 ps]
                        change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  2 (2.00%) high severe

vec2_dot_f32            time:   [461.20 ps 461.73 ps 462.30 ps]
                        change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

vec3_dot_f32            time:   [688.24 ps 689.05 ps 689.95 ps]
                        change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

vec4_dot_f32            time:   [917.20 ps 921.23 ps 928.57 ps]
                        change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

vec2_dot_f64            time:   [596.11 ps 597.51 ps 598.79 ps]
                        change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

vec3_dot_f64            time:   [749.32 ps 751.02 ps 752.81 ps]
                        change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

vec4_dot_f64            time:   [1.0145 ns 1.0185 ns 1.0230 ns]
                        change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec3_cross              time:   [971.01 ps 971.87 ps 972.73 ps]
                        change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

vec2_norm               time:   [1.0612 ns 1.0623 ns 1.0637 ns]
                        change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low mild
  2 (2.00%) high severe

vec3_norm               time:   [1.0649 ns 1.0665 ns 1.0694 ns]
                        change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

vec4_norm               time:   [1.0733 ns 1.0739 ns 1.0746 ns]
                        change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  7 (7.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

vec2_normalize          time:   [2.5310 ns 2.5326 ns 2.5345 ns]
                        change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

vec3_normalize          time:   [2.5389 ns 2.5409 ns 2.5424 ns]
                        change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

vec4_normalize          time:   [1.8154 ns 1.8164 ns 1.8173 ns]
                        change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

vec10000_dot_f64        time:   [2.0296 µs 2.0337 µs 2.0383 µs]
                        change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

vec10000_dot_f32        time:   [1.1891 µs 1.1926 µs 1.1962 µs]
                        change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

vec10000_axpy_f64       time:   [2.0702 µs 2.0739 µs 2.0777 µs]
                        change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_beta_f64  time:   [2.0914 µs 2.0962 µs 2.1012 µs]
                        change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  5 (5.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_slice time:   [2.0272 µs 2.0303 µs 2.0335 µs]
                        change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  2 (2.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_f64_static
                        time:   [13.917 µs 13.965 µs 14.005 µs]
                        change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f32       time:   [1.0402 µs 1.0421 µs 1.0437 µs]
                        change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f32  time:   [1.0329 µs 1.0346 µs 1.0364 µs]
                        change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

quaternion_add_q        time:   [642.58 ps 650.39 ps 662.45 ps]
                        change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

quaternion_sub_q        time:   [641.16 ps 643.22 ps 645.88 ps]
                        change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

quaternion_mul_q        time:   [1.4252 ns 1.4271 ns 1.4294 ns]
                        change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

unit_quaternion_mul_v   time:   [1.4859 ns 1.4874 ns 1.4890 ns]
                        change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

single_unit_quaternion_mul_v
                        time:   [1.0422 ns 1.0457 ns 1.0504 ns]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  4 (4.00%) high severe

quaternion_mul_s        time:   [771.17 ps 772.18 ps 773.37 ps]
                        change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

quaternion_div_s        time:   [798.54 ps 799.82 ps 801.43 ps]
                        change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

quaternion_inv          time:   [1.2401 ns 1.2408 ns 1.2417 ns]
                        change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

unit_quaternion_inv     time:   [596.01 ps 598.93 ps 602.66 ps]
                        change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) high mild
  9 (9.00%) high severe

quaternion_conjugate    time:   [604.36 ps 608.60 ps 613.48 ps]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

quaternion_normalize    time:   [1.8268 ns 1.8274 ns 1.8281 ns]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

bidiagonalize_100x100   time:   [265.91 µs 266.00 µs 266.11 µs]
                        change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

bidiagonalize_100x500   time:   [2.0053 ms 2.0060 ms 2.0065 ms]
                        change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe

bidiagonalize_4x4       time:   [266.92 ns 267.24 ns 267.62 ns]
                        change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  13 (13.00%) high mild
  4 (4.00%) high severe

Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_500x100   time:   [1.6781 ms 1.6793 ms 1.6804 ms]
                        change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05)
                        Performance has regressed.

bidiagonalize_unpack_100x100
                        time:   [522.13 µs 522.36 µs 522.63 µs]
                        change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

bidiagonalize_unpack_100x500
                        time:   [2.9858 ms 2.9916 ms 2.9976 ms]
                        change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05)
                        Change within noise threshold.

bidiagonalize_unpack_500x100
                        time:   [2.5884 ms 2.5896 ms 2.5910 ms]
                        change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05)
                        Change within noise threshold.

cholesky_100x100        time:   [31.084 µs 31.101 µs 31.122 µs]
                        change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  9 (9.00%) high severe

cholesky_500x500        time:   [4.4799 ms 4.4849 ms 4.4903 ms]
                        change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

cholesky_decompose_unpack_100x100
                        time:   [31.659 µs 31.685 µs 31.727 µs]
                        change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

cholesky_decompose_unpack_500x500
                        time:   [4.4795 ms 4.4845 ms 4.4910 ms]
                        change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

cholesky_solve_10x10    time:   [170.70 ns 170.76 ns 170.82 ns]
                        change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

cholesky_solve_100x100  time:   [2.9071 µs 2.9117 µs 2.9174 µs]
                        change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

cholesky_solve_500x500  time:   [54.193 µs 54.303 µs 54.417 µs]
                        change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cholesky_inverse_10x10  time:   [1.3189 µs 1.3195 µs 1.3201 µs]
                        change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

cholesky_inverse_100x100
                        time:   [270.85 µs 270.88 µs 270.92 µs]
                        change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

cholesky_inverse_500x500
                        time:   [26.673 ms 26.694 ms 26.714 ms]
                        change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  19 (19.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

full_piv_lu_decompose_10x10
                        time:   [582.31 ns 582.48 ns 582.67 ns]
                        change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  6 (6.00%) high mild
  2 (2.00%) high severe

full_piv_lu_decompose_100x100
                        time:   [218.73 µs 218.78 µs 218.84 µs]
                        change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe

full_piv_lu_solve_10x10 time:   [124.88 ns 124.94 ns 125.02 ns]
                        change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  4 (4.00%) high severe

full_piv_lu_solve_100x100
                        time:   [2.5202 µs 2.5244 µs 2.5289 µs]
                        change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  14 (14.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild

full_piv_lu_inverse_10x10
                        time:   [869.61 ns 870.27 ns 871.19 ns]
                        change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
  4 (4.00%) high severe

full_piv_lu_inverse_100x100
                        time:   [212.68 µs 212.83 µs 213.05 µs]
                        change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

full_piv_lu_determinant_10x10
                        time:   [15.320 ns 15.338 ns 15.357 ns]
                        change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild

full_piv_lu_determinant_100x100
                        time:   [137.44 ns 139.37 ns 141.00 ns]
                        change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05)
                        Performance has regressed.

hessenberg_decompose_4x4
                        time:   [82.510 ns 82.538 ns 82.564 ns]
                        change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

hessenberg_decompose_100x100
                        time:   [295.98 µs 296.16 µs 296.44 µs]
                        change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

hessenberg_decompose_200x200
                        time:   [2.2647 ms 2.2681 ms 2.2714 ms]
                        change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05)
                        Performance has regressed.
hessenberg_decompose_unpack_100x100
                        time:   [435.30 µs 435.75 µs 436.12 µs]
                        change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

hessenberg_decompose_unpack_200x200
                        time:   [3.2667 ms 3.2678 ms 3.2690 ms]
                        change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
  13 (13.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

lu_decompose_10x10      time:   [353.04 ns 353.16 ns 353.31 ns]
                        change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

lu_decompose_100x100    time:   [71.544 µs 71.560 µs 71.579 µs]
                        change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

lu_solve_10x10          time:   [115.42 ns 115.52 ns 115.61 ns]
                        change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  8 (8.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

lu_solve_100x100        time:   [2.5152 µs 2.5190 µs 2.5225 µs]
                        change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild

lu_inverse_10x10        time:   [902.55 ns 903.32 ns 903.97 ns]
                        change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

lu_inverse_100x100      time:   [216.21 µs 216.47 µs 216.80 µs]
                        change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

lu_determinant_10x10    time:   [13.394 ns 13.481 ns 13.665 ns]
                        change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

lu_determinant_100x100  time:   [149.12 ns 150.16 ns 151.08 ns]
                        change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) low severe
  4 (4.00%) low mild

qr_decompose_100x100    time:   [141.62 µs 141.65 µs 141.69 µs]
                        change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500    time:   [1.0071 ms 1.0082 ms 1.0097 ms]
                        change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  12 (12.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

qr_decompose_4x4        time:   [100.40 ns 100.43 ns 100.45 ns]
                        change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
qr_decompose_500x100    time:   [847.17 µs 847.68 µs 848.21 µs]
                        change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

qr_decompose_unpack_100x100
                        time:   [283.22 µs 283.26 µs 283.30 µs]
                        change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 23 outliers among 100 measurements (23.00%)
  21 (21.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe

Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500
                        time:   [1.1399 ms 1.1429 ms 1.1457 ms]
                        change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100
                        time:   [1.6633 ms 1.6640 ms 1.6648 ms]
                        change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  4 (4.00%) high severe

qr_solve_10x10          time:   [156.51 ns 156.56 ns 156.61 ns]
                        change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild

qr_solve_100x100        time:   [3.5393 µs 3.5454 µs 3.5511 µs]
                        change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) low mild

qr_inverse_10x10        time:   [806.75 ns 807.99 ns 809.61 ns]
                        change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

qr_inverse_100x100      time:   [330.65 µs 330.74 µs 330.85 µs]
                        change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

schur_decompose_4x4     time:   [969.14 ns 969.71 ns 970.18 ns]
                        change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

schur_decompose_10x10   time:   [7.3226 µs 7.3237 µs 7.3247 µs]
                        change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

schur_decompose_100x100 time:   [2.5760 ms 2.5763 ms 2.5768 ms]
                        change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

schur_decompose_200x200 time:   [18.285 ms 18.296 ms 18.308 ms]
                        change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

eigenvalues_4x4         time:   [937.94 ns 938.15 ns 938.38 ns]
                        change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

eigenvalues_10x10       time:   [5.9066 µs 5.9088 µs 5.9117 µs]
                        change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100     time:   [1.5870 ms 1.5873 ms 1.5876 ms]
                        change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

eigenvalues_200x200     time:   [11.081 ms 11.088 ms 11.102 ms]
                        change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

solve_l_triangular_100x100
                        time:   [1.3250 µs 1.3651 µs 1.4012 µs]
                        change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  10 (10.00%) high mild
  2 (2.00%) high severe

solve_l_triangular_1000x1000
                        time:   [101.52 µs 102.04 µs 102.85 µs]
                        change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  9 (9.00%) high mild
  6 (6.00%) high severe

tr_solve_l_triangular_100x100
                        time:   [2.0144 µs 2.0537 µs 2.0902 µs]
                        change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

tr_solve_l_triangular_1000x1000
                        time:   [93.569 µs 94.056 µs 94.857 µs]
                        change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

solve_u_triangular_100x100
                        time:   [1.5878 µs 1.6615 µs 1.7405 µs]
                        change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

solve_u_triangular_1000x1000
                        time:   [105.07 µs 105.46 µs 106.12 µs]
                        change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe tr_solve_u_triangular_100x100 time: [1.4369 µs 1.4697 µs 1.4986 µs] change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 11 (11.00%) high mild 2 (2.00%) high severe tr_solve_u_triangular_1000x1000 time: [88.868 µs 89.303 µs 90.014 µs] change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe svd_decompose_2x2 time: [22.913 ns 22.958 ns 23.017 ns] change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe svd_decompose_3x3 time: [359.30 ns 359.72 ns 360.20 ns] change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild svd_decompose_4x4 time: [896.28 ns 896.55 ns 896.85 ns] change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe svd_decompose_10x10 time: [5.7680 µs 5.7708 µs 5.7739 µs] change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe Benchmarking svd_decompose_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. svd_decompose_100x100 time: [1.5704 ms 1.5709 ms 1.5715 ms] change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe svd_decompose_200x200 time: [11.845 ms 11.847 ms 11.850 ms] change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe rank_4x4 time: [716.49 ns 716.62 ns 716.74 ns] change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild rank_10x10 time: [4.2304 µs 4.2341 µs 4.2377 µs] change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild rank_100x100 time: [522.74 µs 522.85 µs 522.97 µs] change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high severe rank_200x200 time: [3.0167 ms 3.0217 ms 3.0267 ms] change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05) Change within noise threshold. singular_values_4x4 time: [735.97 ns 736.08 ns 736.21 ns] change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe singular_values_10x10 time: [4.2987 µs 4.2997 µs 4.3010 µs] change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe singular_values_100x100 time: [525.20 µs 525.36 µs 525.54 µs] change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe singular_values_200x200 time: [3.0712 ms 3.0729 ms 3.0750 ms] change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe pseudo_inverse_4x4 time: [877.64 ns 878.38 ns 879.12 ns] change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe pseudo_inverse_10x10 time: [6.0008 µs 6.0034 µs 6.0064 µs] change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50. pseudo_inverse_100x100 time: [1.6088 ms 1.6091 ms 1.6094 ms] change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe pseudo_inverse_200x200 time: [12.038 ms 12.042 ms 12.047 ms] change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05) Change within noise threshold. Found 22 outliers among 100 measurements (22.00%) 16 (16.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe symmetric_eigen_decompose_4x4 time: [518.00 ns 518.07 ns 518.15 ns] change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe symmetric_eigen_decompose_10x10 time: [3.6417 µs 3.6428 µs 3.6440 µs] change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe symmetric_eigen_decompose_100x100 time: [761.64 µs 762.66 µs 763.80 µs] change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 9 (9.00%) low severe 9 (9.00%) low mild 1 (1.00%) high severe symmetric_eigen_decompose_200x200 time: [5.1304 ms 5.1337 ms 5.1372 ms] change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05) Performance has improved.

During benchmarking I found that the default value of `codegen-units` leads to inconsistent results across recompilations (clean vs. incremental). It also sometimes causes a significant performance degradation in benchmarks unrelated to the code changes. See also rust-lang/rust#146497.

…nstruction!()

Criterion generates a `Vec` of arguments and passes them through `black_box()`, which guarantees that the benchmark closure is never optimized out of the benchmarking loop. This fixes dimforge#1547 for benchmarks that use the `bench_*!()` macros.
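The batching idea described above can be imitated with the standard library alone. The following is a minimal sketch, not Criterion's actual implementation: `time_batched` is a hypothetical helper name, and the timing here is a single `Instant` measurement rather than Criterion's statistical sampling.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Criterion): runs `routine` once per
/// pre-generated input and returns the outputs with the elapsed time.
fn time_batched<I, O>(inputs: Vec<I>, mut routine: impl FnMut(I) -> O) -> (Vec<O>, Duration) {
    // Hide the input values from the optimizer so the routine cannot be
    // constant-folded or hoisted out of the loop.
    let inputs = black_box(inputs);
    // Allocate the output buffer before the timer starts.
    let mut outputs = Vec::with_capacity(inputs.len());

    let start = Instant::now();
    outputs.extend(inputs.into_iter().map(&mut routine));
    let elapsed = start.elapsed();

    // Mark the outputs as used; the caller drops them after the timer
    // has stopped, so deallocation is not part of the measurement.
    (black_box(outputs), elapsed)
}
```

Because every input is opaque to the optimizer and every output is observed, the compiler has to perform the computation once per element instead of folding it into a constant.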

This simulates real-world use cases such as multiplying many vectors by a single matrix. There is a ~2x performance difference between the case where both arguments are random on each iteration and the case where one argument is static and the other is random on each iteration:

mat2_mul_v              time:   [778.33 ps 785.41 ps 797.70 ps]
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  4 (4.00%) high mild
  5 (5.00%) high severe

mat3_mul_v              time:   [1.7001 ns 1.7051 ns 1.7111 ns]
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe

mat4_mul_v              time:   [2.6101 ns 2.6223 ns 2.6374 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

single_mat2_mul_v       time:   [402.65 ps 403.62 ps 404.75 ps]
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

single_mat3_mul_v       time:   [651.30 ps 654.06 ps 657.15 ps]
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low mild
  8 (8.00%) high mild
  4 (4.00%) high severe

single_mat4_mul_v       time:   [1.0628 ns 1.0645 ns 1.0666 ns]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

mat2_tr_mul_v           time:   [719.81 ps 721.99 ps 724.59 ps]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild

mat3_tr_mul_v           time:   [1.6685 ns 1.6758 ns 1.6841 ns]
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

mat4_tr_mul_v           time:   [2.6739 ns 2.6897 ns 2.7080 ns]
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  8 (8.00%) high severe

single_mat2_tr_mul_v    time:   [353.36 ps 354.56 ps 356.03 ps]
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

single_mat3_tr_mul_v    time:   [779.82 ps 782.84 ps 786.37 ps]
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

single_mat4_tr_mul_v    time:   [1.1918 ns 1.1946 ns 1.1977 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

unit_quaternion_mul_v   time:   [1.5002 ns 1.5088 ns 1.5183 ns]
                        change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

single_unit_quaternion_mul_v
                        time:   [1.0489 ns 1.0531 ns 1.0584 ns]
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
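The two cases compared in these results (`mat*_mul_v` vs. `single_mat*_mul_v`) can be sketched as follows. This is an illustration only, not the actual benchmark code: plain arrays stand in for the library's matrix and vector types, and the names `Mat2`, `Vec2`, `both_vary`, and `single_matrix` are hypothetical.

```rust
use std::hint::black_box;

// Row-major 2x2 matrix and 2-vector; illustrative stand-ins only.
type Mat2 = [[f32; 2]; 2];
type Vec2 = [f32; 2];

fn mat2_mul_v(m: &Mat2, v: &Vec2) -> Vec2 {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Both arguments change on every iteration, so each multiplication
/// has to load a fresh matrix as well as a fresh vector.
fn both_vary(pairs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    pairs
        .iter()
        .map(|(m, v)| mat2_mul_v(black_box(m), black_box(v)))
        .collect()
}

/// One matrix is reused for all vectors; only the vector changes, so the
/// matrix may stay in registers across iterations.
fn single_matrix(m: &Mat2, vs: &[Vec2]) -> Vec<Vec2> {
    let m = black_box(m);
    vs.iter().map(|v| mat2_mul_v(m, black_box(v))).collect()
}
```

Both functions compute the same products when every pair carries the same matrix; only the amount of data that varies per iteration differs, which is a plausible source of the gap reported above.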

I do not know why those benchmarks were commented out.

…hmarks

The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved `DVector` allocations outside of the benchmark and added anything allocated but not consumed to the return tuple of the benchmark closure, so that the implicit drop/free is not included in the measured time. This fixes https://github.com/dimforge/nalgebra/issues/1547 for the remaining benchmarks.

Benchmark results before vs. after all changes:

mat2_mul_m time: [1.1043 ns 1.1058 ns 1.1077 ns] change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) low severe 2 (2.00%) high mild 6 (6.00%) high severe mat3_mul_m time: [3.1885 ns 3.1945 ns 3.2038 ns] change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat4_mul_m time: [6.7759 ns 6.7840 ns 6.7929 ns] change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe mat2_tr_mul_m time: [1.2882 ns 1.2901 ns 1.2926 ns] change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat3_tr_mul_m time: [3.1688 ns 3.1725 ns 3.1770 ns] change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe mat4_tr_mul_m time: [6.5406 ns 6.5453 ns 6.5508 ns] change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05) Performance has regressed.
Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe mat2_add_m time: [644.68 ps 645.88 ps 647.24 ps] change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe mat3_add_m time: [1.3543 ns 1.3572 ns 1.3607 ns] change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_add_m time: [2.3987 ns 2.4015 ns 2.4044 ns] change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat2_sub_m time: [637.47 ps 638.88 ps 640.62 ps] change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat3_sub_m time: [1.3531 ns 1.3546 ns 1.3562 ns] change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild 4 (4.00%) high severe mat4_sub_m time: [2.3972 ns 2.3996 ns 2.4021 ns] change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat2_mul_v time: [774.43 ps 775.48 ps 776.73 ps] change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 5 (5.00%) high mild 3 (3.00%) high severe mat3_mul_v time: [1.6843 ns 1.6858 ns 1.6874 ns] change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) low severe 1 (1.00%) high mild 3 (3.00%) high severe mat4_mul_v time: [2.6029 ns 2.6196 ns 2.6485 ns] change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe single_mat2_mul_v time: [392.29 ps 393.45 ps 394.87 ps] Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe single_mat3_mul_v time: [650.16 ps 651.47 ps 653.07 ps] Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe single_mat4_mul_v time: [1.0665 ns 1.0690 ns 1.0722 ns] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat2_tr_mul_v time: [719.95 ps 720.92 ps 722.16 ps] change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 1 (1.00%) low severe 2 (2.00%) low mild 7 (7.00%) high mild 4 (4.00%) high severe mat3_tr_mul_v time: [1.6551 ns 1.6564 ns 1.6577 ns] change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat4_tr_mul_v time: [2.6477 ns 2.6546 ns 2.6666 ns] change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05) Performance has regressed. 
Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low severe 3 (3.00%) high mild 3 (3.00%) high severe single_mat2_tr_mul_v time: [353.60 ps 355.50 ps 358.48 ps] Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe single_mat3_tr_mul_v time: [778.13 ps 779.43 ps 781.25 ps] Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) high mild 5 (5.00%) high severe single_mat4_tr_mul_v time: [1.1887 ns 1.1906 ns 1.1930 ns] Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe mat2_mul_s time: [774.44 ps 775.33 ps 776.37 ps] change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 4 (4.00%) high severe mat3_mul_s time: [962.59 ps 964.98 ps 967.43 ps] change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe mat4_mul_s time: [1.6589 ns 1.6640 ns 1.6684 ns] change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 8 (8.00%) low severe 3 (3.00%) low mild 1 (1.00%) high mild 6 (6.00%) high severe mat2_div_s time: [803.09 ps 804.70 ps 806.56 ps] change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe mat3_div_s time: [2.4929 ns 2.4947 ns 2.4967 ns] change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 5 (5.00%) high mild 4 (4.00%) high severe mat4_div_s time: [5.1650 ns 5.1688 ns 5.1735 ns] change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe mat2_inv time: [1.1514 ns 1.1523 ns 1.1533 ns] change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe mat3_inv time: [3.3641 ns 3.3707 ns 3.3826 ns] change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe mat4_inv time: [25.970 ns 26.006 ns 26.062 ns] change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe mat2_transpose time: [409.94 ps 410.77 ps 411.75 ps] change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05) Performance has improved. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe mat3_transpose time: [947.42 ps 953.20 ps 961.97 ps] change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe mat4_transpose time: [1.6510 ns 1.6551 ns 1.6612 ns] change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe mat_div_scalar time: [480.25 µs 480.55 µs 480.99 µs] change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe mat100_add_mat100 time: [3.0426 µs 3.0910 µs 3.1351 µs] change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 3 (3.00%) low mild 7 (7.00%) high mild 1 (1.00%) high severe mat4_mul_mat4 time: [36.836 ns 36.859 ns 36.886 ns] change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) low severe 4 (4.00%) high mild 2 (2.00%) high severe mat5_mul_mat5 time: [56.715 ns 56.876 ns 57.015 ns] change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild mat6_mul_mat6 time: [83.817 ns 83.999 ns 84.156 ns] change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild mat7_mul_mat7 time: [93.211 ns 93.386 ns 93.534 ns] change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low severe 2 (2.00%) low mild mat8_mul_mat8 time: [88.919 ns 89.410 ns 89.884 ns] change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild mat9_mul_mat9 time: [207.12 ns 209.04 ns 211.17 ns] change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05) Performance has regressed. 
Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) low mild 1 (1.00%) high mild mat10_mul_mat10 time: [236.75 ns 237.11 ns 237.47 ns] change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) low severe 7 (7.00%) low mild 1 (1.00%) high mild mat10_mul_mat10_static time: [116.68 ns 117.15 ns 117.62 ns] change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05) Performance has regressed. mat100_mul_mat100 time: [40.188 µs 40.327 µs 40.459 µs] change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe mat500_mul_mat500 time: [4.3909 ms 4.3944 ms 4.3978 ms] change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe iter time: [840.01 µs 840.39 µs 840.81 µs] change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) high mild 11 (11.00%) high severe iter_rev time: [210.14 µs 211.10 µs 212.84 µs] change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe copy_from time: [199.77 µs 200.80 µs 202.55 µs] change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) low mild 1 (1.00%) high severe axpy time: [31.301 µs 33.301 µs 34.957 µs] change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05) Performance has regressed. tr_mul_to time: [126.46 µs 127.12 µs 128.09 µs] change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05) Performance has improved. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe mat_mul_mat time: [39.252 µs 39.443 µs 39.626 µs] change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05) Change within noise threshold. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low mild 8 (8.00%) high mild 2 (2.00%) high severe mat100_from_fn time: [6.8398 µs 6.8418 µs 6.8446 µs] change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) high mild 9 (9.00%) high severe mat500_from_fn time: [172.11 µs 172.14 µs 172.18 µs] change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe vec2_add_v_f32 time: [303.98 ps 304.76 ps 305.65 ps] change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe vec3_add_v_f32 time: [586.36 ps 587.93 ps 589.92 ps] change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 5 (5.00%) high mild 6 (6.00%) high severe vec4_add_v_f32 time: [603.45 ps 604.44 ps 605.59 ps] change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe vec2_add_v_f64 time: [602.08 ps 602.83 ps 603.64 ps] change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 4 (4.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe vec3_add_v_f64 time: [910.94 ps 912.60 ps 914.56 ps] change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low severe 6 (6.00%) high mild 3 (3.00%) high severe vec4_add_v_f64 time: [1.1894 ns 1.1933 ns 1.1963 ns] change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe vec2_sub_v time: [303.45 ps 304.42 ps 305.37 ps] change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 8 (8.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe vec3_sub_v time: [672.95 ps 674.82 ps 676.51 ps] change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec4_sub_v time: [602.84 ps 604.65 ps 607.70 ps] change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe vec2_mul_s time: [666.49 ps 667.29 ps 668.31 ps] change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 4 (4.00%) low severe 6 (6.00%) high mild 6 (6.00%) high severe vec3_mul_s time: [511.42 ps 513.44 ps 515.86 ps] change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe vec4_mul_s time: [774.13 ps 775.22 ps 776.52 ps] change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe vec2_div_s time: [1.3658 ns 1.3694 ns 1.3726 ns] change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05) Performance has regressed. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe vec3_div_s time: [607.73 ps 608.63 ps 609.66 ps] change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 8 (8.00%) high mild 6 (6.00%) high severe vec4_div_s time: [802.59 ps 803.62 ps 804.82 ps] change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe vec2_dot_f32 time: [461.20 ps 461.73 ps 462.30 ps] change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 2 (2.00%) low mild 3 (3.00%) high mild 9 (9.00%) high severe vec3_dot_f32 time: [688.24 ps 689.05 ps 689.95 ps] change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe vec4_dot_f32 time: [917.20 ps 921.23 ps 928.57 ps] change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 8 (8.00%) high mild 5 (5.00%) high severe vec2_dot_f64 time: [596.11 ps 597.51 ps 598.79 ps] change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe vec3_dot_f64 time: [749.32 ps 751.02 ps 752.81 ps] change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe vec4_dot_f64 time: [1.0145 ns 1.0185 ns 1.0230 ns] change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05) Performance has regressed. 
Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe vec3_cross time: [971.01 ps 971.87 ps 972.73 ps] change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe vec2_norm time: [1.0612 ns 1.0623 ns 1.0637 ns] change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low mild 2 (2.00%) high severe vec3_norm time: [1.0649 ns 1.0665 ns 1.0694 ns] change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe vec4_norm time: [1.0733 ns 1.0739 ns 1.0746 ns] change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 2 (2.00%) low severe 7 (7.00%) low mild 5 (5.00%) high mild 5 (5.00%) high severe vec2_normalize time: [2.5310 ns 2.5326 ns 2.5345 ns] change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec3_normalize time: [2.5389 ns 2.5409 ns 2.5424 ns] change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe vec4_normalize time: [1.8154 ns 1.8164 ns 1.8173 ns] change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe vec10000_dot_f64 time: [2.0296 µs 2.0337 µs 2.0383 µs] change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05) Performance has regressed. 
Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 3 (3.00%) high mild 4 (4.00%) high severe vec10000_dot_f32 time: [1.1891 µs 1.1926 µs 1.1962 µs] change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe vec10000_axpy_f64 time: [2.0702 µs 2.0739 µs 2.0777 µs] change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe vec10000_axpy_beta_f64 time: [2.0914 µs 2.0962 µs 2.1012 µs] change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) low severe 5 (5.00%) high mild 2 (2.00%) high severe vec10000_axpy_f64_slice time: [2.0272 µs 2.0303 µs 2.0335 µs] change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) low severe 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_f64_static time: [13.917 µs 13.965 µs 14.005 µs] change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe vec10000_axpy_f32 time: [1.0402 µs 1.0421 µs 1.0437 µs] change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe vec10000_axpy_beta_f32 time: [1.0329 µs 1.0346 µs 1.0364 µs] change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05) Performance has regressed. 
Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe quaternion_add_q time: [642.58 ps 650.39 ps 662.45 ps] change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 6 (6.00%) high severe quaternion_sub_q time: [641.16 ps 643.22 ps 645.88 ps] change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 5 (5.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 4 (4.00%) high severe quaternion_mul_q time: [1.4252 ns 1.4271 ns 1.4294 ns] change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe unit_quaternion_mul_v time: [1.4859 ns 1.4874 ns 1.4890 ns] change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild single_unit_quaternion_mul_v time: [1.0422 ns 1.0457 ns 1.0504 ns] Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) high mild 4 (4.00%) high severe quaternion_mul_s time: [771.17 ps 772.18 ps 773.37 ps] change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low mild 3 (3.00%) high mild 3 (3.00%) high severe quaternion_div_s time: [798.54 ps 799.82 ps 801.43 ps] change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 2 (2.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe quaternion_inv time: [1.2401 ns 1.2408 ns 1.2417 ns] change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05) Performance has improved. 
Found 13 outliers among 100 measurements (13.00%) 2 (2.00%) low severe 5 (5.00%) high mild 6 (6.00%) high severe unit_quaternion_inv time: [596.01 ps 598.93 ps 602.66 ps] change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 6 (6.00%) high mild 9 (9.00%) high severe quaternion_conjugate time: [604.36 ps 608.60 ps 613.48 ps] Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe quaternion_normalize time: [1.8268 ns 1.8274 ns 1.8281 ns] Found 18 outliers among 100 measurements (18.00%) 4 (4.00%) low severe 4 (4.00%) low mild 7 (7.00%) high mild 3 (3.00%) high severe bidiagonalize_100x100 time: [265.91 µs 266.00 µs 266.11 µs] change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe bidiagonalize_100x500 time: [2.0053 ms 2.0060 ms 2.0065 ms] change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) low severe 2 (2.00%) high mild 5 (5.00%) high severe bidiagonalize_4x4 time: [266.92 ns 267.24 ns 267.62 ns] change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 1 (1.00%) low severe 5 (5.00%) low mild 13 (13.00%) high mild 4 (4.00%) high severe Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50. bidiagonalize_500x100 time: [1.6781 ms 1.6793 ms 1.6804 ms] change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05) Performance has regressed. bidiagonalize_unpack_100x100 time: [522.13 µs 522.36 µs 522.63 µs] change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 4 (4.00%) high mild 7 (7.00%) high severe bidiagonalize_unpack_100x500 time: [2.9858 ms 2.9916 ms 2.9976 ms] change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05) Change within noise threshold. bidiagonalize_unpack_500x100 time: [2.5884 ms 2.5896 ms 2.5910 ms] change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05) Change within noise threshold. cholesky_100x100 time: [31.084 µs 31.101 µs 31.122 µs] change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) low severe 4 (4.00%) low mild 1 (1.00%) high mild 9 (9.00%) high severe cholesky_500x500 time: [4.4799 ms 4.4849 ms 4.4903 ms] change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe cholesky_decompose_unpack_100x100 time: [31.659 µs 31.685 µs 31.727 µs] change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe cholesky_decompose_unpack_500x500 time: [4.4795 ms 4.4845 ms 4.4910 ms] change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 7 (7.00%) high severe cholesky_solve_10x10 time: [170.70 ns 170.76 ns 170.82 ns] change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe cholesky_solve_100x100 time: [2.9071 µs 2.9117 µs 2.9174 µs] change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05) Performance has regressed. 
Found 7 outliers among 100 measurements (7.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe cholesky_solve_500x500 time: [54.193 µs 54.303 µs 54.417 µs] change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild cholesky_inverse_10x10 time: [1.3189 µs 1.3195 µs 1.3201 µs] change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe cholesky_inverse_100x100 time: [270.85 µs 270.88 µs 270.92 µs] change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low severe 4 (4.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe cholesky_inverse_500x500 time: [26.673 ms 26.694 ms 26.714 ms] change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05) Performance has regressed. Found 23 outliers among 100 measurements (23.00%) 19 (19.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe full_piv_lu_decompose_10x10 time: [582.31 ns 582.48 ns 582.67 ns] change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 6 (6.00%) high mild 2 (2.00%) high severe full_piv_lu_decompose_100x100 time: [218.73 µs 218.78 µs 218.84 µs] change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low severe 5 (5.00%) low mild 1 (1.00%) high severe full_piv_lu_solve_10x10 time: [124.88 ns 124.94 ns 125.02 ns] change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05) Performance has regressed. 
Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 6 (6.00%) high mild 4 (4.00%) high severe full_piv_lu_solve_100x100 time: [2.5202 µs 2.5244 µs 2.5289 µs] change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 14 (14.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild full_piv_lu_inverse_10x10 time: [869.61 ns 870.27 ns 871.19 ns] change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low severe 1 (1.00%) high mild 4 (4.00%) high severe full_piv_lu_inverse_100x100 time: [212.68 µs 212.83 µs 213.05 µs] change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05) No change in performance detected. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 4 (4.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe full_piv_lu_determinant_10x10 time: [15.320 ns 15.338 ns 15.357 ns] change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 9 (9.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild full_piv_lu_determinant_100x100 time: [137.44 ns 139.37 ns 141.00 ns] change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05) Performance has regressed. hessenberg_decompose_4x4 time: [82.510 ns 82.538 ns 82.564 ns] change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild hessenberg_decompose_100x100 time: [295.98 µs 296.16 µs 296.44 µs] change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe hessenberg_decompose_200x200 time: [2.2647 ms 2.2681 ms 2.2714 ms] change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05) Performance has regressed. 
hessenberg_decompose_unpack_100x100 time: [435.30 µs 435.75 µs 436.12 µs] change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe hessenberg_decompose_unpack_200x200 time: [3.2667 ms 3.2678 ms 3.2690 ms] change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05) Performance has regressed. Found 22 outliers among 100 measurements (22.00%) 13 (13.00%) low severe 1 (1.00%) low mild 3 (3.00%) high mild 5 (5.00%) high severe lu_decompose_10x10 time: [353.04 ns 353.16 ns 353.31 ns] change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05) Performance has improved. Found 19 outliers among 100 measurements (19.00%) 4 (4.00%) low severe 4 (4.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe lu_decompose_100x100 time: [71.544 µs 71.560 µs 71.579 µs] change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild 3 (3.00%) high severe lu_solve_10x10 time: [115.42 ns 115.52 ns 115.61 ns] change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 4 (4.00%) low severe 8 (8.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe lu_solve_100x100 time: [2.5152 µs 2.5190 µs 2.5225 µs] change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild lu_inverse_10x10 time: [902.55 ns 903.32 ns 903.97 ns] change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05) Change within noise threshold. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe lu_inverse_100x100 time: [216.21 µs 216.47 µs 216.80 µs] change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 18 outliers among 100 measurements (18.00%) 2 (2.00%) low severe 4 (4.00%) low mild 5 (5.00%) high mild 7 (7.00%) high severe lu_determinant_10x10 time: [13.394 ns 13.481 ns 13.665 ns] change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 6 (6.00%) low severe 1 (1.00%) low mild 5 (5.00%) high mild 2 (2.00%) high severe lu_determinant_100x100 time: [149.12 ns 150.16 ns 151.08 ns] change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 10 (10.00%) low severe 4 (4.00%) low mild qr_decompose_100x100 time: [141.62 µs 141.65 µs 141.69 µs] change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe Benchmarking qr_decompose_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60. qr_decompose_100x500 time: [1.0071 ms 1.0082 ms 1.0097 ms] change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05) Change within noise threshold. Found 16 outliers among 100 measurements (16.00%) 12 (12.00%) low mild 2 (2.00%) high mild 2 (2.00%) high severe qr_decompose_4x4 time: [100.40 ns 100.43 ns 100.45 ns] change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) low mild 1 (1.00%) high mild 4 (4.00%) high severe Benchmarking qr_decompose_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60. qr_decompose_500x100 time: [847.17 µs 847.68 µs 848.21 µs] change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05) Performance has regressed. 
Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe qr_decompose_unpack_100x100 time: [283.22 µs 283.26 µs 283.30 µs] change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05) Change within noise threshold. Found 23 outliers among 100 measurements (23.00%) 21 (21.00%) low severe 1 (1.00%) low mild 1 (1.00%) high severe Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60. qr_decompose_unpack_100x500 time: [1.1399 ms 1.1429 ms 1.1457 ms] change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50. qr_decompose_unpack_500x100 time: [1.6633 ms 1.6640 ms 1.6648 ms] change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low severe 5 (5.00%) low mild 4 (4.00%) high severe qr_solve_10x10 time: [156.51 ns 156.56 ns 156.61 ns] change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) low severe 5 (5.00%) low mild 1 (1.00%) high mild qr_solve_100x100 time: [3.5393 µs 3.5454 µs 3.5511 µs] change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) low mild qr_inverse_10x10 time: [806.75 ns 807.99 ns 809.61 ns] change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe qr_inverse_100x100 time: [330.65 µs 330.74 µs 330.85 µs] change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe schur_decompose_4x4 time: [969.14 ns 969.71 ns 970.18 ns] change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) low severe 1 (1.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe schur_decompose_10x10 time: [7.3226 µs 7.3237 µs 7.3247 µs] change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) low mild 4 (4.00%) high mild 3 (3.00%) high severe schur_decompose_100x100 time: [2.5760 ms 2.5763 ms 2.5768 ms] change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe schur_decompose_200x200 time: [18.285 ms 18.296 ms 18.308 ms] change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe eigenvalues_4x4 time: [937.94 ns 938.15 ns 938.38 ns] change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild eigenvalues_10x10 time: [5.9066 µs 5.9088 µs 5.9117 µs] change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 3 (3.00%) high mild 4 (4.00%) high severe Benchmarking eigenvalues_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. 
You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. eigenvalues_100x100 time: [1.5870 ms 1.5873 ms 1.5876 ms] change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe eigenvalues_200x200 time: [11.081 ms 11.088 ms 11.102 ms] change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe solve_l_triangular_100x100 time: [1.3250 µs 1.3651 µs 1.4012 µs] change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 10 (10.00%) high mild 2 (2.00%) high severe solve_l_triangular_1000x1000 time: [101.52 µs 102.04 µs 102.85 µs] change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 9 (9.00%) high mild 6 (6.00%) high severe tr_solve_l_triangular_100x100 time: [2.0144 µs 2.0537 µs 2.0902 µs] change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe tr_solve_l_triangular_1000x1000 time: [93.569 µs 94.056 µs 94.857 µs] change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe solve_u_triangular_100x100 time: [1.5878 µs 1.6615 µs 1.7405 µs] change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 10 (10.00%) high mild 3 (3.00%) high severe solve_u_triangular_1000x1000 time: [105.07 µs 105.46 µs 106.12 µs] change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05) Performance has regressed. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe tr_solve_u_triangular_100x100 time: [1.4369 µs 1.4697 µs 1.4986 µs] change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 11 (11.00%) high mild 2 (2.00%) high severe tr_solve_u_triangular_1000x1000 time: [88.868 µs 89.303 µs 90.014 µs] change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe svd_decompose_2x2 time: [22.913 ns 22.958 ns 23.017 ns] change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe svd_decompose_3x3 time: [359.30 ns 359.72 ns 360.20 ns] change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild svd_decompose_4x4 time: [896.28 ns 896.55 ns 896.85 ns] change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) low severe 3 (3.00%) low mild 3 (3.00%) high mild 2 (2.00%) high severe svd_decompose_10x10 time: [5.7680 µs 5.7708 µs 5.7739 µs] change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe Benchmarking svd_decompose_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50. svd_decompose_100x100 time: [1.5704 ms 1.5709 ms 1.5715 ms] change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe svd_decompose_200x200 time: [11.845 ms 11.847 ms 11.850 ms] change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe rank_4x4 time: [716.49 ns 716.62 ns 716.74 ns] change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) low mild rank_10x10 time: [4.2304 µs 4.2341 µs 4.2377 µs] change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild rank_100x100 time: [522.74 µs 522.85 µs 522.97 µs] change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 2 (2.00%) high severe rank_200x200 time: [3.0167 ms 3.0217 ms 3.0267 ms] change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05) Change within noise threshold. singular_values_4x4 time: [735.97 ns 736.08 ns 736.21 ns] change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 2 (2.00%) high severe singular_values_10x10 time: [4.2987 µs 4.2997 µs 4.3010 µs] change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe singular_values_100x100 time: [525.20 µs 525.36 µs 525.54 µs] change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe singular_values_200x200 time: [3.0712 ms 3.0729 ms 3.0750 ms] change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05) Performance has regressed. 
Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe pseudo_inverse_4x4 time: [877.64 ns 878.38 ns 879.12 ns] change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 1 (1.00%) low severe 3 (3.00%) low mild 2 (2.00%) high mild 7 (7.00%) high severe pseudo_inverse_10x10 time: [6.0008 µs 6.0034 µs 6.0064 µs] change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05) Change within noise threshold. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50. pseudo_inverse_100x100 time: [1.6088 ms 1.6091 ms 1.6094 ms] change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05) Change within noise threshold. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe pseudo_inverse_200x200 time: [12.038 ms 12.042 ms 12.047 ms] change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05) Change within noise threshold. Found 22 outliers among 100 measurements (22.00%) 16 (16.00%) low severe 2 (2.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe symmetric_eigen_decompose_4x4 time: [518.00 ns 518.07 ns 518.15 ns] change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) low mild 2 (2.00%) high mild 4 (4.00%) high severe symmetric_eigen_decompose_10x10 time: [3.6417 µs 3.6428 µs 3.6440 µs] change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05) Change within noise threshold. 
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

symmetric_eigen_decompose_100x100
                        time:   [761.64 µs 762.66 µs 763.80 µs]
                        change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  9 (9.00%) low severe
  9 (9.00%) low mild
  1 (1.00%) high severe

symmetric_eigen_decompose_200x200
                        time:   [5.1304 ms 5.1337 ms 5.1372 ms]
                        change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05)
                        Performance has improved.

Total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes.

Some algorithms may not converge when run on completely random values with the default value of epsilon and an unlimited number of iterations. `reproducible_dmatrix()` already exists to circumvent this for `DMatrix`, so I implemented the same for `SMatrix`. In my tests this problem manifested itself only in `schur_decompose_4x4`, but I decided to apply a similar fix to all benchmarks that also use `reproducible_dmatrix()` for `DMatrix`.

Random matrices may not be positive definite, and the Cholesky decomposition benchmarks panic because of that:

Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
called `Option::unwrap()` on a `None` value
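To illustrate why an `unwrap()` on the factorization result can panic, here is a minimal std-only sketch of a Cholesky factorization that returns `None` when a pivot is not strictly positive. The `cholesky` function and the `Vec<Vec<f64>>` representation are stand-ins made up for this example, not the nalgebra implementation.

```rust
/// Attempts a Cholesky factorization M = L * L^T of a symmetric matrix stored row by row.
/// Returns `None` as soon as a pivot is not strictly positive, which means M is not
/// positive definite. (Illustrative stand-in, not nalgebra's code.)
fn cholesky(m: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = m.len();
    let mut l: Vec<Vec<f64>> = vec![vec![0.0; n]; n];
    for j in 0..n {
        // Pivot: m[j][j] minus the squares of the entries already computed in row j of L.
        let pivot = m[j][j] - l[j][..j].iter().map(|v| v * v).sum::<f64>();
        if !(pivot > 0.0) {
            // Zero, negative, or NaN pivot: no real Cholesky factor exists.
            return None;
        }
        l[j][j] = pivot.sqrt();
        for i in (j + 1)..n {
            // Dot product of the already computed parts of rows i and j of L.
            let dot: f64 = l[i][..j].iter().zip(&l[j][..j]).map(|(a, b)| a * b).sum();
            l[i][j] = (m[i][j] - dot) / l[j][j];
        }
    }
    Some(l)
}
```

With this sketch, a symmetric but indefinite input such as `[[1, 2], [2, 1]]` yields `None`, so calling `unwrap()` on the result would panic in the same way as the log above.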

geo-ant · 2025-09-30T07:02:00Z

Hey @im-0, sorry GitHub mobile is not letting me provide an actual review. Thanks for the effort, that's very exhaustive benchmarking now. The one thing I stumbled over is that sometimes matrices with constant values are generated ('from_slice', 'from_element') rather than random values. This seems to be a bit inconsistent and I think I'd prefer consistent random value generation. Other than that this looks great to me.

I've also asked the faer maintainer Sarah-ek to have a look at this. They might have some valuable input as well. But to me everything looks good, except the inconsistency with the constant values.

geo-ant

Overall fantastic. I just had some questions on the use of the reproducible matrix and some leftover constant vectors, plus one remark on the Cholesky test.

geo-ant · 2025-10-01T05:43:18Z

benches/linalg/cholesky.rs

+    bh.bench_function("cholesky_100x100", |bh| {
+        bh.iter_batched(
+            || {
+                let m = crate::reproducible_dmatrix(100, 100);


I have a suspicion why this calls the reproducible matrix, and I think there's a problem. Let me explain. For a Cholesky decomposition of a matrix A to be defined, we need the matrix to be symmetric positive definite. That's actually why the line `let m = &m * m.transpose()` exists in the old test, but it's still wrong. To create a symmetric positive semidefinite matrix, it's okay to calculate A A^T, but this might still be singular. A numerically stable way to create an actually positive definite matrix from that is to calculate A A^T + alpha * Id, with Id the identity matrix and alpha chosen for numerical stability. An alpha that works is e.g. `f64::EPSILON * A.norm_squared()`. I know this because I had to fix that exact problem in the nalgebra-lapack proptests recently, see https://github.com/dimforge/nalgebra/blob/main/nalgebra-lapack/tests/linalg/cholesky.rs, specifically the `positive_definite_dmatrix` function.
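The construction described above can be sketched without any external crates. This is only an illustration: `Mat`, `norm_squared`, `shifted_gram` and `quadratic_form` are hypothetical names, the matrix is a plain `Vec<Vec<f64>>` rather than a `DMatrix`, and the norm is assumed to be the Frobenius norm.

```rust
/// Dense matrix stored row by row; a stand-in for `DMatrix<f64>` in this sketch.
type Mat = Vec<Vec<f64>>;

/// Squared Frobenius norm: the sum of the squares of all entries (assumed norm).
fn norm_squared(a: &Mat) -> f64 {
    a.iter().flatten().map(|v| v * v).sum()
}

/// Returns A * A^T + alpha * Id with alpha = f64::EPSILON * ||A||^2.
fn shifted_gram(a: &Mat) -> Mat {
    let n = a.len();
    let alpha = f64::EPSILON * norm_squared(a);
    let mut m: Mat = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..n {
            // Entry (i, j) of A * A^T is the dot product of rows i and j of A.
            m[i][j] = a[i].iter().zip(&a[j]).map(|(x, y)| x * y).sum::<f64>();
        }
        // The shift touches only the diagonal, so symmetry is preserved.
        m[i][i] += alpha;
    }
    m
}

/// Quadratic form x^T M x; it is strictly positive for every nonzero x
/// exactly when the symmetric matrix M is positive definite.
fn quadratic_form(m: &Mat, x: &[f64]) -> f64 {
    m.iter()
        .zip(x)
        .map(|(row, xi)| xi * row.iter().zip(x).map(|(mij, xj)| mij * xj).sum::<f64>())
        .sum()
}
```

For the all-ones 2x2 matrix A, the product A A^T is singular: the vector (1, -1) gives a quadratic form of exactly zero. After the diagonal shift the same vector gives a small positive value, which is the property the Cholesky benchmarks need from their input.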

geo-ant · 2025-10-01T05:43:33Z

benches/linalg/cholesky.rs

+    bh.bench_function("cholesky_500x500", |bh| {
+        bh.iter_batched(
+            || {
+                let m = crate::reproducible_dmatrix(500, 500);


see the 100x100 test

geo-ant · 2025-10-01T05:43:44Z

benches/linalg/cholesky.rs

+    bh.bench_function("cholesky_decompose_unpack_100x100", |bh| {
+        bh.iter_batched(
+            || {
+                let m = crate::reproducible_dmatrix(100, 100);


see the 100x100 test

geo-ant · 2025-10-01T05:43:53Z

benches/linalg/cholesky.rs

+    bh.bench_function("cholesky_decompose_unpack_500x500", |bh| {
+        bh.iter_batched(
+            || {
+                let m = crate::reproducible_dmatrix(500, 500);


see the 100x100 test

geo-ant · 2025-10-01T05:44:06Z

benches/linalg/cholesky.rs

+    bh.bench_function("cholesky_solve_10x10", |bh| {
+        bh.iter_batched_ref(
+            || {
+                let m = crate::reproducible_dmatrix(10, 10);


see the 100x100 test

geo-ant · 2025-10-01T05:49:24Z

benches/linalg/qr.rs

+        bh.iter_batched(
+            || {
+                let m = DMatrix::<f64>::new_random(10, 10);
+                (QR::new(m), DVector::<f64>::from_element(10, 1.0))


non-random-vector

geo-ant · 2025-10-01T05:50:19Z

benches/linalg/qr.rs

+        bh.iter_batched(
+            || {
+                let m = DMatrix::<f64>::new_random(100, 100);
+                (QR::new(m), DVector::<f64>::from_element(100, 1.0))


non-random-vector

geo-ant · 2025-10-01T05:52:15Z

benches/linalg/schur.rs

-        bh.iter(|| std::hint::black_box(Schur::new(m.clone())))
+    bh.bench_function("schur_decompose_4x4", |bh| {
+        bh.iter_batched(
+            || crate::reproducible_smatrix::<f64, 4, 4>(),


Why is the reproducible matrix called here? I'm not so familiar with the Schur decomposition, but from a cursory glance at Wikipedia, any square real matrix should have one. Same question for the other instances of the test below.

geo-ant · 2025-10-01T05:53:15Z

benches/linalg/schur.rs

-        bh.iter(|| std::hint::black_box(m.complex_eigenvalues()))
+    bh.bench_function("eigenvalues_4x4", |bh| {
+        bh.iter_batched_ref(
+            || crate::reproducible_smatrix::<f64, 4, 4>(),


Same question as above: why call the reproducible matrix here instead of a random one?

geo-ant · 2025-10-01T05:53:52Z

benches/linalg/svd.rs

-        bh.iter(|| std::hint::black_box(SVD::new_unordered(m.clone(), true, true)))
+    bh.bench_function("svd_decompose_2x2", |bh| {
+        bh.iter_batched(
+            || crate::reproducible_smatrix::<f32, 2, 2>(),


Why use the reproducible matrix here? Same for the instances below.

geo-ant · 2025-10-02T04:21:07Z

Hey @im-0, I've implemented the changes myself, because I felt I was bothering you unduly. Please let me know if you agree with them, and then I think we can get this merged.

im-0 · 2025-10-06T14:41:53Z

Was busy with other things. I will check this later today or tomorrow.

I think that at least for some algorithms it will be better to use a predictable sequence of random matrices instead of completely random values on each benchmark run. But I am not completely sure about this and need to check the actual implementation...

geo-ant · 2025-10-06T17:44:45Z

@im-0 please feel free to implement changes as you see fit. I think this will be the last iteration. The one thing I'm wondering is whether `reproducible_dmatrix` actually produces a random sequence of matrices or whether it seeds the RNG on each call. I'm on mobile right now, so I don't have the code at hand.

geo-ant · 2025-10-06T20:31:57Z

@im-0 UPDATE: I've looked at the code and each call to reproducible_dmatrix seeds the rng with 0 again. I've also written a little test program on my local PC to verify, just in case.

That means the sequence of random numbers will always be the same for each call, so two matrices of the same size created with `reproducible_dmatrix` will be identical. This makes me feel that using it in benchmarks is not very good, since it will test exactly the behavior for this specific matrix. Assuming there even are differences depending on the contents of the matrices, this is not what we want, I don't think.
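The difference between reseeding on every call and drawing from one seeded generator can be shown with a small std-only sketch. Everything here is an assumption for illustration: `SplitMix64` stands in for whatever RNG the benchmarks use, and `matrix_from` and `reseeded_matrix` are hypothetical helpers, not functions from the repository.

```rust
/// Tiny deterministic generator (SplitMix64); a stand-in for a seeded RNG,
/// not the generator used by the benchmarks.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Fills a rows x cols matrix (row-major `Vec`) from the given generator.
fn matrix_from(rng: &mut SplitMix64, rows: usize, cols: usize) -> Vec<f64> {
    (0..rows * cols).map(|_| rng.next_f64()).collect()
}

/// Hypothetical helper mirroring the behavior described above:
/// it reseeds with 0 on every call, so equal sizes give equal matrices.
fn reseeded_matrix(rows: usize, cols: usize) -> Vec<f64> {
    matrix_from(&mut SplitMix64(0), rows, cols)
}
```

Calling `reseeded_matrix(4, 4)` twice returns the same matrix both times. Creating one `SplitMix64(0)` and passing it to `matrix_from` repeatedly gives a sequence of different matrices that is still identical from one run to the next, which is closer to the "predictable sequence of random matrices" idea mentioned earlier in the thread.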

geo-ant · 2025-10-12T18:16:37Z

@im-0, not trying to rush you. I just know you had some thoughts about whether you are happy for this PR to be merged. Let me know if you are fine to proceed.

im-0 mentioned this pull request Sep 24, 2025

Existing microbenchmarks measure only the outval.clone() and not an actual computation #1547

Open

im-0 force-pushed the fix-benchmarks branch from 8338da6 to 9d1c4ef Compare September 24, 2025 15:14

geo-ant reviewed Sep 27, 2025

View reviewed changes

im-0 added 9 commits September 30, 2025 02:20

chore: Bump criterion to version 0.7

6452c24

chore: Remove unused bench_binop_fn!(), bench_unop_na!() and bench_co…

25a9445

…nstruction!()

chore: Uncomment quaternion benchmarks

65302fa

I do not know why those benchmarks were commented out.

im-0 force-pushed the fix-benchmarks branch from 9d1c4ef to cc7f108 Compare September 30, 2025 00:30

im-0 requested a review from geo-ant September 30, 2025 00:54

geo-ant requested changes Oct 1, 2025

View reviewed changes

geo-ant added 3 commits October 2, 2025 05:53

don't require reproducible matrix for cholesky and make it use random…

915606f

…ly generated positive definite matrix

remove constant elements where useful, remove reproducible matrix cal…

8168020

…ls and replace with random

fix wrong test name

4e1df75

update changelog

ebce7ea

Merge branch 'main' into fix-benchmarks

3f19d20
