@im-0 commented Sep 24, 2025

For details see: #1547

Before/after comparison on AMD Ryzen 9 5950X:

click for details...
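
As a reading aid for the log that follows: each `change:` line is a relative change against the previous run, so the earlier timing can be roughly recovered from the reported point estimate. The sketch below assumes the percentage means (new - old) / old; Criterion derives its change estimate by bootstrapping, so the recovered figures are approximations only. The helper names `baseline_time` and `slowdown_factor` are hypothetical and not part of Criterion or nalgebra.

```rust
/// Approximate the earlier ("before") time from the current point estimate
/// and the reported relative change in percent.
/// Assumption: change_pct = (new - old) / old * 100.
fn baseline_time(current: f64, change_pct: f64) -> f64 {
    current / (1.0 + change_pct / 100.0)
}

/// Ratio new / old implied by a relative change in percent.
fn slowdown_factor(change_pct: f64) -> f64 {
    1.0 + change_pct / 100.0
}

fn main() {
    // Middle (point-estimate) values copied from the log:
    // (benchmark name, current time in ns, change in %)
    let rows = [
        ("mat2_mul_m", 1.8224, 146.66),
        ("mat3_mul_v", 8.7907, 1894.0),
        ("mat4_inv", 27.591, -5.4858),
    ];
    for &(name, now_ns, change) in rows.iter() {
        println!(
            "{}: roughly {:.4} ns before, ratio x{:.2}",
            name,
            baseline_time(now_ns, change),
            slowdown_factor(change)
        );
    }
}
```

A negative change (as for `mat4_inv`) gives a ratio below 1, meaning the current run is the faster one.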

mat2_mul_m              time:   [1.8211 ns 1.8224 ns 1.8239 ns]
                        change: [+146.50% +146.66% +146.86%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  9 (9.00%) high severe

mat3_mul_m              time:   [10.085 ns 10.090 ns 10.096 ns]
                        change: [+531.16% +531.50% +531.97%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

mat4_mul_m              time:   [11.219 ns 11.234 ns 11.250 ns]
                        change: [+277.89% +278.45% +278.98%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 18 outliers among 100 measurements (18.00%)
  9 (9.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

mat2_tr_mul_m           time:   [1.7146 ns 1.7154 ns 1.7161 ns]
                        change: [+131.55% +131.93% +132.26%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

mat3_tr_mul_m           time:   [9.7604 ns 9.7655 ns 9.7713 ns]
                        change: [+513.89% +514.65% +515.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  4 (4.00%) high mild
  17 (17.00%) high severe

mat4_tr_mul_m           time:   [9.0668 ns 9.0724 ns 9.0786 ns]
                        change: [+206.70% +206.94% +207.18%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  5 (5.00%) high mild
  3 (3.00%) high severe

mat2_add_m              time:   [1.7747 ns 1.7768 ns 1.7794 ns]
                        change: [+139.74% +140.01% +140.28%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

mat3_add_m              time:   [4.1347 ns 4.1397 ns 4.1450 ns]
                        change: [+158.49% +158.79% +159.11%] (p = 0.00 < 0.05)
                        Performance has regressed.

mat4_add_m              time:   [6.3138 ns 6.3202 ns 6.3277 ns]
                        change: [+113.14% +113.45% +113.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

mat2_sub_m              time:   [1.7622 ns 1.7636 ns 1.7653 ns]
                        change: [+138.09% +138.34% +138.58%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

mat3_sub_m              time:   [4.1355 ns 4.1407 ns 4.1472 ns]
                        change: [+159.26% +159.58% +159.93%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

mat4_sub_m              time:   [6.3649 ns 6.3712 ns 6.3777 ns]
                        change: [+113.88% +114.05% +114.23%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat2_mul_v              time:   [1.8745 ns 1.8809 ns 1.8880 ns]
                        change: [+490.83% +492.86% +494.55%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

mat3_mul_v              time:   [8.7801 ns 8.7907 ns 8.8027 ns]
                        change: [+1890.3% +1894.0% +1898.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

mat4_mul_v              time:   [2.7012 ns 2.7086 ns 2.7170 ns]
                        change: [+264.61% +265.53% +266.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

mat2_tr_mul_v           time:   [1.3577 ns 1.3579 ns 1.3582 ns]
                        change: [+325.94% +326.43% +326.83%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

mat3_tr_mul_v           time:   [2.3408 ns 2.3449 ns 2.3491 ns]
                        change: [+419.84% +420.66% +421.31%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat4_tr_mul_v           time:   [3.1961 ns 3.2026 ns 3.2100 ns]
                        change: [+329.36% +330.71% +332.28%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

mat2_mul_s              time:   [1.5770 ns 1.5804 ns 1.5846 ns]
                        change: [+112.05% +112.90% +113.84%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

mat3_mul_s              time:   [3.2606 ns 3.2749 ns 3.2909 ns]
                        change: [+105.41% +106.09% +106.85%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) high mild
  11 (11.00%) high severe

mat4_mul_s              time:   [5.3422 ns 5.3465 ns 5.3512 ns]
                        change: [+80.678% +80.813% +80.952%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

mat2_div_s              time:   [1.6070 ns 1.6156 ns 1.6256 ns]
                        change: [+117.73% +118.67% +119.89%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) high mild
  20 (20.00%) high severe

mat3_div_s              time:   [3.3834 ns 3.3934 ns 3.4053 ns]
                        change: [+112.87% +113.44% +114.19%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

mat4_div_s              time:   [5.6942 ns 5.6986 ns 5.7034 ns]
                        change: [+91.417% +91.588% +91.762%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

mat2_inv                time:   [1.8721 ns 1.8725 ns 1.8731 ns]
                        change: [+8.9164% +9.1056% +9.2598%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

mat3_inv                time:   [5.3171 ns 5.3242 ns 5.3315 ns]
                        change: [+2.0336% +2.1086% +2.1810%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

mat4_inv                time:   [27.564 ns 27.591 ns 27.632 ns]
                        change: [-5.5949% -5.4858% -5.3888%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

mat2_transpose          time:   [1.2243 ns 1.2248 ns 1.2258 ns]
                        change: [+9.6634% +9.7213% +9.7847%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

mat3_transpose          time:   [2.6247 ns 2.6261 ns 2.6276 ns]
                        change: [+3.2032% +3.2351% +3.2676%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  8 (8.00%) high severe

mat4_transpose          time:   [5.0910 ns 5.0925 ns 5.0938 ns]
                        change: [+3.4672% +3.5265% +3.5802%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

mat_div_scalar          time:   [621.16 µs 621.39 µs 621.68 µs]
                        change: [-0.9877% -0.8953% -0.8031%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

mat100_add_mat100       time:   [1.7506 µs 1.7515 µs 1.7526 µs]
                        change: [+0.7028% +0.7809% +0.8593%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

mat4_mul_mat4           time:   [33.744 ns 33.794 ns 33.898 ns]
                        change: [+17.549% +17.748% +18.055%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

mat5_mul_mat5           time:   [48.309 ns 48.319 ns 48.331 ns]
                        change: [-44.665% -44.512% -44.362%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

mat6_mul_mat6           time:   [77.366 ns 77.384 ns 77.405 ns]
                        change: [+0.4793% +0.5202% +0.5684%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

mat7_mul_mat7           time:   [83.373 ns 83.398 ns 83.430 ns]
                        change: [-0.3133% -0.1903% -0.0321%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

mat8_mul_mat8           time:   [69.337 ns 69.426 ns 69.532 ns]
                        change: [-0.6629% -0.2159% +0.0430%] (p = 0.34 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

mat9_mul_mat9           time:   [187.68 ns 187.79 ns 187.89 ns]
                        change: [+0.3227% +0.3859% +0.4503%] (p = 0.00 < 0.05)
                        Change within noise threshold.

mat10_mul_mat10         time:   [199.87 ns 200.02 ns 200.18 ns]
                        change: [+2.5979% +2.6928% +2.7811%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

mat10_mul_mat10_static  time:   [123.43 ns 123.48 ns 123.56 ns]
                        change: [+17.238% +17.404% +17.662%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

mat100_mul_mat100       time:   [39.498 µs 39.600 µs 39.717 µs]
                        change: [+0.5117% +0.6856% +0.8528%] (p = 0.00 < 0.05)
                        Change within noise threshold.

mat500_mul_mat500       time:   [4.3615 ms 4.3638 ms 4.3667 ms]
                        change: [-1.5042% -1.4307% -1.3514%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high severe

iter                    time:   [851.64 µs 851.73 µs 851.84 µs]
                        change: [+10.980% +11.254% +11.506%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

iter_rev                time:   [212.95 µs 212.98 µs 213.02 µs]
                        change: [+0.2664% +0.5462% +0.7106%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

copy_from               time:   [141.14 µs 141.31 µs 141.53 µs]
                        change: [-0.8809% -0.5821% -0.3187%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

axpy                    time:   [17.648 µs 17.693 µs 17.741 µs]
                        change: [-1.2897% -1.0169% -0.7049%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

tr_mul_to               time:   [136.20 µs 136.28 µs 136.36 µs]
                        change: [+0.5699% +1.3218% +1.7817%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) high mild
  8 (8.00%) high severe

mat_mul_mat             time:   [39.931 µs 39.970 µs 40.007 µs]
                        change: [+2.4660% +2.5525% +2.6388%] (p = 0.00 < 0.05)
                        Performance has regressed.

mat100_from_fn          time:   [6.8839 µs 6.8872 µs 6.8901 µs]
                        change: [+514.04% +514.93% +515.78%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

mat500_from_fn          time:   [173.41 µs 173.45 µs 173.49 µs]
                        change: [+500.13% +501.19% +502.54%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  5 (5.00%) low mild
  3 (3.00%) high severe

vec2_add_v_f32          time:   [1.1836 ns 1.1840 ns 1.1843 ns]
                        change: [+273.42% +273.97% +274.44%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

vec3_add_v_f32          time:   [1.7895 ns 1.7906 ns 1.7918 ns]
                        change: [+304.64% +305.11% +305.60%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec4_add_v_f32          time:   [1.7916 ns 1.7944 ns 1.7979 ns]
                        change: [+143.09% +143.48% +143.88%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

vec2_add_v_f64          time:   [1.1722 ns 1.1734 ns 1.1748 ns]
                        change: [+269.25% +270.01% +270.79%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

vec3_add_v_f64          time:   [1.9394 ns 1.9423 ns 1.9449 ns]
                        change: [+331.48% +332.20% +332.90%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec4_add_v_f64          time:   [2.2729 ns 2.2761 ns 2.2795 ns]
                        change: [+252.84% +253.55% +254.30%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe

vec2_sub_v              time:   [1.2017 ns 1.2029 ns 1.2044 ns]
                        change: [+274.58% +275.29% +276.09%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

vec3_sub_v              time:   [1.7818 ns 1.7838 ns 1.7861 ns]
                        change: [+303.94% +304.55% +305.21%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild

vec4_sub_v              time:   [1.7921 ns 1.7936 ns 1.7951 ns]
                        change: [+140.58% +141.03% +141.57%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

vec2_mul_s              time:   [985.48 ps 985.59 ps 985.78 ps]
                        change: [+210.12% +210.25% +210.38%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

vec3_mul_s              time:   [1.4056 ns 1.4059 ns 1.4062 ns]
                        change: [+217.88% +218.05% +218.20%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

vec4_mul_s              time:   [1.5828 ns 1.5839 ns 1.5850 ns]
                        change: [+114.04% +114.28% +114.53%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

vec2_div_s              time:   [1.4854 ns 1.4860 ns 1.4867 ns]
                        change: [+366.97% +367.15% +367.33%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

vec3_div_s              time:   [1.4821 ns 1.4832 ns 1.4848 ns]
                        change: [+229.77% +230.14% +230.45%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec4_div_s              time:   [1.6161 ns 1.6175 ns 1.6189 ns]
                        change: [+116.74% +117.02% +117.30%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

vec2_dot_f32            time:   [715.01 ps 717.55 ps 721.65 ps]
                        change: [+231.29% +232.12% +233.08%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

vec3_dot_f32            time:   [7.5379 ns 7.5393 ns 7.5412 ns]
                        change: [+3441.1% +3445.4% +3449.7%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

vec4_dot_f32            time:   [1.1914 ns 1.1941 ns 1.1968 ns]
                        change: [+455.67% +456.70% +457.85%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

vec2_dot_f64            time:   [833.73 ps 834.75 ps 835.77 ps]
                        change: [+286.15% +286.81% +287.48%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

vec3_dot_f64            time:   [7.5031 ns 7.5143 ns 7.5302 ns]
                        change: [+3387.4% +3390.9% +3395.5%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

vec4_dot_f64            time:   [1.2572 ns 1.2582 ns 1.2593 ns]
                        change: [+481.66% +482.79% +483.69%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

vec3_cross              time:   [7.6360 ns 7.6371 ns 7.6383 ns]
                        change: [+1603.2% +1604.4% +1605.3%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  8 (8.00%) high severe

vec2_norm               time:   [1.0934 ns 1.0936 ns 1.0939 ns]
                        change: [+0.4240% +0.4477% +0.4713%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec3_norm               time:   [1.1115 ns 1.1117 ns 1.1120 ns]
                        change: [-1.2543% -1.1951% -1.1429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

vec4_norm               time:   [1.1119 ns 1.1121 ns 1.1124 ns]
                        change: [-2.7306% -2.5540% -2.4351%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

vec2_normalize          time:   [2.4860 ns 2.4869 ns 2.4880 ns]
                        change: [+0.8836% +0.9584% +1.0264%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

vec3_normalize          time:   [2.5838 ns 2.5843 ns 2.5850 ns]
                        change: [+2.4280% +2.5396% +2.6267%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

vec4_normalize          time:   [1.9319 ns 1.9321 ns 1.9323 ns]
                        change: [+3.0547% +3.1098% +3.1644%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  5 (5.00%) high severe

vec10000_dot_f64        time:   [2.4662 µs 2.4669 µs 2.4677 µs]
                        change: [+103.74% +103.80% +103.89%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

vec10000_dot_f32        time:   [1.7355 µs 1.7368 µs 1.7386 µs]
                        change: [+56.145% +56.539% +56.985%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64       time:   [1.5285 µs 1.5289 µs 1.5293 µs]
                        change: [+1.5407% +1.5954% +1.6519%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f64  time:   [1.6062 µs 1.6083 µs 1.6123 µs]
                        change: [-1.2658% -1.1519% -1.0143%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_slice time:   [1.4899 µs 1.4900 µs 1.4902 µs]
                        change: [+1.0994% +1.1302% +1.1582%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_static
                        time:   [1.4381 µs 1.4384 µs 1.4386 µs]
                        change: [-1.4760% -1.3183% -1.2239%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_f32       time:   [758.37 ns 758.47 ns 758.59 ns]
                        change: [+0.7210% +1.1900% +1.4537%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f32  time:   [859.70 ns 859.90 ns 860.17 ns]
                        change: [+7.1477% +7.2858% +7.4170%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

quaternion_add_q        time:   [1.7877 ns 1.7890 ns 1.7903 ns]
                        change: [+140.68% +140.91% +141.14%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

quaternion_sub_q        time:   [1.7894 ns 1.7907 ns 1.7920 ns]
                        change: [+140.86% +141.23% +141.61%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

quaternion_mul_q        time:   [3.2688 ns 3.2697 ns 3.2705 ns]
                        change: [+342.95% +343.27% +343.63%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

unit_quaternion_mul_v   time:   [11.541 ns 11.549 ns 11.563 ns]
                        change: [+2500.6% +2504.0% +2506.8%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 24 outliers among 100 measurements (24.00%)
  5 (5.00%) low severe
  6 (6.00%) low mild
  6 (6.00%) high mild
  7 (7.00%) high severe

quaternion_mul_s        time:   [1.5707 ns 1.5711 ns 1.5715 ns]
                        change: [+112.20% +112.40% +112.57%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

quaternion_div_s        time:   [1.5778 ns 1.5785 ns 1.5794 ns]
                        change: [+112.39% +112.52% +112.66%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

quaternion_inv          time:   [1.9206 ns 1.9213 ns 1.9220 ns]
                        change: [+5.0362% +5.0895% +5.1421%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

unit_quaternion_inv     time:   [1.3271 ns 1.3285 ns 1.3295 ns]
                        change: [+9.1358% +9.2383% +9.3384%] (p = 0.00 < 0.05)
                        Performance has regressed.

bidiagonalize_100x100   time:   [265.82 µs 266.42 µs 267.32 µs]
                        change: [-0.4676% -0.2799% -0.0688%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

Benchmarking bidiagonalize_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.8s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_100x500   time:   [1.9467 ms 1.9530 ms 1.9592 ms]
                        change: [-4.9336% -4.6776% -4.4570%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 32 outliers among 100 measurements (32.00%)
  8 (8.00%) low mild
  2 (2.00%) high mild
  22 (22.00%) high severe

bidiagonalize_4x4       time:   [248.64 ns 248.71 ns 248.81 ns]
                        change: [-5.2386% -5.1754% -5.1180%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high severe

Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_500x100   time:   [1.6480 ms 1.6490 ms 1.6504 ms]
                        change: [-1.6630% -1.4692% -1.2645%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

bidiagonalize_unpack_100x100
                        time:   [523.30 µs 523.38 µs 523.46 µs]
                        change: [-0.2366% -0.1909% -0.1462%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

bidiagonalize_unpack_100x500
                        time:   [3.0536 ms 3.0596 ms 3.0656 ms]
                        change: [+0.4142% +0.6144% +0.8242%] (p = 0.00 < 0.05)
                        Change within noise threshold.

bidiagonalize_unpack_500x100
                        time:   [2.6027 ms 2.6039 ms 2.6052 ms]
                        change: [-0.3104% -0.1818% -0.0919%] (p = 0.00 < 0.05)
                        Change within noise threshold.

cholesky_100x100        time:   [37.281 µs 37.289 µs 37.296 µs]
                        change: [+15.445% +15.524% +15.590%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

cholesky_500x500        time:   [4.8081 ms 4.8162 ms 4.8260 ms]
                        change: [+5.9663% +6.2001% +6.4592%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 30 outliers among 100 measurements (30.00%)
  19 (19.00%) low severe
  11 (11.00%) high severe

cholesky_decompose_unpack_100x100
                        time:   [37.755 µs 37.763 µs 37.773 µs]
                        change: [+14.066% +14.305% +14.477%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

cholesky_decompose_unpack_500x500
                        time:   [4.6743 ms 4.6891 ms 4.7052 ms]
                        change: [+2.1368% +2.4522% +2.8032%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
  19 (19.00%) high severe

cholesky_solve_10x10    time:   [160.86 ns 160.97 ns 161.17 ns]
                        change: [+0.2713% +0.3412% +0.4236%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  3 (3.00%) high mild
  17 (17.00%) high severe

cholesky_solve_100x100  time:   [2.7392 µs 2.7399 µs 2.7407 µs]
                        change: [-0.2820% -0.2443% -0.2066%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

cholesky_solve_500x500  time:   [52.883 µs 52.896 µs 52.917 µs]
                        change: [+1.9926% +2.2238% +2.5966%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

cholesky_inverse_10x10  time:   [1.3102 µs 1.3110 µs 1.3119 µs]
                        change: [+1.5928% +1.6789% +1.7629%] (p = 0.00 < 0.05)
                        Performance has regressed.

cholesky_inverse_100x100
                        time:   [276.96 µs 276.98 µs 277.01 µs]
                        change: [+0.3069% +0.3685% +0.4182%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

cholesky_inverse_500x500
                        time:   [27.078 ms 27.084 ms 27.090 ms]
                        change: [+2.3068% +2.3369% +2.3710%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

full_piv_lu_decompose_10x10
                        time:   [560.68 ns 561.00 ns 561.33 ns]
                        change: [+0.6362% +0.7103% +0.7954%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe

full_piv_lu_decompose_100x100
                        time:   [207.09 µs 207.12 µs 207.15 µs]
                        change: [-0.3356% -0.2879% -0.2475%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

full_piv_lu_solve_10x10 time:   [117.33 ns 117.39 ns 117.46 ns]
                        change: [-1.0837% -1.0232% -0.9624%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe

full_piv_lu_solve_100x100
                        time:   [2.1694 µs 2.1707 µs 2.1729 µs]
                        change: [-1.7197% -1.6315% -1.5183%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

full_piv_lu_inverse_10x10
                        time:   [857.13 ns 857.32 ns 857.52 ns]
                        change: [-0.4396% -0.3489% -0.2379%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

full_piv_lu_inverse_100x100
                        time:   [211.92 µs 212.00 µs 212.10 µs]
                        change: [-2.0475% -1.9749% -1.9135%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

full_piv_lu_determinant_10x10
                        time:   [3.4777 ns 3.4794 ns 3.4814 ns]
                        change: [+17.827% +17.938% +18.036%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

full_piv_lu_determinant_100x100
                        time:   [38.435 ns 38.454 ns 38.475 ns]
                        change: [+3.5887% +3.6755% +3.7612%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

hessenberg_decompose_4x4
                        time:   [114.52 ns 114.54 ns 114.57 ns]
                        change: [-0.6226% -0.5236% -0.4055%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

hessenberg_decompose_100x100
                        time:   [289.44 µs 289.48 µs 289.54 µs]
                        change: [-0.1901% -0.1355% -0.0850%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

hessenberg_decompose_200x200
                        time:   [2.2102 ms 2.2147 ms 2.2212 ms]
                        change: [+0.8072% +1.0155% +1.2723%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  12 (12.00%) low severe
  4 (4.00%) high severe

hessenberg_decompose_unpack_100x100
                        time:   [428.66 µs 428.77 µs 428.91 µs]
                        change: [-0.2769% -0.2390% -0.1885%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

hessenberg_decompose_unpack_200x200
                        time:   [3.2263 ms 3.2288 ms 3.2314 ms]
                        change: [+0.8029% +1.0059% +1.1576%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

lu_decompose_10x10      time:   [361.75 ns 362.08 ns 362.41 ns]
                        change: [+13.439% +13.613% +13.784%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild

lu_decompose_100x100    time:   [73.649 µs 73.662 µs 73.687 µs]
                        change: [+0.9113% +0.9610% +1.0216%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

lu_solve_10x10          time:   [110.70 ns 110.74 ns 110.78 ns]
                        change: [-0.6342% -0.5758% -0.5265%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

lu_solve_100x100        time:   [2.1038 µs 2.1047 µs 2.1059 µs]
                        change: [-2.7288% -2.6424% -2.5009%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

lu_inverse_10x10        time:   [887.44 ns 887.64 ns 887.88 ns]
                        change: [-2.8481% -2.7682% -2.6966%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

lu_inverse_100x100      time:   [215.47 µs 215.67 µs 215.90 µs]
                        change: [-1.0637% -0.9343% -0.7643%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
  8 (8.00%) high mild
  9 (9.00%) high severe

lu_determinant_10x10    time:   [2.5924 ns 2.5970 ns 2.6015 ns]
                        change: [+20.878% +21.192% +21.451%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

lu_determinant_100x100  time:   [35.934 ns 35.983 ns 36.032 ns]
                        change: [-1.7500% -1.6698% -1.5889%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  15 (15.00%) high severe

qr_decompose_100x100    time:   [143.15 µs 143.24 µs 143.40 µs]
                        change: [+0.7070% +0.8463% +1.0082%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500    time:   [1.0041 ms 1.0064 ms 1.0112 ms]
                        change: [-1.0163% -0.8784% -0.6928%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

qr_decompose_4x4        time:   [125.09 ns 125.10 ns 125.12 ns]
                        change: [-0.2898% -0.0935% +0.1416%] (p = 0.49 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  1 (1.00%) high mild
  15 (15.00%) high severe

qr_decompose_500x100    time:   [834.17 µs 835.03 µs 836.03 µs]
                        change: [-0.3712% -0.1635% +0.0648%] (p = 0.14 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

qr_decompose_unpack_100x100
                        time:   [283.60 µs 283.90 µs 284.16 µs]
                        change: [-0.3949% -0.2993% -0.1910%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500
                        time:   [1.1475 ms 1.1491 ms 1.1521 ms]
                        change: [-0.8690% -0.7318% -0.5277%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100
                        time:   [1.6793 ms 1.6797 ms 1.6801 ms]
                        change: [+2.4782% +2.5491% +2.6214%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

qr_solve_10x10          time:   [152.57 ns 152.63 ns 152.73 ns]
                        change: [-0.4288% -0.3814% -0.3276%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

qr_solve_100x100        time:   [3.3232 µs 3.3254 µs 3.3285 µs]
                        change: [-0.0158% +0.2076% +0.5519%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe

qr_inverse_10x10        time:   [805.76 ns 806.05 ns 806.44 ns]
                        change: [-0.9942% -0.7502% -0.6015%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

qr_inverse_100x100      time:   [330.09 µs 330.39 µs 330.67 µs]
                        change: [+0.4902% +0.5806% +0.6779%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

schur_decompose_4x4     time:   [926.95 ns 928.24 ns 929.26 ns]
                        change: [-13.166% -13.030% -12.884%] (p = 0.00 < 0.05)
                        Performance has improved.

schur_decompose_10x10   time:   [7.4409 µs 7.4453 µs 7.4492 µs]
                        change: [+1.5363% +1.6395% +1.7360%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) low severe
  1 (1.00%) high mild
  1 (1.00%) high severe

schur_decompose_100x100 time:   [2.6115 ms 2.6172 ms 2.6243 ms]
                        change: [+1.6088% +1.8440% +2.1414%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

schur_decompose_200x200 time:   [18.406 ms 18.418 ms 18.432 ms]
                        change: [+0.9610% +1.1237% +1.2824%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

eigenvalues_4x4         time:   [852.29 ns 855.47 ns 858.45 ns]
                        change: [-33.645% -33.514% -33.373%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  18 (18.00%) high severe

eigenvalues_10x10       time:   [5.9802 µs 5.9817 µs 5.9835 µs]
                        change: [+0.4433% +0.5280% +0.5939%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100     time:   [1.5907 ms 1.5927 ms 1.5955 ms]
                        change: [+0.4506% +0.5711% +0.6951%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

eigenvalues_200x200     time:   [11.141 ms 11.142 ms 11.144 ms]
                        change: [-0.1305% -0.0959% -0.0696%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

solve_l_triangular_100x100
                        time:   [1.0009 µs 1.0023 µs 1.0044 µs]
                        change: [-3.0545% -2.9536% -2.8531%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

solve_l_triangular_1000x1000
                        time:   [101.96 µs 102.10 µs 102.26 µs]
                        change: [+0.2874% +0.4251% +0.5544%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 100 measurements (20.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  16 (16.00%) high severe

tr_solve_l_triangular_100x100
                        time:   [1.7602 µs 1.7606 µs 1.7611 µs]
                        change: [+0.1457% +0.2378% +0.3238%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

tr_solve_l_triangular_1000x1000
                        time:   [95.588 µs 95.613 µs 95.638 µs]
                        change: [+0.6009% +0.8491% +1.0883%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

solve_u_triangular_100x100
                        time:   [1.1486 µs 1.1486 µs 1.1487 µs]
                        change: [-1.2041% -1.1651% -1.1380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

solve_u_triangular_1000x1000
                        time:   [96.805 µs 96.827 µs 96.850 µs]
                        change: [-2.4840% -2.4423% -2.4081%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

tr_solve_u_triangular_100x100
                        time:   [1.1943 µs 1.1947 µs 1.1951 µs]
                        change: [-0.9912% -0.6228% -0.3160%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

tr_solve_u_triangular_1000x1000
                        time:   [86.848 µs 86.858 µs 86.868 µs]
                        change: [-1.4656% -1.4306% -1.4030%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

svd_decompose_2x2       time:   [24.714 ns 24.731 ns 24.757 ns]
                        change: [+16.359% +16.416% +16.476%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

svd_decompose_3x3       time:   [356.14 ns 356.26 ns 356.41 ns]
                        change: [+7.1080% +7.1682% +7.2242%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

svd_decompose_4x4       time:   [973.20 ns 973.36 ns 973.52 ns]
                        change: [-0.1503% -0.1038% -0.0509%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

svd_decompose_10x10     time:   [5.7955 µs 5.7969 µs 5.7982 µs]
                        change: [-1.2212% -1.1496% -1.0648%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

Benchmarking svd_decompose_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.8s, enable flat sampling, or reduce sample count to 50.
svd_decompose_100x100   time:   [1.5457 ms 1.5461 ms 1.5466 ms]
                        change: [-1.1800% -1.1312% -1.0869%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

svd_decompose_200x200   time:   [11.688 ms 11.698 ms 11.708 ms]
                        change: [-1.5525% -1.4404% -1.3260%] (p = 0.00 < 0.05)
                        Performance has improved.

rank_4x4                time:   [687.08 ns 687.31 ns 687.60 ns]
                        change: [-4.3382% -4.2619% -4.1730%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

rank_10x10              time:   [4.2263 µs 4.2314 µs 4.2359 µs]
                        change: [-0.0796% +0.0376% +0.1498%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

rank_100x100            time:   [524.07 µs 525.08 µs 526.09 µs]
                        change: [+0.2055% +0.4232% +0.6185%] (p = 0.00 < 0.05)
                        Change within noise threshold.

rank_200x200            time:   [3.0034 ms 3.0049 ms 3.0066 ms]
                        change: [-0.0776% -0.0258% +0.0284%] (p = 0.40 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) low severe
  6 (6.00%) low mild
  2 (2.00%) high severe

singular_values_4x4     time:   [711.28 ns 711.46 ns 711.69 ns]
                        change: [-10.996% -10.943% -10.895%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe

singular_values_10x10   time:   [4.3082 µs 4.3088 µs 4.3098 µs]
                        change: [+0.2477% +0.2828% +0.3239%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

singular_values_100x100 time:   [520.96 µs 521.13 µs 521.29 µs]
                        change: [-1.3767% -1.2379% -1.0940%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

singular_values_200x200 time:   [3.0055 ms 3.0063 ms 3.0075 ms]
                        change: [-0.0668% -0.0002% +0.0545%] (p = 0.99 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

pseudo_inverse_4x4      time:   [767.27 ns 767.63 ns 768.12 ns]
                        change: [-20.375% -20.322% -20.267%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

pseudo_inverse_10x10    time:   [6.1395 µs 6.1415 µs 6.1440 µs]
                        change: [+2.0662% +2.1284% +2.1866%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.1s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100  time:   [1.6018 ms 1.6035 ms 1.6050 ms]
                        change: [-0.0386% +0.0840% +0.2218%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 23 outliers among 100 measurements (23.00%)
  13 (13.00%) low severe
  3 (3.00%) high mild
  7 (7.00%) high severe

pseudo_inverse_200x200  time:   [11.989 ms 11.997 ms 12.006 ms]
                        change: [-0.7602% -0.5368% -0.3351%] (p = 0.00 < 0.05)
                        Change within noise threshold.

symmetric_eigen_decompose_4x4
                        time:   [453.98 ns 454.22 ns 454.53 ns]
                        change: [-10.475% -10.350% -10.207%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  11 (11.00%) high severe

symmetric_eigen_decompose_10x10
                        time:   [3.6767 µs 3.6782 µs 3.6800 µs]
                        change: [-1.7404% -1.6869% -1.6361%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

symmetric_eigen_decompose_100x100
                        time:   [767.36 µs 768.36 µs 769.56 µs]
                        change: [-7.0024% -6.9023% -6.7865%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

symmetric_eigen_decompose_200x200
                        time:   [5.2143 ms 5.2218 ms 5.2350 ms]
                        change: [-9.1662% -8.8429% -8.5229%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high severe

A significant regression means that the computation of the resulting value was previously optimized out of the benchmarking loop.

@im-0
Contributor Author

im-0 commented Sep 24, 2025

Changed bench_binop!() and bench_binop_ref!() to pass self by reference into black_box() instead of by value. This removed unnecessary copies in some benchmarks, but the overall results are mostly the same.

Collaborator

@geo-ant geo-ant left a comment


This looks good to me overall, thanks for taking care of this. I just noted some questions in the code about the choice of what to black-box, but I'll be the first to admit that I don't have a lot of experience with getting the compiler not to optimize certain things out. I'd just feel better if you explained some of the rationale behind what exactly you black-boxed.


-    bench.bench_function("mat8_mul_mat8", move |bh| bh.iter(|| &a * &b));
+    bench.bench_function("mat8_mul_mat8", move |bh| {
+        bh.iter(|| black_box(&a) * black_box(&b))
Collaborator


Honest question: would the results have been different if you had black-boxed the product rather than the individual components?

Contributor Author


Yes, there is a high chance that the benchmark result will be different.

bh.iter(|| &a * &b)

In this example, Criterion calls the closure || &a * &b many times to measure how long it takes. But the compiler is smart enough to notice that a and b cannot change within bh.iter(...), so it can effectively rewrite everything like this:

let optimized = &a * &b;
bh.iter(|| optimized.clone())

What black_box() does here is make the compiler assume that black_box(x) may produce any valid value of the same type as x. Of course, in the compiled binary it is a no-op and always yields just the value of x.

Wrapping the product in black_box() changes nothing here: the compiler can still see that the arguments of mul() do not change, so it can still move mul() out of the loop:

let optimized = &a * &b;
bh.iter(|| black_box(optimized.clone()))

Also, the return value of the closure is already passed to black_box() inside Criterion's bh.iter(...) to ensure that the call to the closure is not removed during optimization. Here black_box(x) has a slightly different meaning: some unspecified computation that produces side effects based on the value of x (and thus the value of x matters and cannot be removed from the compiled code entirely).

And, as we are interested in measuring the performance of mul(), we have two options:

  1. Generate proper random values for the arguments of mul() on each iteration of bh.iter(...). This may be viable if mul() is slow enough to make the random argument generation insignificant in the total measured time. This option is not viable in general for nalgebra, as many benchmarks measure very fast operations that can be optimized down to just a few machine instructions (like Vector3 x Scalar multiplication etc.).
  2. Disguise the unchanged arguments of mul() as random values on each iteration of bh.iter(...). This is exactly what I did here using black_box().
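To make the difference between the two closures concrete, here is a minimal std-only sketch that does not use Criterion. The names `mat2_mul_v` and `time_ns` are hypothetical stand-ins (a hand-written 2x2 matrix-vector product and a crude timing loop), not nalgebra or Criterion APIs. Whether the first variant is actually hoisted depends on the compiler version and optimization level, so treat the printed numbers as indicative only and build with `--release`.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a small, fast operation: 2x2 matrix times 2-vector.
fn mat2_mul_v(m: &[[f32; 2]; 2], v: &[f32; 2]) -> [f32; 2] {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Crude timing loop (hypothetical helper): runs `routine` `iters` times and
/// returns the average nanoseconds per call. Each result goes through
/// `black_box`, so the call itself cannot be discarded as unused.
fn time_ns<R>(iters: u32, mut routine: impl FnMut() -> R) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(routine());
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}

fn main() {
    let m = [[1.0_f32, 2.0], [3.0, 4.0]];
    let v = [5.0_f32, 6.0];
    let iters = 10_000_000;

    // Inputs are loop-invariant and visible to the optimizer, so the
    // multiplication may be computed once outside the loop.
    let visible = time_ns(iters, || mat2_mul_v(&m, &v));

    // Inputs pass through black_box on every iteration, so the optimizer has
    // to assume they may differ each time and redo the multiplication.
    let opaque = time_ns(iters, || mat2_mul_v(black_box(&m), black_box(&v)));

    println!("visible inputs: {visible:.3} ns/iter");
    println!("opaque inputs:  {opaque:.3} ns/iter");
}
```

If the second number is noticeably larger than the first, that gap is the work that was previously being optimized away.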

Collaborator

@geo-ant geo-ant Sep 28, 2025


Great explanation, thank you. In principle I'd be fine merging this; I'm just wondering whether you had considered refactoring to iter_batched. This is what I do in my projects, and the Criterion docs say:

If your routine requires some per-iteration setup that shouldn’t be timed, use iter_batched or iter_batched_ref

which should be a way to supply new random matrices in every iteration of the benchmark. I don't think this will matter in the cases here, but in general this could help confuse the processor pipeline enough to get a more realistic measurement. However, I've also found the unresolved issue bheisler/criterion.rs#475 about measurement overhead in iter_batched, which I wasn't aware of before.

I don't want to make your life more complicated and I'm very grateful you're tackling this problem. I was just thinking we should really nail the benchmarks, since you also have some other cool things in the pipeline, which do depend on accurate benchmarks. What do you think?
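For readers unfamiliar with batched timing, the idea of keeping per-iteration setup out of the measurement can be sketched with the standard library alone. This is only an illustration of the concept, not Criterion's actual implementation: `time_batched` is a hypothetical helper, and the xorshift generator is a throwaway stand-in for a real RNG.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper illustrating batched timing: all inputs are built
/// before the clock starts, and only the calls to `routine` are timed.
fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    // Untimed: create one fresh input per call.
    let inputs: Vec<I> = (0..batch).map(|_| setup()).collect();
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    for input in inputs {
        outputs.push(routine(input));
    }
    let elapsed = start.elapsed();

    // Keep the results observable; they are dropped after the clock stops.
    black_box(outputs);
    elapsed
}

fn main() {
    // Throwaway xorshift generator producing floats in [0, 1).
    let mut state: u32 = 0x9E37_79B9;
    let mut next = move || {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        (state >> 8) as f32 / (1u32 << 24) as f32
    };

    let elapsed = time_batched(
        1_000,
        || ([next(), next()], [next(), next()]),
        |(a, b): ([f32; 2], [f32; 2])| [a[0] * b[0], a[1] * b[1]],
    );
    println!("1000 calls took {elapsed:?}");
}
```

Note that the timed loop still includes moving inputs out of one vector and pushing outputs into another, which is one plausible source of the overhead that matters for operations taking only a few nanoseconds.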

Contributor Author


To be honest, I wanted to keep this change minimal, so I hadn't considered using other Criterion functions.

I just tried iter_batched for mat2_mul_v (the same benchmark that I used for demonstration in the original issue):

diff --git a/benches/common/macros.rs b/benches/common/macros.rs
index c3e12aaaef55..3521ef7ddc8c 100644
--- a/benches/common/macros.rs
+++ b/benches/common/macros.rs
@@ -4,15 +4,15 @@ macro_rules! bench_binop(
     ($name: ident, $t1: ty, $t2: ty, $binop: ident) => {
         fn $name(bh: &mut criterion::Criterion) {
             use rand::SeedableRng;
-            use std::hint::black_box;
 
             let mut rng = IsaacRng::seed_from_u64(0);
-            let a = rng.random::<$t1>();
-            let b = rng.random::<$t2>();
 
-            bh.bench_function(stringify!($name), move |bh| bh.iter(|| {
-                black_box(&a).$binop(black_box(b))
-            }));
+            bh.bench_function(stringify!($name), move |bh| bh.iter_batched(
+            || (rng.random::<$t1>(), rng.random::<$t2>()),
+            |args| {
+                args.0.$binop(args.1)
+            },
+            criterion::BatchSize::SmallInput));
         }
     }
 );

This improved performance somewhat compared to my current changes, but it is still a regression compared to current main:

image

I tried to check the generated assembly, but for iter_batched() it is much longer and I am not that good at reading assembly:

click for details...

nalgebra_bench-ccbcfb07ef18979a`criterion::bencher::Bencher$LT$M$GT$::iter_batched::heec5543a72133bcc:
    0x55555562d840 <+0>:    pushq  %rbp
    0x55555562d841 <+1>:    pushq  %r15
    0x55555562d843 <+3>:    pushq  %r14
    0x55555562d845 <+5>:    pushq  %r13
    0x55555562d847 <+7>:    pushq  %r12
    0x55555562d849 <+9>:    pushq  %rbx
    0x55555562d84a <+10>:   subq   $0xa8, %rsp
    0x55555562d851 <+17>:   movb   $0x1, 0x30(%rdi)
    0x55555562d855 <+21>:   movq   0x28(%rdi), %r15
    0x55555562d859 <+25>:   leaq   0x9(%r15), %rcx
    0x55555562d85d <+29>:   movabsq $-0x3333333333333333, %rdx ; imm = 0xCCCCCCCCCCCCCCCD 
    0x55555562d867 <+39>:   movq   %rcx, %rax
    0x55555562d86a <+42>:   mulq   %rdx
    0x55555562d86d <+45>:   movq   %rdx, 0x78(%rsp)
    0x55555562d872 <+50>:   cmpq   $0x9, %rcx
    0x55555562d876 <+54>:   jbe    0x55555562dee6 ; <+1702>
    0x55555562d87c <+60>:   movq   %rsi, %r14
    0x55555562d87f <+63>:   movq   %rdi, %rbx
    0x55555562d882 <+66>:   movl   $0x1, %edi
    0x55555562d887 <+71>:   callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
->  0x55555562d88c <+76>:   movq   %rax, 0x88(%rsp)
    0x55555562d894 <+84>:   movl   %edx, 0x74(%rsp)
    0x55555562d898 <+88>:   movq   $0x0, (%rbx)
    0x55555562d89f <+95>:   movl   $0x0, 0x8(%rbx)
    0x55555562d8a6 <+102>:  leaq   -0x1(%r15), %rax
    0x55555562d8aa <+106>:  cmpq   $0xa, %rax
    0x55555562d8ae <+110>:  movq   %rbx, 0x80(%rsp)
    0x55555562d8b6 <+118>:  jae    0x55555562da0f ; <+463>
    0x55555562d8bc <+124>:  xorl   %r12d, %r12d
    0x55555562d8bf <+127>:  xorl   %ebx, %ebx
    0x55555562d8c1 <+129>:  jmp    0x55555562d921 ; <+225>
    0x55555562d8c3 <+131>:  nopw   %cs:(%rax,%rax)
    0x55555562d8d0 <+144>:  addl   $0xc4653600, %r12d ; imm = 0xC4653600 
    0x55555562d8d7 <+151>:  incq   %rbx
    0x55555562d8da <+154>:  movss  0x40(%rsp), %xmm1
    0x55555562d8e0 <+160>:  addss  0x20(%rsp), %xmm1
    0x55555562d8e6 <+166>:  movss  0x48(%rsp), %xmm0
    0x55555562d8ec <+172>:  addss  0x18(%rsp), %xmm0
    0x55555562d8f2 <+178>:  movd   %xmm1, %eax
    0x55555562d8f6 <+182>:  movd   %xmm0, %ecx
    0x55555562d8fa <+186>:  shlq   $0x20, %rcx
    0x55555562d8fe <+190>:  orq    %rcx, %rax
    0x55555562d901 <+193>:  movq   %rbx, (%r13)
    0x55555562d905 <+197>:  movl   %r12d, 0x8(%r13)
    0x55555562d909 <+201>:  movq   %rax, (%rsp)
    0x55555562d90d <+205>:  movss  (%rsp), %xmm0
    0x55555562d912 <+210>:  movss  0x4(%rsp), %xmm0
    0x55555562d918 <+216>:  decq   %r15
    0x55555562d91b <+219>:  je     0x55555562de65 ; <+1573>
    0x55555562d921 <+225>:  movq   %rsp, %rdi
    0x55555562d924 <+228>:  movq   %r14, %rsi
    0x55555562d927 <+231>:  callq  0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
    0x55555562d92c <+236>:  movss  0x10(%rsp), %xmm0
    0x55555562d932 <+242>:  movss  0x14(%rsp), %xmm2
    0x55555562d938 <+248>:  movss  (%rsp), %xmm1
    0x55555562d93d <+253>:  mulss  %xmm0, %xmm1
    0x55555562d941 <+257>:  movss  %xmm1, 0x40(%rsp)
    0x55555562d947 <+263>:  mulss  0x4(%rsp), %xmm0
    0x55555562d94d <+269>:  movss  %xmm0, 0x48(%rsp)
    0x55555562d953 <+275>:  movss  0x8(%rsp), %xmm0
    0x55555562d959 <+281>:  mulss  %xmm2, %xmm0
    0x55555562d95d <+285>:  movss  %xmm0, 0x20(%rsp)
    0x55555562d963 <+291>:  mulss  0xc(%rsp), %xmm2
    0x55555562d969 <+297>:  movss  %xmm2, 0x18(%rsp)
    0x55555562d96f <+303>:  movl   $0x1, %edi
    0x55555562d974 <+308>:  callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562d979 <+313>:  movq   %rax, %r13
    0x55555562d97c <+316>:  movl   %edx, %ebp
    0x55555562d97e <+318>:  movl   $0x1, %edi
    0x55555562d983 <+323>:  callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562d988 <+328>:  movq   %rax, 0x50(%rsp)
    0x55555562d98d <+333>:  movl   %edx, 0x58(%rsp)
    0x55555562d991 <+337>:  movq   %r13, 0x28(%rsp)
    0x55555562d996 <+342>:  movl   %ebp, 0x30(%rsp)
    0x55555562d99a <+346>:  movq   %rsp, %rdi
    0x55555562d99d <+349>:  leaq   0x50(%rsp), %rsi
    0x55555562d9a2 <+354>:  leaq   0x28(%rsp), %rdx
    0x55555562d9a7 <+359>:  callq  0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
    0x55555562d9ac <+364>:  movzbl (%rsp), %eax
    0x55555562d9b0 <+368>:  testb  %al, %al
    0x55555562d9b2 <+370>:  jne    0x55555562d9c0 ; <+384>
    0x55555562d9b4 <+372>:  movq   0x8(%rsp), %rcx
    0x55555562d9b9 <+377>:  jmp    0x55555562d9c2 ; <+386>
    0x55555562d9bb <+379>:  nopl   (%rax,%rax)
    0x55555562d9c0 <+384>:  xorl   %ecx, %ecx
    0x55555562d9c2 <+386>:  addq   %rcx, %rbx
    0x55555562d9c5 <+389>:  movq   0x80(%rsp), %r13
    0x55555562d9cd <+397>:  jb     0x55555562d9f7 ; <+439>
    0x55555562d9cf <+399>:  testb  $0x1, %al
    0x55555562d9d1 <+401>:  movl   0x10(%rsp), %eax
    0x55555562d9d5 <+405>:  movl   $0x0, %ecx
    0x55555562d9da <+410>:  cmovnel %ecx, %eax
    0x55555562d9dd <+413>:  addl   %eax, %r12d
    0x55555562d9e0 <+416>:  cmpl   $0x3b9aca00, %r12d ; imm = 0x3B9ACA00 
    0x55555562d9e7 <+423>:  jb     0x55555562d8da ; <+154>
    0x55555562d9ed <+429>:  cmpq   $-0x1, %rbx
    0x55555562d9f1 <+433>:  jne    0x55555562d8d0 ; <+144>
    0x55555562d9f7 <+439>:  leaq   0x3867bc(%rip), %rdi
    0x55555562d9fe <+446>:  leaq   0x4456db(%rip), %rdx ; __dso_handle + 29352
    0x55555562da05 <+453>:  movl   $0x1e, %esi
    0x55555562da0a <+458>:  callq  0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60
    0x55555562da0f <+463>:  shrq   $0x3, 0x78(%rsp)
    0x55555562da15 <+469>:  movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE 
    0x55555562da1f <+479>:  movq   $0x0, 0x60(%rsp)
    0x55555562da28 <+488>:  incq   %rax
    0x55555562da2b <+491>:  movq   %rax, 0x90(%rsp)
    0x55555562da33 <+499>:  xorl   %esi, %esi
    0x55555562da35 <+501>:  jmp    0x55555562da55 ; <+533>
    0x55555562da37 <+503>:  nopw   (%rax,%rax)
    0x55555562da40 <+512>:  movq   0x40(%rsp), %rsi
    0x55555562da45 <+517>:  addq   %rbx, %rsi
    0x55555562da48 <+520>:  movq   0x28(%r13), %r15
    0x55555562da4c <+524>:  cmpq   %r15, %rsi
    0x55555562da4f <+527>:  jae    0x55555562de65 ; <+1573>
    0x55555562da55 <+533>:  movq   %r15, %r13
    0x55555562da58 <+536>:  subq   %rsi, %r13
    0x55555562da5b <+539>:  movq   0x78(%rsp), %rax
    0x55555562da60 <+544>:  cmpq   %rax, %r13
    0x55555562da63 <+547>:  cmovaeq %rax, %r13
    0x55555562da67 <+551>:  movq   %r13, %rax
    0x55555562da6a <+554>:  movl   $0x18, %ecx
    0x55555562da6f <+559>:  mulq   %rcx
    0x55555562da72 <+562>:  jo     0x55555562defe ; <+1726>
    0x55555562da78 <+568>:  movabsq $0x7ffffffffffffffd, %rcx ; imm = 0x7FFFFFFFFFFFFFFD 
    0x55555562da82 <+578>:  cmpq   %rcx, %rax
    0x55555562da85 <+581>:  jae    0x55555562defe ; <+1726>
    0x55555562da8b <+587>:  movq   %rsi, 0x40(%rsp)
    0x55555562da90 <+592>:  testq  %rax, %rax
    0x55555562da93 <+595>:  movq   %r15, 0x68(%rsp)
    0x55555562da98 <+600>:  je     0x55555562dac0 ; <+640>
    0x55555562da9a <+602>:  movq   %rax, %r12
    0x55555562da9d <+605>:  movq   %rax, %rdi
    0x55555562daa0 <+608>:  callq  *0x46d20a(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320
    0x55555562daa6 <+614>:  movq   %rax, %rbx
    0x55555562daa9 <+617>:  movq   %r13, %rbp
    0x55555562daac <+620>:  testq  %rax, %rax
    0x55555562daaf <+623>:  jne    0x55555562dac7 ; <+647>
    0x55555562dab1 <+625>:  jmp    0x55555562df2a ; <+1770>
    0x55555562dab6 <+630>:  nopw   %cs:(%rax,%rax)
    0x55555562dac0 <+640>:  movl   $0x4, %ebx
    0x55555562dac5 <+645>:  xorl   %ebp, %ebp
    0x55555562dac7 <+647>:  movq   %rbx, %r12
    0x55555562daca <+650>:  movq   %r13, 0x48(%rsp)
    0x55555562dacf <+655>:  movq   %rsp, %r15
    0x55555562dad2 <+658>:  nopw   %cs:(%rax,%rax)
    0x55555562dae0 <+672>:  movq   %r15, %rdi
    0x55555562dae3 <+675>:  movq   %r14, %rsi
    0x55555562dae6 <+678>:  callq  0x5555556dc390 ; nalgebra_bench::core::matrix::mat2_mul_v::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h089cab30bae2dfb0
    0x55555562daeb <+683>:  movq   0x10(%rsp), %rax
    0x55555562daf0 <+688>:  movq   %rax, 0x10(%r12)
    0x55555562daf5 <+693>:  movups (%rsp), %xmm0
    0x55555562daf9 <+697>:  movups %xmm0, (%r12)
    0x55555562dafe <+702>:  addq   $0x18, %r12
    0x55555562db02 <+706>:  decq   %r13
    0x55555562db05 <+709>:  jne    0x55555562dae0 ; <+672>
    0x55555562db07 <+711>:  movq   %rbp, (%rsp)
    0x55555562db0b <+715>:  movq   %rbx, 0x8(%rsp)
    0x55555562db10 <+720>:  movq   0x48(%rsp), %r15
    0x55555562db15 <+725>:  movq   %r15, 0x10(%rsp)
    0x55555562db1a <+730>:  movq   (%rsp), %rax
    0x55555562db1e <+734>:  movq   %rax, 0x20(%rsp)
    0x55555562db23 <+739>:  movq   0x8(%rsp), %rax
    0x55555562db28 <+744>:  movq   %rax, 0x18(%rsp)
    0x55555562db2d <+749>:  movq   0x10(%rsp), %rbx
    0x55555562db32 <+754>:  leaq   (,%r15,8), %r12
    0x55555562db3a <+762>:  cmpq   0x90(%rsp), %r15
    0x55555562db42 <+770>:  ja     0x55555562df14 ; <+1748>
    0x55555562db48 <+776>:  movq   0x68(%rsp), %rax
    0x55555562db4d <+781>:  cmpq   0x40(%rsp), %rax
    0x55555562db52 <+786>:  jne    0x55555562db60 ; <+800>
    0x55555562db54 <+788>:  movl   $0x4, %ebp
    0x55555562db59 <+793>:  xorl   %r15d, %r15d
    0x55555562db5c <+796>:  jmp    0x55555562dbb0 ; <+880>
    0x55555562db5e <+798>:  nop    
    0x55555562db60 <+800>:  testq  %r15, %r15
    0x55555562db63 <+803>:  je     0x55555562db7b ; <+827>
    0x55555562db65 <+805>:  movq   %r12, %rdi
    0x55555562db68 <+808>:  callq  *0x46d142(%rip) ; _GLOBAL_OFFSET_TABLE_ + 320
    0x55555562db6e <+814>:  movq   %rax, %rbp
    0x55555562db71 <+817>:  testq  %rbp, %rbp
    0x55555562db74 <+820>:  jne    0x55555562dbb0 ; <+880>
    0x55555562db76 <+822>:  jmp    0x55555562df0a ; <+1738>
    0x55555562db7b <+827>:  movq   $0x0, (%rsp)
    0x55555562db83 <+835>:  movl   $0x8, %esi
    0x55555562db88 <+840>:  movq   %rsp, %rdi
    0x55555562db8b <+843>:  movq   %r12, %rdx
    0x55555562db8e <+846>:  callq  *0x46d064(%rip) ; _GLOBAL_OFFSET_TABLE_ + 136
    0x55555562db94 <+852>:  testl  %eax, %eax
    0x55555562db96 <+854>:  jne    0x55555562df0a ; <+1738>
    0x55555562db9c <+860>:  movq   (%rsp), %rbp
    0x55555562dba0 <+864>:  testq  %rbp, %rbp
    0x55555562dba3 <+867>:  je     0x55555562df0a ; <+1738>
    0x55555562dba9 <+873>:  nopl   (%rax)
    0x55555562dbb0 <+880>:  movq   %r15, 0x28(%rsp)
    0x55555562dbb5 <+885>:  movq   %rbp, 0x30(%rsp)
    0x55555562dbba <+890>:  movq   $0x0, 0x38(%rsp)
    0x55555562dbc3 <+899>:  movl   $0x1, %edi
    0x55555562dbc8 <+904>:  callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562dbcd <+909>:  movl   %edx, 0x68(%rsp)
    0x55555562dbd1 <+913>:  movq   %rax, %r12
    0x55555562dbd4 <+916>:  cmpq   %r15, %rbx
    0x55555562dbd7 <+919>:  ja     0x55555562de37 ; <+1527>
    0x55555562dbdd <+925>:  movl   $0x0, %esi
    0x55555562dbe2 <+930>:  testq  %rbx, %rbx
    0x55555562dbe5 <+933>:  movq   0x18(%rsp), %r15
    0x55555562dbea <+938>:  je     0x55555562dd4d ; <+1293>
    0x55555562dbf0 <+944>:  leaq   (%rbx,%rbx,2), %rdi
    0x55555562dbf4 <+948>:  leaq   -0x18(,%rdi,8), %rcx
    0x55555562dbfc <+956>:  movq   %rcx, %rax
    0x55555562dbff <+959>:  movabsq $-0x5555555555555555, %rdx ; imm = 0xAAAAAAAAAAAAAAAB 
    0x55555562dc09 <+969>:  mulq   %rdx
    0x55555562dc0c <+972>:  cmpq   $0x5f, %rcx
    0x55555562dc10 <+976>:  jbe    0x55555562dc46 ; <+1030>
    0x55555562dc12 <+978>:  shrq   $0x4, %rdx
    0x55555562dc16 <+982>:  leaq   (,%rsi,8), %rcx
    0x55555562dc1e <+990>:  addq   %rbp, %rcx
    0x55555562dc21 <+993>:  leaq   (%rdx,%rdx,2), %rax
    0x55555562dc25 <+997>:  leaq   (%r15,%rax,8), %rax
    0x55555562dc29 <+1001>: addq   $0x18, %rax
    0x55555562dc2d <+1005>: cmpq   %rax, %rcx
    0x55555562dc30 <+1008>: jae    0x55555562dc4e ; <+1038>
    0x55555562dc32 <+1010>: leaq   (%rsi,%rdx), %rax
    0x55555562dc36 <+1014>: leaq   0x8(,%rax,8), %rax
    0x55555562dc3e <+1022>: addq   %rbp, %rax
    0x55555562dc41 <+1025>: cmpq   %rax, %r15
    0x55555562dc44 <+1028>: jae    0x55555562dc4e ; <+1038>
    0x55555562dc46 <+1030>: movq   %r15, %rax
    0x55555562dc49 <+1033>: jmp    0x55555562dcf0 ; <+1200>
    0x55555562dc4e <+1038>: movabsq $0xffffffffffffffe, %rax ; imm = 0xFFFFFFFFFFFFFFE 
    0x55555562dc58 <+1048>: andq   %rax, %rdx
    0x55555562dc5b <+1051>: leaq   (,%rdx,8), %rax
    0x55555562dc63 <+1059>: leaq   (%rax,%rax,2), %rax
    0x55555562dc67 <+1063>: movq   %r15, %r8
    0x55555562dc6a <+1066>: xorl   %r9d, %r9d
    0x55555562dc6d <+1069>: nopl   (%rax)
    0x55555562dc70 <+1072>: movupd (%r8), %xmm1
    0x55555562dc75 <+1077>: movupd 0x10(%r8), %xmm2
    0x55555562dc7b <+1083>: movupd 0x20(%r8), %xmm3
    0x55555562dc81 <+1089>: movapd %xmm2, %xmm4
    0x55555562dc85 <+1093>: movapd %xmm1, %xmm0
    0x55555562dc89 <+1097>: movsd  %xmm3, %xmm0 ; xmm0 = xmm3[0],xmm0[1] 
    0x55555562dc8d <+1101>: movapd %xmm3, %xmm5
    0x55555562dc91 <+1105>: movsd  %xmm2, %xmm3 ; xmm3 = xmm2[0],xmm3[1] 
    0x55555562dc95 <+1109>: shufps $0x2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[0,0] 
    0x55555562dc99 <+1113>: shufps $0xe2, %xmm1, %xmm2 ; xmm2 = xmm2[2,0],xmm1[2,3] 
    0x55555562dc9d <+1117>: shufps $0x13, %xmm1, %xmm4 ; xmm4 = xmm4[3,0],xmm1[1,0] 
    0x55555562dca1 <+1121>: shufps $0xe2, %xmm1, %xmm4 ; xmm4 = xmm4[2,0],xmm1[2,3] 
    0x55555562dca5 <+1125>: shufps $0xe2, %xmm1, %xmm0 ; xmm0 = xmm0[2,0],xmm1[2,3] 
    0x55555562dca9 <+1129>: shufps $0x31, %xmm1, %xmm5 ; xmm5 = xmm5[1,0],xmm1[3,0] 
    0x55555562dcad <+1133>: shufps $0xe2, %xmm1, %xmm5 ; xmm5 = xmm5[2,0],xmm1[2,3] 
    0x55555562dcb1 <+1137>: movapd %xmm3, %xmm1
    0x55555562dcb5 <+1141>: shufps $0xe8, %xmm3, %xmm1 ; xmm1 = xmm1[0,2],xmm3[2,3] 
    0x55555562dcb9 <+1145>: psrlq  $0x20, %xmm3
    0x55555562dcbe <+1150>: pshufd $0xe8, %xmm3, %xmm3 ; xmm3 = xmm3[0,2,2,3] 
    0x55555562dcc3 <+1155>: mulps  %xmm1, %xmm2
    0x55555562dcc6 <+1158>: mulps  %xmm4, %xmm1
    0x55555562dcc9 <+1161>: mulps  %xmm3, %xmm0
    0x55555562dccc <+1164>: addps  %xmm2, %xmm0
    0x55555562dccf <+1167>: mulps  %xmm3, %xmm5
    0x55555562dcd2 <+1170>: addps  %xmm1, %xmm5
    0x55555562dcd5 <+1173>: unpcklps %xmm5, %xmm0 ; xmm0 = xmm0[0],xmm5[0],xmm0[1],xmm5[1] 
    0x55555562dcd8 <+1176>: movups %xmm0, (%rcx,%r9,8)
    0x55555562dcdd <+1181>: addq   $0x2, %r9
    0x55555562dce1 <+1185>: addq   $0x30, %r8
    0x55555562dce5 <+1189>: cmpq   %r9, %rdx
    0x55555562dce8 <+1192>: jne    0x55555562dc70 ; <+1072>
    0x55555562dcea <+1194>: addq   %rdx, %rsi
    0x55555562dced <+1197>: addq   %r15, %rax
    0x55555562dcf0 <+1200>: leaq   (%r15,%rdi,8), %rcx
    0x55555562dcf4 <+1204>: nopw   %cs:(%rax,%rax)
    0x55555562dd00 <+1216>: movss  0x10(%rax), %xmm0
    0x55555562dd05 <+1221>: movss  0x14(%rax), %xmm1
    0x55555562dd0a <+1226>: movss  (%rax), %xmm2
    0x55555562dd0e <+1230>: mulss  %xmm0, %xmm2
    0x55555562dd12 <+1234>: mulss  0x4(%rax), %xmm0
    0x55555562dd17 <+1239>: movss  0x8(%rax), %xmm3
    0x55555562dd1c <+1244>: mulss  %xmm1, %xmm3
    0x55555562dd20 <+1248>: addss  %xmm2, %xmm3
    0x55555562dd24 <+1252>: mulss  0xc(%rax), %xmm1
    0x55555562dd29 <+1257>: addss  %xmm0, %xmm1
    0x55555562dd2d <+1261>: movd   %xmm3, %edx
    0x55555562dd31 <+1265>: movd   %xmm1, %edi
    0x55555562dd35 <+1269>: shlq   $0x20, %rdi
    0x55555562dd39 <+1273>: orq    %rdi, %rdx
    0x55555562dd3c <+1276>: movq   %rdx, (%rbp,%rsi,8)
    0x55555562dd41 <+1281>: incq   %rsi
    0x55555562dd44 <+1284>: addq   $0x18, %rax
    0x55555562dd48 <+1288>: cmpq   %rcx, %rax
    0x55555562dd4b <+1291>: jne    0x55555562dd00 ; <+1216>
    0x55555562dd4d <+1293>: movq   %rsi, 0x38(%rsp)
    0x55555562dd52 <+1298>: cmpq   $0x0, 0x20(%rsp)
    0x55555562dd58 <+1304>: je     0x55555562dd63 ; <+1315>
    0x55555562dd5a <+1306>: movq   %r15, %rdi
    0x55555562dd5d <+1309>: callq  *0x46cfdd(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
    0x55555562dd63 <+1315>: movl   $0x1, %edi
    0x55555562dd68 <+1320>: callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562dd6d <+1325>: movq   0x80(%rsp), %r13
    0x55555562dd75 <+1333>: movq   %rsp, %rdi
    0x55555562dd78 <+1336>: movq   %rax, 0x98(%rsp)
    0x55555562dd80 <+1344>: movl   %edx, 0xa0(%rsp)
    0x55555562dd87 <+1351>: movq   %r12, 0x50(%rsp)
    0x55555562dd8c <+1356>: movl   0x68(%rsp), %eax
    0x55555562dd90 <+1360>: movl   %eax, 0x58(%rsp)
    0x55555562dd94 <+1364>: leaq   0x98(%rsp), %rsi
    0x55555562dd9c <+1372>: leaq   0x50(%rsp), %rdx
    0x55555562dda1 <+1377>: callq  0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
    0x55555562dda6 <+1382>: movzbl (%rsp), %ecx
    0x55555562ddaa <+1386>: testb  %cl, %cl
    0x55555562ddac <+1388>: movq   0x48(%rsp), %rbx
    0x55555562ddb1 <+1393>: jne    0x55555562ddc0 ; <+1408>
    0x55555562ddb3 <+1395>: movq   0x8(%rsp), %rax
    0x55555562ddb8 <+1400>: jmp    0x55555562ddc2 ; <+1410>
    0x55555562ddba <+1402>: nopw   (%rax,%rax)
    0x55555562ddc0 <+1408>: xorl   %eax, %eax
    0x55555562ddc2 <+1410>: addq   (%r13), %rax
    0x55555562ddc6 <+1414>: jb     0x55555562decc ; <+1676>
    0x55555562ddcc <+1420>: testb  $0x1, %cl
    0x55555562ddcf <+1423>: movl   0x10(%rsp), %ecx
    0x55555562ddd3 <+1427>: movl   $0x0, %edx
    0x55555562ddd8 <+1432>: cmovnel %edx, %ecx
    0x55555562dddb <+1435>: addl   0x8(%r13), %ecx
    0x55555562dddf <+1439>: cmpl   $0x3b9aca00, %ecx ; imm = 0x3B9ACA00 
    0x55555562dde5 <+1445>: jb     0x55555562ddfa ; <+1466>
    0x55555562dde7 <+1447>: cmpq   $-0x1, %rax
    0x55555562ddeb <+1451>: je     0x55555562decc ; <+1676>
    0x55555562ddf1 <+1457>: addl   $0xc4653600, %ecx ; imm = 0xC4653600 
    0x55555562ddf7 <+1463>: incq   %rax
    0x55555562ddfa <+1466>: movq   %rax, (%r13)
    0x55555562ddfe <+1470>: movl   %ecx, 0x8(%r13)
    0x55555562de02 <+1474>: movq   0x38(%rsp), %rax
    0x55555562de07 <+1479>: movq   %rax, 0x10(%rsp)
    0x55555562de0c <+1484>: movups 0x28(%rsp), %xmm0
    0x55555562de11 <+1489>: movaps %xmm0, (%rsp)
    0x55555562de15 <+1493>: movq   0x10(%rsp), %rax
    0x55555562de1a <+1498>: movq   (%rsp), %rax
    0x55555562de1e <+1502>: movq   0x8(%rsp), %rdi
    0x55555562de23 <+1507>: testq  %rax, %rax
    0x55555562de26 <+1510>: je     0x55555562da40 ; <+512>
    0x55555562de2c <+1516>: callq  *0x46cf0e(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
    0x55555562de32 <+1522>: jmp    0x55555562da40 ; <+512>
    0x55555562de37 <+1527>: movl   $0x4, %ecx
    0x55555562de3c <+1532>: movl   $0x8, %r8d
    0x55555562de42 <+1538>: leaq   0x28(%rsp), %rdi
    0x55555562de47 <+1543>: xorl   %esi, %esi
    0x55555562de49 <+1545>: movq   %rbx, %rdx
    0x55555562de4c <+1548>: movq   0x18(%rsp), %r15
    0x55555562de51 <+1553>: callq  0x555555555550 ; alloc::raw_vec::RawVecInner$LT$A$GT$::reserve::do_reserve_and_handle::h6eaae75860de7206
    0x55555562de56 <+1558>: movq   0x30(%rsp), %rbp
    0x55555562de5b <+1563>: movq   0x38(%rsp), %rsi
    0x55555562de60 <+1568>: jmp    0x55555562dbf0 ; <+944>
    0x55555562de65 <+1573>: movl   $0x1, %edi
    0x55555562de6a <+1578>: callq  0x55555593abc0 ; std::sys::pal::unix::time::Timespec::now::h5ef5d0c6e88433fb
    0x55555562de6f <+1583>: movq   %rax, 0x50(%rsp)
    0x55555562de74 <+1588>: movl   %edx, 0x58(%rsp)
    0x55555562de78 <+1592>: movq   0x88(%rsp), %rax
    0x55555562de80 <+1600>: movq   %rax, 0x28(%rsp)
    0x55555562de85 <+1605>: movl   0x74(%rsp), %eax
    0x55555562de89 <+1609>: movl   %eax, 0x30(%rsp)
    0x55555562de8d <+1613>: movq   %rsp, %rdi
    0x55555562de90 <+1616>: leaq   0x50(%rsp), %rsi
    0x55555562de95 <+1621>: leaq   0x28(%rsp), %rdx
    0x55555562de9a <+1626>: callq  0x55555593ac90 ; std::sys::pal::unix::time::Timespec::sub_timespec::hb206577083debcb5
    0x55555562de9f <+1631>: xorl   %eax, %eax
    0x55555562dea1 <+1633>: cmpb   $0x0, (%rsp)
    0x55555562dea5 <+1637>: movl   0x10(%rsp), %ecx
    0x55555562dea9 <+1641>: cmovnel %eax, %ecx
    0x55555562deac <+1644>: cmoveq 0x8(%rsp), %rax
    0x55555562deb2 <+1650>: movq   %rax, 0x10(%r13)
    0x55555562deb6 <+1654>: movl   %ecx, 0x18(%r13)
    0x55555562deba <+1658>: addq   $0xa8, %rsp
    0x55555562dec1 <+1665>: popq   %rbx
    0x55555562dec2 <+1666>: popq   %r12
    0x55555562dec4 <+1668>: popq   %r13
    0x55555562dec6 <+1670>: popq   %r14
    0x55555562dec8 <+1672>: popq   %r15
    0x55555562deca <+1674>: popq   %rbp
    0x55555562decb <+1675>: retq   
    0x55555562decc <+1676>: leaq   0x3862e7(%rip), %rdi
    0x55555562ded3 <+1683>: leaq   0x445206(%rip), %rdx ; __dso_handle + 29352
    0x55555562deda <+1690>: movl   $0x1e, %esi
    0x55555562dedf <+1695>: callq  0x5555555b1fd0 ; core::option::expect_failed::h50b71e74d7945a60
    0x55555562dee4 <+1700>: jmp    0x55555562df28 ; <+1768>
    0x55555562dee6 <+1702>: leaq   0x379c5e(%rip), %rdi
    0x55555562deed <+1709>: leaq   0x43f824(%rip), %rdx ; __dso_handle + 6368
    0x55555562def4 <+1716>: movl   $0x1c, %esi
    0x55555562def9 <+1721>: callq  0x5555555c491e ; std::panicking::begin_panic::h4f2cc586c820a72c
    0x55555562defe <+1726>: leaq   0x46c7fb(%rip), %rdi ; __dso_handle + 190664
    0x55555562df05 <+1733>: callq  0x5555555aedb0 ; alloc::raw_vec::capacity_overflow::h46cadc9fcf0d8ebe
    0x55555562df0a <+1738>: movl   $0x4, %eax
    0x55555562df0f <+1743>: movq   %rax, 0x60(%rsp)
    0x55555562df14 <+1748>: leaq   0x43f815(%rip), %rdx ; __dso_handle + 6392
    0x55555562df1b <+1755>: movq   0x60(%rsp), %rdi
    0x55555562df20 <+1760>: movq   %r12, %rsi
    0x55555562df23 <+1763>: callq  0x5555555aed83 ; alloc::raw_vec::handle_error::hc389833aee8d6f48
    0x55555562df28 <+1768>: ud2    
    0x55555562df2a <+1770>: movl   $0x4, %edi
    0x55555562df2f <+1775>: movq   %r12, %rsi
    0x55555562df32 <+1778>: callq  0x5555555aed99 ; alloc::alloc::handle_alloc_error::h9164725ce4591dac
    0x55555562df37 <+1783>: movq   %rax, %rbx
    0x55555562df3a <+1786>: cmpq   $0x0, 0x20(%rsp)
    0x55555562df40 <+1792>: jne    0x55555562df4d ; <+1805>
    0x55555562df42 <+1794>: movq   $0x0, 0x20(%rsp)
    0x55555562df4b <+1803>: jmp    0x55555562df71 ; <+1841>
    0x55555562df4d <+1805>: movq   0x18(%rsp), %rdi
    0x55555562df52 <+1810>: callq  *0x46cde8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
    0x55555562df58 <+1816>: jmp    0x55555562df71 ; <+1841>
    0x55555562df5a <+1818>: movq   %rax, %rbx
    0x55555562df5d <+1821>: movb   $0x1, %bpl
    0x55555562df60 <+1824>: jmp    0x55555562df78 ; <+1848>
    0x55555562df62 <+1826>: movq   %rax, %rbx
    0x55555562df65 <+1829>: movq   0x18(%rsp), %rdi
    0x55555562df6a <+1834>: jmp    0x55555562df92 ; <+1874>
    0x55555562df6c <+1836>: jmp    0x55555562df6e ; <+1838>
    0x55555562df6e <+1838>: movq   %rax, %rbx
    0x55555562df71 <+1841>: movq   0x28(%rsp), %r15
    0x55555562df76 <+1846>: xorl   %ebp, %ebp
    0x55555562df78 <+1848>: testq  %r15, %r15
    0x55555562df7b <+1851>: je     0x55555562df88 ; <+1864>
    0x55555562df7d <+1853>: movq   0x30(%rsp), %rdi
    0x55555562df82 <+1858>: callq  *0x46cdb8(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
    0x55555562df88 <+1864>: testb  %bpl, %bpl
    0x55555562df8b <+1867>: movq   0x18(%rsp), %rdi
    0x55555562df90 <+1872>: je     0x55555562df9a ; <+1882>
    0x55555562df92 <+1874>: cmpq   $0x0, 0x20(%rsp)
    0x55555562df98 <+1880>: jne    0x55555562dfa2 ; <+1890>
    0x55555562df9a <+1882>: movq   %rbx, %rdi
    0x55555562df9d <+1885>: callq  0x5555555543b0 ; symbol stub for: _Unwind_Resume
    0x55555562dfa2 <+1890>: callq  *0x46cd98(%rip) ; _GLOBAL_OFFSET_TABLE_ + 464
    0x55555562dfa8 <+1896>: movq   %rbx, %rdi
    0x55555562dfab <+1899>: callq  0x5555555543b0 ; symbol stub for: _Unwind_Resume

I am not sure, but I think the compiler was able to autovectorize this to process two(?) mul(a, b) calls per iteration; see the code starting at 0x55555562dc70. I do not have time for this right now, but I will be able to return to it later today or at the beginning of the week.
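
For orientation, the scalar loop at 0x55555562dd00 appears to be a column-major 2x2 matrix-vector product applied to each 24-byte (matrix, vector) input, with each 8-byte result stored into the output buffer. A minimal std-only sketch of that per-batch work follows; `Mat2`, `Vec2`, `mul` and `mul_batch` are hypothetical stand-ins for illustration, not nalgebra's real types or Criterion's code:

```rust
/// Hypothetical stand-in for a 2x2 f32 matrix, stored column-major:
/// [m00, m10, m01, m11].
#[derive(Clone, Copy, Debug, PartialEq)]
struct Mat2([f32; 4]);

/// Hypothetical stand-in for a 2-component f32 vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec2([f32; 2]);

/// Column-major 2x2 matrix times 2-vector.
fn mul(m: &Mat2, v: &Vec2) -> Vec2 {
    Vec2([
        m.0[0] * v.0[0] + m.0[2] * v.0[1],
        m.0[1] * v.0[0] + m.0[3] * v.0[1],
    ])
}

/// Applies `mul` to every pre-generated (matrix, vector) pair and collects
/// the results. A loop of this shape is a natural candidate for processing
/// two elements per iteration with SSE instructions.
fn mul_batch(inputs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    inputs.iter().map(|(m, v)| mul(m, v)).collect()
}

fn main() {
    let m = Mat2([1.0, 2.0, 3.0, 4.0]);
    let out = mul_batch(&[(m, Vec2([1.0, 0.0])), (m, Vec2([0.0, 1.0]))]);
    println!("{:?}", out);
}
```

In the vectorized loop the input pointer advances by 0x30 bytes and the output index by 2 per iteration, which is consistent with two such products being computed at once.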

In general, I think it should be easy to modify the existing macros to use iter_batched() and iter_batched_ref(). I really do not want to do this manually for the rest of the code, but I may try to task an LLM with this 😁.

Ah, and this also raises another question: do we always want to generate random values for both arguments of binary operations to simulate a worst-case scenario? Or do we also need another macro that generates many random values for the second argument but uses a reference to a single first argument (self)? In practice you sometimes need to multiply a large number of vectors by a single matrix, and the performance may differ, for example because of cache misses.
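
To make the two options concrete, here is a minimal std-only sketch of both input-generation strategies. Everything in it is an assumption made for illustration: `XorShift64` is a tiny stand-in for a real random number generator, `Mat2`/`Vec2` are plain arrays rather than nalgebra types, and `random_pairs`/`single_times_n` are hypothetical names.

```rust
use std::mem::size_of;

// Plain-array stand-ins for a 2x2 matrix and a 2-vector (hypothetical).
type Mat2 = [f32; 4];
type Vec2 = [f32; 2];

/// Tiny xorshift64 generator; a stand-in for a real RNG crate.
struct XorShift64(u64);

impl XorShift64 {
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 24 bits and scale them into [0, 1).
        (self.0 >> 40) as f32 / (1u32 << 24) as f32
    }
}

fn random_mat2(rng: &mut XorShift64) -> Mat2 {
    [rng.next_f32(), rng.next_f32(), rng.next_f32(), rng.next_f32()]
}

fn random_vec2(rng: &mut XorShift64) -> Vec2 {
    [rng.next_f32(), rng.next_f32()]
}

/// Worst case: a fresh matrix and a fresh vector for every call.
fn random_pairs(rng: &mut XorShift64, n: usize) -> Vec<(Mat2, Vec2)> {
    (0..n).map(|_| (random_mat2(rng), random_vec2(rng))).collect()
}

/// "Single x N": one matrix shared by every call, fresh vectors only.
fn single_times_n(rng: &mut XorShift64, n: usize) -> (Mat2, Vec<Vec2>) {
    let m = random_mat2(rng);
    let vs = (0..n).map(|_| random_vec2(rng)).collect();
    (m, vs)
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let n = 1024;
    let pairs = random_pairs(&mut rng, n);
    let (_m, vs) = single_times_n(&mut rng, n);
    // The second layout reads far less input data per operation.
    println!("both random: {} bytes", pairs.len() * size_of::<(Mat2, Vec2)>());
    println!("single x N:  {} bytes", size_of::<Mat2>() + vs.len() * size_of::<Vec2>());
}
```

With these layouts the first strategy streams 24 bytes of input per operation, while the second streams 8 bytes and keeps the matrix resident, which is one plausible source of a performance difference.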

Contributor Author

I also checked the implementation of iter_batched*(), and it seems to do the right thing: the return value of the setup closure is wrapped in black_box().

Regarding the Criterion issue you mentioned: I am not sure, but I suspect the problem is that they are measuring the time it takes to deallocate a vector on drop(). The performance of free() may depend on the allocated size, because allocators sometimes use different algorithms for different allocation sizes.
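
A minimal std-only sketch of the batched-timing pattern under discussion is shown below. It is an illustration under stated assumptions, not Criterion's actual code: `time_batched` is a hypothetical helper, and the only library calls used are `std::hint::black_box` and `std::time::Instant`. It shows where a deallocation can end up inside the timed region.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: times `routine` over `batch` pre-built inputs and
/// returns the elapsed time together with the number of outputs produced.
fn time_batched<I, O>(
    batch: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> (Duration, usize) {
    // Inputs are built before the clock starts and hidden from the optimizer.
    let inputs: Vec<I> = black_box((0..batch).map(|_| setup()).collect());
    // Pre-allocate so that storing results does not reallocate while timed.
    let mut outputs: Vec<O> = Vec::with_capacity(batch);

    let start = Instant::now();
    // `into_iter()` consumes `inputs`, so the input buffer is freed when the
    // iterator is dropped at the end of this statement, i.e. while timed.
    outputs.extend(inputs.into_iter().map(&mut routine));
    let elapsed = start.elapsed();

    let produced = outputs.len();
    // Outputs are dropped only after the clock has stopped.
    drop(black_box(outputs));
    (elapsed, produced)
}

fn main() {
    let mut seed = 0u32;
    let (elapsed, n) = time_batched(
        1000,
        || {
            seed = seed.wrapping_add(1);
            vec![seed as f32; 4]
        },
        // Returning the input next to the result keeps its drop (and the
        // matching free) out of the timed region.
        |v: Vec<f32>| {
            let sum: f32 = v.iter().sum();
            (sum, v)
        },
    );
    println!("{n} calls in {elapsed:?}");
}
```

In the disassembly above, an indirect call through the GOT that takes the input buffer pointer (at +1309) sits between the loop and the closing Timespec::now call (at +1320), which would be consistent with the input vector being freed inside the measurement.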

And one last thought for now: can we bump criterion to the latest version as part of this PR?

Collaborator

And one last thought for now: can we bump criterion to the latest version as part of this PR?

I think bumping to the latest criterion version is a very good idea, provided everything else still works.

Ah, and this also raises another question: do we always want to generate random values for both arguments of binary operations to simulate a worst-case scenario? Or do we also need another macro that generates many random values for the second argument but uses a reference to a single first argument (self)? In practice you sometimes need to multiply a large number of vectors by a single matrix, and the performance may differ, for example because of cache misses.

Very good question, and I don't think I have a great answer. Maybe the implementation you provided is better after all? It corrects the original code but keeps the same spirit, i.e. if we have sufficiently small pieces of data, we'll take advantage of caching... I don't know... microbenchmarks sure are great, aren't they 😆

In general, I think it should be easy to modify the existing macros to use iter_batched() and iter_batched_ref(). I really do not want to do this manually for the rest of the code, but I may try to task an LLM with this 😁.

Given the discussion above, I don't know if you would want to try the refactor at all. If you end up attempting it and it is too tedious (or you don't have access to one of our future AI overlords), let me know.

Contributor Author

Redid almost everything in this PR. Here is the summary of all changes relative to current main:

  • Set codegen-units = 1 for benchmarks. I found that the default value of codegen-units leads to inconsistent results across recompilations (clean vs. incremental). It also sometimes causes a significant performance degradation in benchmarks unrelated to the code changes. See rust-lang/rust#146497 (4000% performance regression with "-C target-cpu=x86-64-v3" and fat LTO) for details.

  • criterion updated to version 0.7.

  • Unused macros removed (I found another unused macro!)

  • Remaining macros changed to use iter_batched() and iter_batched_ref().

  • Added macros to benchmark Single x N Values binary operations. This simulates real-world use cases like multiplication of many vectors by a single matrix.

    There is a ~2x performance difference between the case where both arguments are random on each iteration and the case where one argument is static and the second is random on each iteration:

    click for details...

      mat2_mul_v              time:   [778.33 ps 785.41 ps 797.70 ps]
      Found 14 outliers among 100 measurements (14.00%)
        5 (5.00%) low severe
        4 (4.00%) high mild
        5 (5.00%) high severe
    
      mat3_mul_v              time:   [1.7001 ns 1.7051 ns 1.7111 ns]
      Found 11 outliers among 100 measurements (11.00%)
        1 (1.00%) low severe
        1 (1.00%) low mild
        8 (8.00%) high mild
        1 (1.00%) high severe
    
      mat4_mul_v              time:   [2.6101 ns 2.6223 ns 2.6374 ns]
      Found 8 outliers among 100 measurements (8.00%)
        1 (1.00%) low mild
        3 (3.00%) high mild
        4 (4.00%) high severe
    
      single_mat2_mul_v       time:   [402.65 ps 403.62 ps 404.75 ps]
      Found 11 outliers among 100 measurements (11.00%)
        3 (3.00%) low mild
        5 (5.00%) high mild
        3 (3.00%) high severe
    
      single_mat3_mul_v       time:   [651.30 ps 654.06 ps 657.15 ps]
      Found 15 outliers among 100 measurements (15.00%)
        3 (3.00%) low mild
        8 (8.00%) high mild
        4 (4.00%) high severe
    
      single_mat4_mul_v       time:   [1.0628 ns 1.0645 ns 1.0666 ns]
      Found 8 outliers among 100 measurements (8.00%)
        1 (1.00%) low mild
        5 (5.00%) high mild
        2 (2.00%) high severe
    
      mat2_tr_mul_v           time:   [719.81 ps 721.99 ps 724.59 ps]
      Found 8 outliers among 100 measurements (8.00%)
        3 (3.00%) low mild
        5 (5.00%) high mild
    
      mat3_tr_mul_v           time:   [1.6685 ns 1.6758 ns 1.6841 ns]
      Found 13 outliers among 100 measurements (13.00%)
        4 (4.00%) low severe
        1 (1.00%) low mild
        4 (4.00%) high mild
        4 (4.00%) high severe
    
      mat4_tr_mul_v           time:   [2.6739 ns 2.6897 ns 2.7080 ns]
      Found 16 outliers among 100 measurements (16.00%)
        2 (2.00%) low severe
        2 (2.00%) low mild
        4 (4.00%) high mild
        8 (8.00%) high severe
    
      single_mat2_tr_mul_v    time:   [353.36 ps 354.56 ps 356.03 ps]
      Found 6 outliers among 100 measurements (6.00%)
        2 (2.00%) low mild
        1 (1.00%) high mild
        3 (3.00%) high severe
    
      single_mat3_tr_mul_v    time:   [779.82 ps 782.84 ps 786.37 ps]
      Found 10 outliers among 100 measurements (10.00%)
        1 (1.00%) low severe
        1 (1.00%) low mild
        6 (6.00%) high mild
        2 (2.00%) high severe
    
      single_mat4_tr_mul_v    time:   [1.1918 ns 1.1946 ns 1.1977 ns]
      Found 6 outliers among 100 measurements (6.00%)
        3 (3.00%) low mild
        1 (1.00%) high mild
        2 (2.00%) high severe
    
      unit_quaternion_mul_v   time:   [1.5002 ns 1.5088 ns 1.5183 ns]
                              change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
                              No change in performance detected.
      Found 6 outliers among 100 measurements (6.00%)
        3 (3.00%) high mild
        3 (3.00%) high severe
    
      single_unit_quaternion_mul_v
                              time:   [1.0489 ns 1.0531 ns 1.0584 ns]
      Found 14 outliers among 100 measurements (14.00%)
        2 (2.00%) low severe
        1 (1.00%) low mild
        4 (4.00%) high mild
        7 (7.00%) high severe
    
  • Uncommented some quaternion benchmarks. I do not know why those benchmarks were commented out in the first place.

  • Remaining non-macro benchmarks changed to use iter_batched() and iter_batched_ref().

    The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved DVector allocations outside of the benchmarks and added anything allocated but not consumed to the return tuple of the benchmark closure, so that the implicit drop/free is not included in the measured time.

  • Added reproducible_smatrix(). Some algorithms may not converge on completely random values with the default value of epsilon and unlimited iterations. reproducible_dmatrix() already exists to work around this for DMatrix, so I implemented the same for SMatrix.

    In my tests this problem manifested itself only in schur_decompose_4x4, but I decided to apply a similar fix to all benchmarks that also use reproducible_dmatrix() for DMatrix.

  • Cholesky decomposition benchmarks changed to use reproducible_dmatrix().

    Random matrices may not be positive-definite, and the Cholesky decomposition benchmarks panic because of that:

      Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
      thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
      called `Option::unwrap()` on a `None` value
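
The Cholesky failure mode can be reproduced in miniature with a std-only sketch. It assumes a hand-written 2x2 factorization (`cholesky2`) and one common way of building a positive-definite input (`spd_from`, computing M * Mᵀ + I); both are hypothetical helpers for illustration, and reproducible_dmatrix() may construct its matrices differently.

```rust
/// Cholesky factorization of the symmetric 2x2 matrix [[a, b], [b, c]].
/// Returns the lower-triangular factor [[l00, 0], [l10, l11]] as
/// (l00, l10, l11), or None if the matrix is not positive-definite.
fn cholesky2(a: f64, b: f64, c: f64) -> Option<(f64, f64, f64)> {
    if a <= 0.0 {
        return None;
    }
    let l00 = a.sqrt();
    let l10 = b / l00;
    let d = c - l10 * l10;
    if d <= 0.0 {
        return None;
    }
    Some((l00, l10, d.sqrt()))
}

/// Builds a symmetric positive-definite matrix from arbitrary entries as
/// M * M^T + I, where M = [[p, q], [r, s]]. Returns (a, b, c) for the
/// matrix [[a, b], [b, c]].
fn spd_from(p: f64, q: f64, r: f64, s: f64) -> (f64, f64, f64) {
    (p * p + q * q + 1.0, p * r + q * s, r * r + s * s + 1.0)
}

fn main() {
    // A symmetric but indefinite matrix: factorization fails, so calling
    // unwrap() on the result would panic just like the benchmark did.
    println!("{:?}", cholesky2(1.0, 2.0, 1.0));

    // The same entries pushed through M * M^T + I factor without trouble.
    let (a, b, c) = spd_from(1.0, 2.0, 2.0, 1.0);
    println!("{:?}", cholesky2(a, b, c));
}
```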
    

Total run time of the full benchmark suite on my machine (AMD 5950X) has not changed and is still around 30 minutes. Here are the results, with the difference from current main:

click for details...

mat2_mul_m              time:   [1.1043 ns 1.1058 ns 1.1077 ns]
                        change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) low severe
  2 (2.00%) high mild
  6 (6.00%) high severe

mat3_mul_m              time:   [3.1885 ns 3.1945 ns 3.2038 ns]
                        change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

mat4_mul_m              time:   [6.7759 ns 6.7840 ns 6.7929 ns]
                        change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

mat2_tr_mul_m           time:   [1.2882 ns 1.2901 ns 1.2926 ns]
                        change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  1 (1.00%) high mild
  3 (3.00%) high severe

mat3_tr_mul_m           time:   [3.1688 ns 3.1725 ns 3.1770 ns]
                        change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild
  4 (4.00%) high severe

mat4_tr_mul_m           time:   [6.5406 ns 6.5453 ns 6.5508 ns]
						change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

mat2_add_m              time:   [644.68 ps 645.88 ps 647.24 ps]
						change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05)
						Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

mat3_add_m              time:   [1.3543 ns 1.3572 ns 1.3607 ns]
						change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) low severe
  5 (5.00%) high mild
  4 (4.00%) high severe

mat4_add_m              time:   [2.3987 ns 2.4015 ns 2.4044 ns]
						change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05)
						Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) low severe
  5 (5.00%) high mild
  3 (3.00%) high severe

mat2_sub_m              time:   [637.47 ps 638.88 ps 640.62 ps]
						change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05)
						Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

mat3_sub_m              time:   [1.3531 ns 1.3546 ns 1.3562 ns]
						change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05)
						Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  4 (4.00%) high severe

mat4_sub_m              time:   [2.3972 ns 2.3996 ns 2.4021 ns]
						change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05)
						Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low severe
  1 (1.00%) high mild
  3 (3.00%) high severe

mat2_mul_v              time:   [774.43 ps 775.48 ps 776.73 ps]
						change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  5 (5.00%) high mild
  3 (3.00%) high severe

mat3_mul_v              time:   [1.6843 ns 1.6858 ns 1.6874 ns]
						change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low severe
  1 (1.00%) high mild
  3 (3.00%) high severe

mat4_mul_v              time:   [2.6029 ns 2.6196 ns 2.6485 ns]
						change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

single_mat2_mul_v       time:   [392.29 ps 393.45 ps 394.87 ps]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

single_mat3_mul_v       time:   [650.16 ps 651.47 ps 653.07 ps]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

single_mat4_mul_v       time:   [1.0665 ns 1.0690 ns 1.0722 ns]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

mat2_tr_mul_v           time:   [719.95 ps 720.92 ps 722.16 ps]
						change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  7 (7.00%) high mild
  4 (4.00%) high severe

mat3_tr_mul_v           time:   [1.6551 ns 1.6564 ns 1.6577 ns]
						change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

mat4_tr_mul_v           time:   [2.6477 ns 2.6546 ns 2.6666 ns]
						change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low severe
  3 (3.00%) high mild
  3 (3.00%) high severe

single_mat2_tr_mul_v    time:   [353.60 ps 355.50 ps 358.48 ps]
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

single_mat3_tr_mul_v    time:   [778.13 ps 779.43 ps 781.25 ps]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  5 (5.00%) high severe

single_mat4_tr_mul_v    time:   [1.1887 ns 1.1906 ns 1.1930 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

mat2_mul_s              time:   [774.44 ps 775.33 ps 776.37 ps]
						change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe

mat3_mul_s              time:   [962.59 ps 964.98 ps 967.43 ps]
						change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05)
						Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

mat4_mul_s              time:   [1.6589 ns 1.6640 ns 1.6684 ns]
						change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05)
						Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  8 (8.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  6 (6.00%) high severe

mat2_div_s              time:   [803.09 ps 804.70 ps 806.56 ps]
						change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

mat3_div_s              time:   [2.4929 ns 2.4947 ns 2.4967 ns]
						change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  5 (5.00%) high mild
  4 (4.00%) high severe

mat4_div_s              time:   [5.1650 ns 5.1688 ns 5.1735 ns]
						change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

mat2_inv                time:   [1.1514 ns 1.1523 ns 1.1533 ns]
						change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05)
						Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

mat3_inv                time:   [3.3641 ns 3.3707 ns 3.3826 ns]
						change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05)
						Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

mat4_inv                time:   [25.970 ns 26.006 ns 26.062 ns]
						change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05)
						Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe

mat2_transpose          time:   [409.94 ps 410.77 ps 411.75 ps]
						change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05)
						Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

mat3_transpose          time:   [947.42 ps 953.20 ps 961.97 ps]
						change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05)
						Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

mat4_transpose          time:   [1.6510 ns 1.6551 ns 1.6612 ns]
						change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05)
						Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

mat_div_scalar          time:   [480.25 µs 480.55 µs 480.99 µs]
						change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05)
						Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

mat100_add_mat100       time:   [3.0426 µs 3.0910 µs 3.1351 µs]
						change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

mat4_mul_mat4           time:   [36.836 ns 36.859 ns 36.886 ns]
						change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) low severe
  4 (4.00%) high mild
  2 (2.00%) high severe

mat5_mul_mat5           time:   [56.715 ns 56.876 ns 57.015 ns]
						change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild

mat6_mul_mat6           time:   [83.817 ns 83.999 ns 84.156 ns]
						change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

mat7_mul_mat7           time:   [93.211 ns 93.386 ns 93.534 ns]
						change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild

mat8_mul_mat8           time:   [88.919 ns 89.410 ns 89.884 ns]
						change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

mat9_mul_mat9           time:   [207.12 ns 209.04 ns 211.17 ns]
						change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  9 (9.00%) low mild
  1 (1.00%) high mild

mat10_mul_mat10         time:   [236.75 ns 237.11 ns 237.47 ns]
						change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) low severe
  7 (7.00%) low mild
  1 (1.00%) high mild

mat10_mul_mat10_static  time:   [116.68 ns 117.15 ns 117.62 ns]
						change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05)
						Performance has regressed.

mat100_mul_mat100       time:   [40.188 µs 40.327 µs 40.459 µs]
						change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

mat500_mul_mat500       time:   [4.3909 ms 4.3944 ms 4.3978 ms]
						change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) low severe
  2 (2.00%) high mild
  1 (1.00%) high severe

iter                    time:   [840.01 µs 840.39 µs 840.81 µs]
						change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe

iter_rev                time:   [210.14 µs 211.10 µs 212.84 µs]
						change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05)
						Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

copy_from               time:   [199.77 µs 200.80 µs 202.55 µs]
						change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  8 (8.00%) low mild
  1 (1.00%) high severe

axpy                    time:   [31.301 µs 33.301 µs 34.957 µs]
						change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05)
						Performance has regressed.

tr_mul_to               time:   [126.46 µs 127.12 µs 128.09 µs]
						change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05)
						Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

mat_mul_mat             time:   [39.252 µs 39.443 µs 39.626 µs]
						change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05)
						Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  2 (2.00%) high severe

mat100_from_fn          time:   [6.8398 µs 6.8418 µs 6.8446 µs]
						change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

mat500_from_fn          time:   [172.11 µs 172.14 µs 172.18 µs]
						change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

vec2_add_v_f32          time:   [303.98 ps 304.76 ps 305.65 ps]
						change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

vec3_add_v_f32          time:   [586.36 ps 587.93 ps 589.92 ps]
						change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

vec4_add_v_f32          time:   [603.45 ps 604.44 ps 605.59 ps]
						change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05)
						Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

vec2_add_v_f64          time:   [602.08 ps 602.83 ps 603.64 ps]
						change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

vec3_add_v_f64          time:   [910.94 ps 912.60 ps 914.56 ps]
						change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  3 (3.00%) high severe

vec4_add_v_f64          time:   [1.1894 ns 1.1933 ns 1.1963 ns]
						change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

vec2_sub_v              time:   [303.45 ps 304.42 ps 305.37 ps]
						change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  8 (8.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

vec3_sub_v              time:   [672.95 ps 674.82 ps 676.51 ps]
						change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

vec4_sub_v              time:   [602.84 ps 604.65 ps 607.70 ps]
						change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05)
						Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

vec2_mul_s              time:   [666.49 ps 667.29 ps 668.31 ps]
						change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) low severe
  6 (6.00%) high mild
  6 (6.00%) high severe

vec3_mul_s              time:   [511.42 ps 513.44 ps 515.86 ps]
						change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

vec4_mul_s              time:   [774.13 ps 775.22 ps 776.52 ps]
						change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

vec2_div_s              time:   [1.3658 ns 1.3694 ns 1.3726 ns]
						change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

vec3_div_s              time:   [607.73 ps 608.63 ps 609.66 ps]
						change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  8 (8.00%) high mild
  6 (6.00%) high severe

vec4_div_s              time:   [802.59 ps 803.62 ps 804.82 ps]
						change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  2 (2.00%) high severe

vec2_dot_f32            time:   [461.20 ps 461.73 ps 462.30 ps]
						change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  9 (9.00%) high severe

vec3_dot_f32            time:   [688.24 ps 689.05 ps 689.95 ps]
						change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

vec4_dot_f32            time:   [917.20 ps 921.23 ps 928.57 ps]
						change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

vec2_dot_f64            time:   [596.11 ps 597.51 ps 598.79 ps]
						change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

vec3_dot_f64            time:   [749.32 ps 751.02 ps 752.81 ps]
						change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

vec4_dot_f64            time:   [1.0145 ns 1.0185 ns 1.0230 ns]
						change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

vec3_cross              time:   [971.01 ps 971.87 ps 972.73 ps]
						change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

vec2_norm               time:   [1.0612 ns 1.0623 ns 1.0637 ns]
						change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05)
						No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) low mild
  2 (2.00%) high severe

vec3_norm               time:   [1.0649 ns 1.0665 ns 1.0694 ns]
						change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05)
						Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

vec4_norm               time:   [1.0733 ns 1.0739 ns 1.0746 ns]
						change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05)
						Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  7 (7.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

vec2_normalize          time:   [2.5310 ns 2.5326 ns 2.5345 ns]
						change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

vec3_normalize          time:   [2.5389 ns 2.5409 ns 2.5424 ns]
						change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

vec4_normalize          time:   [1.8154 ns 1.8164 ns 1.8173 ns]
						change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

vec10000_dot_f64        time:   [2.0296 µs 2.0337 µs 2.0383 µs]
						change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  3 (3.00%) high mild
  4 (4.00%) high severe

vec10000_dot_f32        time:   [1.1891 µs 1.1926 µs 1.1962 µs]
						change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

vec10000_axpy_f64       time:   [2.0702 µs 2.0739 µs 2.0777 µs]
						change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_beta_f64  time:   [2.0914 µs 2.0962 µs 2.1012 µs]
						change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low severe
  5 (5.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f64_slice time:   [2.0272 µs 2.0303 µs 2.0335 µs]
						change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  2 (2.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_f64_static
						time:   [13.917 µs 13.965 µs 14.005 µs]
						change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe

vec10000_axpy_f32       time:   [1.0402 µs 1.0421 µs 1.0437 µs]
						change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

vec10000_axpy_beta_f32  time:   [1.0329 µs 1.0346 µs 1.0364 µs]
						change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

quaternion_add_q        time:   [642.58 ps 650.39 ps 662.45 ps]
						change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05)
						Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

quaternion_sub_q        time:   [641.16 ps 643.22 ps 645.88 ps]
						change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

quaternion_mul_q        time:   [1.4252 ns 1.4271 ns 1.4294 ns]
						change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

unit_quaternion_mul_v   time:   [1.4859 ns 1.4874 ns 1.4890 ns]
						change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

single_unit_quaternion_mul_v
						time:   [1.0422 ns 1.0457 ns 1.0504 ns]
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) high mild
  4 (4.00%) high severe

quaternion_mul_s        time:   [771.17 ps 772.18 ps 773.37 ps]
						change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

quaternion_div_s        time:   [798.54 ps 799.82 ps 801.43 ps]
						change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

quaternion_inv          time:   [1.2401 ns 1.2408 ns 1.2417 ns]
						change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05)
						Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low severe
  5 (5.00%) high mild
  6 (6.00%) high severe

unit_quaternion_inv     time:   [596.01 ps 598.93 ps 602.66 ps]
						change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) high mild
  9 (9.00%) high severe

quaternion_conjugate    time:   [604.36 ps 608.60 ps 613.48 ps]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

quaternion_normalize    time:   [1.8268 ns 1.8274 ns 1.8281 ns]
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

bidiagonalize_100x100   time:   [265.91 µs 266.00 µs 266.11 µs]
						change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

bidiagonalize_100x500   time:   [2.0053 ms 2.0060 ms 2.0065 ms]
						change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe

bidiagonalize_4x4       time:   [266.92 ns 267.24 ns 267.62 ns]
						change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  13 (13.00%) high mild
  4 (4.00%) high severe

Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
bidiagonalize_500x100   time:   [1.6781 ms 1.6793 ms 1.6804 ms]
						change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05)
						Performance has regressed.

bidiagonalize_unpack_100x100
						time:   [522.13 µs 522.36 µs 522.63 µs]
						change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

bidiagonalize_unpack_100x500
						time:   [2.9858 ms 2.9916 ms 2.9976 ms]
						change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05)
						Change within noise threshold.

bidiagonalize_unpack_500x100
						time:   [2.5884 ms 2.5896 ms 2.5910 ms]
						change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05)
						Change within noise threshold.

cholesky_100x100        time:   [31.084 µs 31.101 µs 31.122 µs]
						change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05)
						Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  1 (1.00%) high mild
  9 (9.00%) high severe

cholesky_500x500        time:   [4.4799 ms 4.4849 ms 4.4903 ms]
						change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

cholesky_decompose_unpack_100x100
						time:   [31.659 µs 31.685 µs 31.727 µs]
						change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05)
						Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

cholesky_decompose_unpack_500x500
						time:   [4.4795 ms 4.4845 ms 4.4910 ms]
						change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05)
						Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  7 (7.00%) high severe

cholesky_solve_10x10    time:   [170.70 ns 170.76 ns 170.82 ns]
						change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

cholesky_solve_100x100  time:   [2.9071 µs 2.9117 µs 2.9174 µs]
						change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

cholesky_solve_500x500  time:   [54.193 µs 54.303 µs 54.417 µs]
						change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cholesky_inverse_10x10  time:   [1.3189 µs 1.3195 µs 1.3201 µs]
						change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

cholesky_inverse_100x100
						time:   [270.85 µs 270.88 µs 270.92 µs]
						change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

cholesky_inverse_500x500
						time:   [26.673 ms 26.694 ms 26.714 ms]
						change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 23 outliers among 100 measurements (23.00%)
  19 (19.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

full_piv_lu_decompose_10x10
						time:   [582.31 ns 582.48 ns 582.67 ns]
						change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  6 (6.00%) high mild
  2 (2.00%) high severe

full_piv_lu_decompose_100x100
						time:   [218.73 µs 218.78 µs 218.84 µs]
						change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high severe

full_piv_lu_solve_10x10 time:   [124.88 ns 124.94 ns 125.02 ns]
						change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) low severe
  6 (6.00%) high mild
  4 (4.00%) high severe

full_piv_lu_solve_100x100
						time:   [2.5202 µs 2.5244 µs 2.5289 µs]
						change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  14 (14.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild

full_piv_lu_inverse_10x10
						time:   [869.61 ns 870.27 ns 871.19 ns]
						change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  1 (1.00%) high mild
  4 (4.00%) high severe

full_piv_lu_inverse_100x100
						time:   [212.68 µs 212.83 µs 213.05 µs]
						change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05)
						No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

full_piv_lu_determinant_10x10
						time:   [15.320 ns 15.338 ns 15.357 ns]
						change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild

full_piv_lu_determinant_100x100
						time:   [137.44 ns 139.37 ns 141.00 ns]
						change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05)
						Performance has regressed.

hessenberg_decompose_4x4
						time:   [82.510 ns 82.538 ns 82.564 ns]
						change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05)
						Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

hessenberg_decompose_100x100
						time:   [295.98 µs 296.16 µs 296.44 µs]
						change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

hessenberg_decompose_200x200
						time:   [2.2647 ms 2.2681 ms 2.2714 ms]
						change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05)
						Performance has regressed.

hessenberg_decompose_unpack_100x100
						time:   [435.30 µs 435.75 µs 436.12 µs]
						change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

hessenberg_decompose_unpack_200x200
						time:   [3.2667 ms 3.2678 ms 3.2690 ms]
						change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 22 outliers among 100 measurements (22.00%)
  13 (13.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  5 (5.00%) high severe

lu_decompose_10x10      time:   [353.04 ns 353.16 ns 353.31 ns]
						change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05)
						Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  4 (4.00%) low severe
  4 (4.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

lu_decompose_100x100    time:   [71.544 µs 71.560 µs 71.579 µs]
						change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05)
						Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

lu_solve_10x10          time:   [115.42 ns 115.52 ns 115.61 ns]
						change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) low severe
  8 (8.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

lu_solve_100x100        time:   [2.5152 µs 2.5190 µs 2.5225 µs]
						change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild

lu_inverse_10x10        time:   [902.55 ns 903.32 ns 903.97 ns]
						change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

lu_inverse_100x100      time:   [216.21 µs 216.47 µs 216.80 µs]
						change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

lu_determinant_10x10    time:   [13.394 ns 13.481 ns 13.665 ns]
						change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

lu_determinant_100x100  time:   [149.12 ns 150.16 ns 151.08 ns]
						change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  10 (10.00%) low severe
  4 (4.00%) low mild

qr_decompose_100x100    time:   [141.62 µs 141.65 µs 141.69 µs]
						change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
qr_decompose_100x500    time:   [1.0071 ms 1.0082 ms 1.0097 ms]
						change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  12 (12.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

qr_decompose_4x4        time:   [100.40 ns 100.43 ns 100.45 ns]
						change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05)
						Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  4 (4.00%) high severe

Benchmarking qr_decompose_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
qr_decompose_500x100    time:   [847.17 µs 847.68 µs 848.21 µs]
						change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

qr_decompose_unpack_100x100
						time:   [283.22 µs 283.26 µs 283.30 µs]
						change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 23 outliers among 100 measurements (23.00%)
  21 (21.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe

Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
qr_decompose_unpack_100x500
						time:   [1.1399 ms 1.1429 ms 1.1457 ms]
						change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05)
						Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
qr_decompose_unpack_500x100
						time:   [1.6633 ms 1.6640 ms 1.6648 ms]
						change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  4 (4.00%) high severe

qr_solve_10x10          time:   [156.51 ns 156.56 ns 156.61 ns]
						change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild

qr_solve_100x100        time:   [3.5393 µs 3.5454 µs 3.5511 µs]
						change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) low mild

qr_inverse_10x10        time:   [806.75 ns 807.99 ns 809.61 ns]
						change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

qr_inverse_100x100      time:   [330.65 µs 330.74 µs 330.85 µs]
						change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

schur_decompose_4x4     time:   [969.14 ns 969.71 ns 970.18 ns]
						change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05)
						Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

schur_decompose_10x10   time:   [7.3226 µs 7.3237 µs 7.3247 µs]
						change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  3 (3.00%) high severe

schur_decompose_100x100 time:   [2.5760 ms 2.5763 ms 2.5768 ms]
						change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

schur_decompose_200x200 time:   [18.285 ms 18.296 ms 18.308 ms]
						change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

eigenvalues_4x4         time:   [937.94 ns 938.15 ns 938.38 ns]
						change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild

eigenvalues_10x10       time:   [5.9066 µs 5.9088 µs 5.9117 µs]
						change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
eigenvalues_100x100     time:   [1.5870 ms 1.5873 ms 1.5876 ms]
						change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

eigenvalues_200x200     time:   [11.081 ms 11.088 ms 11.102 ms]
						change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

solve_l_triangular_100x100
						time:   [1.3250 µs 1.3651 µs 1.4012 µs]
						change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  10 (10.00%) high mild
  2 (2.00%) high severe

solve_l_triangular_1000x1000
						time:   [101.52 µs 102.04 µs 102.85 µs]
						change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  9 (9.00%) high mild
  6 (6.00%) high severe

tr_solve_l_triangular_100x100
						time:   [2.0144 µs 2.0537 µs 2.0902 µs]
						change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

tr_solve_l_triangular_1000x1000
						time:   [93.569 µs 94.056 µs 94.857 µs]
						change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

solve_u_triangular_100x100
						time:   [1.5878 µs 1.6615 µs 1.7405 µs]
						change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe

solve_u_triangular_1000x1000
						time:   [105.07 µs 105.46 µs 106.12 µs]
						change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

tr_solve_u_triangular_100x100
						time:   [1.4369 µs 1.4697 µs 1.4986 µs]
						change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  11 (11.00%) high mild
  2 (2.00%) high severe

tr_solve_u_triangular_1000x1000
						time:   [88.868 µs 89.303 µs 90.014 µs]
						change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

svd_decompose_2x2       time:   [22.913 ns 22.958 ns 23.017 ns]
						change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

svd_decompose_3x3       time:   [359.30 ns 359.72 ns 360.20 ns]
						change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

svd_decompose_4x4       time:   [896.28 ns 896.55 ns 896.85 ns]
						change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05)
						Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

svd_decompose_10x10     time:   [5.7680 µs 5.7708 µs 5.7739 µs]
						change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Benchmarking svd_decompose_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
svd_decompose_100x100   time:   [1.5704 ms 1.5709 ms 1.5715 ms]
						change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

svd_decompose_200x200   time:   [11.845 ms 11.847 ms 11.850 ms]
						change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

rank_4x4                time:   [716.49 ns 716.62 ns 716.74 ns]
						change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

rank_10x10              time:   [4.2304 µs 4.2341 µs 4.2377 µs]
						change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

rank_100x100            time:   [522.74 µs 522.85 µs 522.97 µs]
						change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe

rank_200x200            time:   [3.0167 ms 3.0217 ms 3.0267 ms]
						change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05)
						Change within noise threshold.

singular_values_4x4     time:   [735.97 ns 736.08 ns 736.21 ns]
						change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05)
						Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high severe

singular_values_10x10   time:   [4.2987 µs 4.2997 µs 4.3010 µs]
						change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

singular_values_100x100 time:   [525.20 µs 525.36 µs 525.54 µs]
						change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

singular_values_200x200 time:   [3.0712 ms 3.0729 ms 3.0750 ms]
						change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

pseudo_inverse_4x4      time:   [877.64 ns 878.38 ns 879.12 ns]
						change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05)
						Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

pseudo_inverse_10x10    time:   [6.0008 µs 6.0034 µs 6.0064 µs]
						change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
pseudo_inverse_100x100  time:   [1.6088 ms 1.6091 ms 1.6094 ms]
						change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe

pseudo_inverse_200x200  time:   [12.038 ms 12.042 ms 12.047 ms]
						change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05)
						Change within noise threshold.
Found 22 outliers among 100 measurements (22.00%)
  16 (16.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  3 (3.00%) high severe

symmetric_eigen_decompose_4x4
						time:   [518.00 ns 518.07 ns 518.15 ns]
						change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05)
						Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

symmetric_eigen_decompose_10x10
						time:   [3.6417 µs 3.6428 µs 3.6440 µs]
						change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05)
						Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe

symmetric_eigen_decompose_100x100
						time:   [761.64 µs 762.66 µs 763.80 µs]
						change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05)
						Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  9 (9.00%) low severe
  9 (9.00%) low mild
  1 (1.00%) high severe

symmetric_eigen_decompose_200x200
						time:   [5.1304 ms 5.1337 ms 5.1372 ms]
						change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05)
						Performance has improved.

During benchmarking I found that the default value of `codegen-units`
leads to inconsistent results across recompilations (clean vs.
incremental). It also sometimes leads to a significant performance
degradation of benchmarks unrelated to code changes.
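
One way to pin this down, sketched here as an assumption rather than the exact change made in this PR, is to force a single codegen unit for the benchmark profile in `Cargo.toml`:

```toml
# Assumed fragment: build `cargo bench` targets with one codegen unit
# so that code partitioning does not vary between recompilations.
[profile.bench]
codegen-units = 1
```

A single codegen unit typically lengthens compile time, which is usually an acceptable trade for reproducible benchmark numbers.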

Also see rust-lang/rust#146497

Criterion generates a `Vec` of arguments and passes them through
`black_box()` to guarantee that the benchmark closure is never
optimized out of the benchmarking loop.
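
The idea can be sketched without Criterion itself. The following is a minimal std-only illustration, not Criterion's actual implementation; `time_batched` and its parameters are hypothetical names. It pre-generates the inputs into a `Vec`, passes that `Vec` through `std::hint::black_box`, and only then starts the clock:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of Criterion): runs `routine` once per
/// pre-generated input and returns the elapsed time together with the outputs.
fn time_batched<I, O>(
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
    batch_size: usize,
) -> (Duration, Vec<O>) {
    // Build every input before timing starts, then hide the values from the
    // optimizer so the routine cannot be constant-folded or hoisted.
    let inputs: Vec<I> = black_box((0..batch_size).map(|_| setup()).collect());
    let mut outputs = Vec::with_capacity(batch_size);

    let start = Instant::now();
    outputs.extend(inputs.into_iter().map(&mut routine));
    let elapsed = start.elapsed();

    // Keep the results observable so the computation is not discarded.
    (elapsed, black_box(outputs))
}
```

Because the outputs are returned instead of being dropped inside the function, their deallocation also stays outside the timed region.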

This fixes dimforge#1547 for
benchmarks that use `bench_*!()` macros.

This simulates real-world use cases like multiplication of
many vectors by a single matrix.
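
The two benchmark shapes can be illustrated with plain arrays standing in for the nalgebra types. This is a sketch under that assumption: `Mat2`, `Vec2`, `mul`, `mul_pairs` and `mul_single` are hypothetical names, not code from the benchmark suite.

```rust
use std::hint::black_box;

type Mat2 = [[f64; 2]; 2];
type Vec2 = [f64; 2];

/// Plain-array stand-in for a 2x2 matrix-vector product.
fn mul(m: &Mat2, v: &Vec2) -> Vec2 {
    [
        m[0][0] * v[0] + m[0][1] * v[1],
        m[1][0] * v[0] + m[1][1] * v[1],
    ]
}

/// Shape 1: a fresh (matrix, vector) pair for every multiplication.
fn mul_pairs(pairs: &[(Mat2, Vec2)]) -> Vec<Vec2> {
    pairs
        .iter()
        .map(|(m, v)| mul(black_box(m), black_box(v)))
        .collect()
}

/// Shape 2: one matrix reused for many vectors, which is presumably
/// what the `single_*` benchmarks measure.
fn mul_single(m: &Mat2, vs: &[Vec2]) -> Vec<Vec2> {
    let m = black_box(m);
    vs.iter().map(|v| mul(m, black_box(v))).collect()
}
```

In the second shape the matrix can stay in registers or cache across iterations, which is one plausible reason for the lower per-call time.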

There is a ~2x performance difference between the case where both
arguments are random on each iteration and the case where one argument
is static and the other is random on each iteration:

	mat2_mul_v              time:   [778.33 ps 785.41 ps 797.70 ps]
	Found 14 outliers among 100 measurements (14.00%)
	  5 (5.00%) low severe
	  4 (4.00%) high mild
	  5 (5.00%) high severe

	mat3_mul_v              time:   [1.7001 ns 1.7051 ns 1.7111 ns]
	Found 11 outliers among 100 measurements (11.00%)
	  1 (1.00%) low severe
	  1 (1.00%) low mild
	  8 (8.00%) high mild
	  1 (1.00%) high severe

	mat4_mul_v              time:   [2.6101 ns 2.6223 ns 2.6374 ns]
	Found 8 outliers among 100 measurements (8.00%)
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	single_mat2_mul_v       time:   [402.65 ps 403.62 ps 404.75 ps]
	Found 11 outliers among 100 measurements (11.00%)
	  3 (3.00%) low mild
	  5 (5.00%) high mild
	  3 (3.00%) high severe

	single_mat3_mul_v       time:   [651.30 ps 654.06 ps 657.15 ps]
	Found 15 outliers among 100 measurements (15.00%)
	  3 (3.00%) low mild
	  8 (8.00%) high mild
	  4 (4.00%) high severe

	single_mat4_mul_v       time:   [1.0628 ns 1.0645 ns 1.0666 ns]
	Found 8 outliers among 100 measurements (8.00%)
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	mat2_tr_mul_v           time:   [719.81 ps 721.99 ps 724.59 ps]
	Found 8 outliers among 100 measurements (8.00%)
	  3 (3.00%) low mild
	  5 (5.00%) high mild

	mat3_tr_mul_v           time:   [1.6685 ns 1.6758 ns 1.6841 ns]
	Found 13 outliers among 100 measurements (13.00%)
	  4 (4.00%) low severe
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	mat4_tr_mul_v           time:   [2.6739 ns 2.6897 ns 2.7080 ns]
	Found 16 outliers among 100 measurements (16.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  8 (8.00%) high severe

	single_mat2_tr_mul_v    time:   [353.36 ps 354.56 ps 356.03 ps]
	Found 6 outliers among 100 measurements (6.00%)
	  2 (2.00%) low mild
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	single_mat3_tr_mul_v    time:   [779.82 ps 782.84 ps 786.37 ps]
	Found 10 outliers among 100 measurements (10.00%)
	  1 (1.00%) low severe
	  1 (1.00%) low mild
	  6 (6.00%) high mild
	  2 (2.00%) high severe

	single_mat4_tr_mul_v    time:   [1.1918 ns 1.1946 ns 1.1977 ns]
	Found 6 outliers among 100 measurements (6.00%)
	  3 (3.00%) low mild
	  1 (1.00%) high mild
	  2 (2.00%) high severe

	unit_quaternion_mul_v   time:   [1.5002 ns 1.5088 ns 1.5183 ns]
							change: [−0.0578% +0.3775% +0.8498%] (p = 0.10 > 0.05)
							No change in performance detected.
	Found 6 outliers among 100 measurements (6.00%)
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	single_unit_quaternion_mul_v
							time:   [1.0489 ns 1.0531 ns 1.0584 ns]
	Found 14 outliers among 100 measurements (14.00%)
	  2 (2.00%) low severe
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  7 (7.00%) high severe

I do not know why those benchmarks were commented out.
…hmarks

The bulk of the changes was done by Claude Sonnet 4. Additionally, I moved
`DVector` allocations outside of the benchmark and added anything
allocated but not consumed to the return tuple of the benchmark closure,
so that the implicit drop/free is not included in the measured time.

This fixes https://github.com/dimforge/nalgebra/issues/1547 for the
remaining benchmarks.
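
The return-tuple trick can be sketched with the standard library alone. This is an illustration under assumptions, not the benchmark code itself: `time_once` and `scale_and_sum` are hypothetical names, and a `Vec<f64>` stands in for a `DVector`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical timing helper: stops the clock before the routine's
/// return value is dropped, then hands that value back to the caller.
fn time_once<O>(routine: impl FnOnce() -> O) -> (Duration, O) {
    let start = Instant::now();
    let out = black_box(routine());
    let elapsed = start.elapsed();
    (elapsed, out)
}

/// Benchmark body: scales a heap-allocated buffer in place and returns
/// both the checksum and the buffer itself, so the buffer is not freed
/// inside the timed region.
fn scale_and_sum(mut buf: Vec<f64>, k: f64) -> (f64, Vec<f64>) {
    for x in buf.iter_mut() {
        *x *= k;
    }
    let sum: f64 = buf.iter().sum();
    (sum, buf)
}
```

The buffer is allocated by the caller before timing and moved into the closure, for example `time_once(move || scale_and_sum(buf, 2.0))`. Had `scale_and_sum` returned only the sum, `buf` would be dropped at the end of the function, and the cost of freeing it would be counted in the measurement.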

Benchmark results before vs. after all changes:

	mat2_mul_m              time:   [1.1043 ns 1.1058 ns 1.1077 ns]
							change: [+49.306% +49.651% +50.045%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  4 (4.00%) low severe
	  2 (2.00%) high mild
	  6 (6.00%) high severe

	mat3_mul_m              time:   [3.1885 ns 3.1945 ns 3.2038 ns]
							change: [+102.62% +103.63% +104.86%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  3 (3.00%) high severe

	mat4_mul_m              time:   [6.7759 ns 6.7840 ns 6.7929 ns]
							change: [+130.65% +131.50% +132.59%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  4 (4.00%) low severe
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	mat2_tr_mul_m           time:   [1.2882 ns 1.2901 ns 1.2926 ns]
							change: [+75.005% +75.472% +75.928%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  3 (3.00%) low severe
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	mat3_tr_mul_m           time:   [3.1688 ns 3.1725 ns 3.1770 ns]
							change: [+101.61% +102.10% +102.66%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	mat4_tr_mul_m           time:   [6.5406 ns 6.5453 ns 6.5508 ns]
							change: [+121.95% +122.66% +123.42%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 15 outliers among 100 measurements (15.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  6 (6.00%) high severe

	mat2_add_m              time:   [644.68 ps 645.88 ps 647.24 ps]
							change: [−13.049% −12.530% −11.972%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 8 outliers among 100 measurements (8.00%)
	  4 (4.00%) low severe
	  1 (1.00%) low mild
	  1 (1.00%) high mild
	  2 (2.00%) high severe

	mat3_add_m              time:   [1.3543 ns 1.3572 ns 1.3607 ns]
							change: [−14.707% −13.705% −12.403%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  6 (6.00%) low severe
	  5 (5.00%) high mild
	  4 (4.00%) high severe

	mat4_add_m              time:   [2.3987 ns 2.4015 ns 2.4044 ns]
							change: [−20.676% −19.615% −18.453%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 14 outliers among 100 measurements (14.00%)
	  6 (6.00%) low severe
	  5 (5.00%) high mild
	  3 (3.00%) high severe

	mat2_sub_m              time:   [637.47 ps 638.88 ps 640.62 ps]
							change: [−13.604% −13.020% −12.333%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 13 outliers among 100 measurements (13.00%)
	  4 (4.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	mat3_sub_m              time:   [1.3531 ns 1.3546 ns 1.3562 ns]
							change: [−15.139% −14.610% −14.084%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 16 outliers among 100 measurements (16.00%)
	  5 (5.00%) low severe
	  1 (1.00%) low mild
	  6 (6.00%) high mild
	  4 (4.00%) high severe

	mat4_sub_m              time:   [2.3972 ns 2.3996 ns 2.4021 ns]
							change: [−20.412% −19.249% −18.330%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 10 outliers among 100 measurements (10.00%)
	  6 (6.00%) low severe
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	mat2_mul_v              time:   [774.43 ps 775.48 ps 776.73 ps]
							change: [+144.90% +145.51% +146.12%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  5 (5.00%) high mild
	  3 (3.00%) high severe

	mat3_mul_v              time:   [1.6843 ns 1.6858 ns 1.6874 ns]
							change: [+284.57% +285.82% +287.43%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  3 (3.00%) low severe
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	mat4_mul_v              time:   [2.6029 ns 2.6196 ns 2.6485 ns]
							change: [+255.34% +257.62% +261.68%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	single_mat2_mul_v       time:   [392.29 ps 393.45 ps 394.87 ps]
	Found 8 outliers among 100 measurements (8.00%)
	  6 (6.00%) high mild
	  2 (2.00%) high severe

	single_mat3_mul_v       time:   [650.16 ps 651.47 ps 653.07 ps]
	Found 9 outliers among 100 measurements (9.00%)
	  2 (2.00%) low severe
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	single_mat4_mul_v       time:   [1.0665 ns 1.0690 ns 1.0722 ns]
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	mat2_tr_mul_v           time:   [719.95 ps 720.92 ps 722.16 ps]
							change: [+127.86% +128.34% +128.98%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 14 outliers among 100 measurements (14.00%)
	  1 (1.00%) low severe
	  2 (2.00%) low mild
	  7 (7.00%) high mild
	  4 (4.00%) high severe

	mat3_tr_mul_v           time:   [1.6551 ns 1.6564 ns 1.6577 ns]
							change: [+277.57% +278.32% +279.16%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	mat4_tr_mul_v           time:   [2.6477 ns 2.6546 ns 2.6666 ns]
							change: [+259.47% +260.55% +261.67%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 9 outliers among 100 measurements (9.00%)
	  3 (3.00%) low severe
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	single_mat2_tr_mul_v    time:   [353.60 ps 355.50 ps 358.48 ps]
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) low mild
	  4 (4.00%) high mild
	  3 (3.00%) high severe

	single_mat3_tr_mul_v    time:   [778.13 ps 779.43 ps 781.25 ps]
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  3 (3.00%) high mild
	  5 (5.00%) high severe

	single_mat4_tr_mul_v    time:   [1.1887 ns 1.1906 ns 1.1930 ns]
	Found 8 outliers among 100 measurements (8.00%)
	  3 (3.00%) low mild
	  2 (2.00%) high mild
	  3 (3.00%) high severe

	mat2_mul_s              time:   [774.44 ps 775.33 ps 776.37 ps]
							change: [+6.0947% +6.3308% +6.5936%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	mat3_mul_s              time:   [962.59 ps 964.98 ps 967.43 ps]
							change: [−38.097% −37.694% −37.145%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 10 outliers among 100 measurements (10.00%)
	  1 (1.00%) low severe
	  3 (3.00%) low mild
	  2 (2.00%) high mild
	  4 (4.00%) high severe

	mat4_mul_s              time:   [1.6589 ns 1.6640 ns 1.6684 ns]
							change: [−43.668% −43.130% −42.518%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 18 outliers among 100 measurements (18.00%)
	  8 (8.00%) low severe
	  3 (3.00%) low mild
	  1 (1.00%) high mild
	  6 (6.00%) high severe

	mat2_div_s              time:   [803.09 ps 804.70 ps 806.56 ps]
							change: [+10.272% +10.596% +10.960%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	mat3_div_s              time:   [2.4929 ns 2.4947 ns 2.4967 ns]
							change: [+58.793% +59.185% +59.709%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  3 (3.00%) low severe
	  5 (5.00%) high mild
	  4 (4.00%) high severe

	mat4_div_s              time:   [5.1650 ns 5.1688 ns 5.1735 ns]
							change: [+76.816% +77.215% +77.629%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 9 outliers among 100 measurements (9.00%)
	  2 (2.00%) low severe
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  2 (2.00%) high severe

	mat2_inv                time:   [1.1514 ns 1.1523 ns 1.1533 ns]
							change: [−41.682% −41.556% −41.439%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 11 outliers among 100 measurements (11.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	mat3_inv                time:   [3.3641 ns 3.3707 ns 3.3826 ns]
							change: [−37.473% −37.358% −37.214%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 12 outliers among 100 measurements (12.00%)
	  1 (1.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  5 (5.00%) high severe

	mat4_inv                time:   [25.970 ns 26.006 ns 26.062 ns]
							change: [−9.0865% −8.9013% −8.6986%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 14 outliers among 100 measurements (14.00%)
	  3 (3.00%) low severe
	  2 (2.00%) low mild
	  3 (3.00%) high mild
	  6 (6.00%) high severe

	mat2_transpose          time:   [409.94 ps 410.77 ps 411.75 ps]
							change: [−62.889% −62.624% −62.331%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 17 outliers among 100 measurements (17.00%)
	  4 (4.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  7 (7.00%) high severe

	mat3_transpose          time:   [947.42 ps 953.20 ps 961.97 ps]
							change: [−61.273% −60.195% −58.616%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 11 outliers among 100 measurements (11.00%)
	  1 (1.00%) low mild
	  7 (7.00%) high mild
	  3 (3.00%) high severe

	mat4_transpose          time:   [1.6510 ns 1.6551 ns 1.6612 ns]
							change: [−65.877% −65.592% −65.225%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 13 outliers among 100 measurements (13.00%)
	  5 (5.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	mat_div_scalar          time:   [480.25 µs 480.55 µs 480.99 µs]
							change: [−22.235% −22.169% −22.095%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 6 outliers among 100 measurements (6.00%)
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	mat100_add_mat100       time:   [3.0426 µs 3.0910 µs 3.1351 µs]
							change: [+81.145% +84.392% +88.112%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  2 (2.00%) low severe
	  3 (3.00%) low mild
	  7 (7.00%) high mild
	  1 (1.00%) high severe

	mat4_mul_mat4           time:   [36.836 ns 36.859 ns 36.886 ns]
							change: [+24.966% +25.568% +26.171%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  7 (7.00%) low severe
	  4 (4.00%) high mild
	  2 (2.00%) high severe

	mat5_mul_mat5           time:   [56.715 ns 56.876 ns 57.015 ns]
							change: [+10.239% +10.666% +11.091%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  1 (1.00%) low severe
	  1 (1.00%) low mild
	  6 (6.00%) high mild

	mat6_mul_mat6           time:   [83.817 ns 83.999 ns 84.156 ns]
							change: [+10.675% +10.890% +11.065%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) low mild

	mat7_mul_mat7           time:   [93.211 ns 93.386 ns 93.534 ns]
							change: [+10.654% +10.892% +11.129%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  1 (1.00%) low severe
	  2 (2.00%) low mild

	mat8_mul_mat8           time:   [88.919 ns 89.410 ns 89.884 ns]
							change: [+22.808% +23.376% +23.888%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 2 outliers among 100 measurements (2.00%)
	  1 (1.00%) low mild
	  1 (1.00%) high mild

	mat9_mul_mat9           time:   [207.12 ns 209.04 ns 211.17 ns]
							change: [+14.053% +14.646% +15.258%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  9 (9.00%) low mild
	  1 (1.00%) high mild

	mat10_mul_mat10         time:   [236.75 ns 237.11 ns 237.47 ns]
							change: [+20.055% +20.366% +20.651%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  5 (5.00%) low severe
	  7 (7.00%) low mild
	  1 (1.00%) high mild

	mat10_mul_mat10_static  time:   [116.68 ns 117.15 ns 117.62 ns]
							change: [+11.160% +11.617% +12.049%] (p = 0.00 < 0.05)
							Performance has regressed.

	mat100_mul_mat100       time:   [40.188 µs 40.327 µs 40.459 µs]
							change: [+3.2490% +3.4765% +3.7130%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 15 outliers among 100 measurements (15.00%)
	  7 (7.00%) high mild
	  8 (8.00%) high severe

	mat500_mul_mat500       time:   [4.3909 ms 4.3944 ms 4.3978 ms]
							change: [+0.8556% +0.9519% +1.0448%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 9 outliers among 100 measurements (9.00%)
	  6 (6.00%) low severe
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	iter                    time:   [840.01 µs 840.39 µs 840.81 µs]
							change: [+10.527% +10.726% +10.915%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  2 (2.00%) high mild
	  11 (11.00%) high severe

	iter_rev                time:   [210.14 µs 211.10 µs 212.84 µs]
							change: [+0.2455% +0.7119% +1.7846%] (p = 0.02 < 0.05)
							Change within noise threshold.
	Found 8 outliers among 100 measurements (8.00%)
	  2 (2.00%) high mild
	  6 (6.00%) high severe

	copy_from               time:   [199.77 µs 200.80 µs 202.55 µs]
							change: [+41.195% +41.962% +43.287%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 9 outliers among 100 measurements (9.00%)
	  8 (8.00%) low mild
	  1 (1.00%) high severe

	axpy                    time:   [31.301 µs 33.301 µs 34.957 µs]
							change: [+40.726% +52.001% +63.112%] (p = 0.00 < 0.05)
							Performance has regressed.

	tr_mul_to               time:   [126.46 µs 127.12 µs 128.09 µs]
							change: [−4.0124% −3.5145% −2.7708%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 2 outliers among 100 measurements (2.00%)
	  2 (2.00%) high severe

	mat_mul_mat             time:   [39.252 µs 39.443 µs 39.626 µs]
							change: [−0.7084% −0.3800% −0.0130%] (p = 0.02 < 0.05)
							Change within noise threshold.
	Found 11 outliers among 100 measurements (11.00%)
	  1 (1.00%) low mild
	  8 (8.00%) high mild
	  2 (2.00%) high severe

	mat100_from_fn          time:   [6.8398 µs 6.8418 µs 6.8446 µs]
							change: [+519.35% +522.43% +524.76%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  4 (4.00%) high mild
	  9 (9.00%) high severe

	mat500_from_fn          time:   [172.11 µs 172.14 µs 172.18 µs]
							change: [+498.70% +499.32% +499.93%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  7 (7.00%) high severe

	vec2_add_v_f32          time:   [303.98 ps 304.76 ps 305.65 ps]
							change: [−5.1499% −4.3536% −3.5996%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  4 (4.00%) low severe
	  5 (5.00%) high mild
	  6 (6.00%) high severe

	vec3_add_v_f32          time:   [586.36 ps 587.93 ps 589.92 ps]
							change: [+34.275% +34.886% +35.631%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  6 (6.00%) high severe

	vec4_add_v_f32          time:   [603.45 ps 604.44 ps 605.59 ps]
							change: [−18.949% −18.215% −17.623%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 14 outliers among 100 measurements (14.00%)
	  5 (5.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	vec2_add_v_f64          time:   [602.08 ps 602.83 ps 603.64 ps]
							change: [+89.139% +90.573% +91.808%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  4 (4.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  5 (5.00%) high severe

	vec3_add_v_f64          time:   [910.94 ps 912.60 ps 914.56 ps]
							change: [+107.10% +108.18% +109.41%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  3 (3.00%) low severe
	  6 (6.00%) high mild
	  3 (3.00%) high severe

	vec4_add_v_f64          time:   [1.1894 ns 1.1933 ns 1.1963 ns]
							change: [+82.607% +85.023% +86.911%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  9 (9.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high severe

	vec2_sub_v              time:   [303.45 ps 304.42 ps 305.37 ps]
							change: [−5.3598% −4.4578% −3.6738%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  8 (8.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	vec3_sub_v              time:   [672.95 ps 674.82 ps 676.51 ps]
							change: [+51.463% +52.336% +53.346%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 4 outliers among 100 measurements (4.00%)
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	vec4_sub_v              time:   [602.84 ps 604.65 ps 607.70 ps]
							change: [−19.744% −18.754% −17.881%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 13 outliers among 100 measurements (13.00%)
	  6 (6.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  4 (4.00%) high severe

	vec2_mul_s              time:   [666.49 ps 667.29 ps 668.31 ps]
							change: [+111.37% +111.81% +112.32%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 16 outliers among 100 measurements (16.00%)
	  4 (4.00%) low severe
	  6 (6.00%) high mild
	  6 (6.00%) high severe

	vec3_mul_s              time:   [511.42 ps 513.44 ps 515.86 ps]
							change: [+15.556% +16.273% +17.049%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  5 (5.00%) high mild
	  1 (1.00%) high severe

	vec4_mul_s              time:   [774.13 ps 775.22 ps 776.52 ps]
							change: [+5.1602% +5.5545% +6.0225%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  1 (1.00%) low severe
	  2 (2.00%) low mild
	  3 (3.00%) high mild
	  7 (7.00%) high severe

	vec2_div_s              time:   [1.3658 ns 1.3694 ns 1.3726 ns]
							change: [+328.67% +329.83% +331.09%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high severe

	vec3_div_s              time:   [607.73 ps 608.63 ps 609.66 ps]
							change: [+37.642% +38.017% +38.440%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 16 outliers among 100 measurements (16.00%)
	  2 (2.00%) low severe
	  8 (8.00%) high mild
	  6 (6.00%) high severe

	vec4_div_s              time:   [802.59 ps 803.62 ps 804.82 ps]
							change: [+8.9451% +9.3240% +9.7149%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  3 (3.00%) low severe
	  6 (6.00%) high mild
	  2 (2.00%) high severe

	vec2_dot_f32            time:   [461.20 ps 461.73 ps 462.30 ps]
							change: [+117.88% +119.27% +120.79%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 16 outliers among 100 measurements (16.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  3 (3.00%) high mild
	  9 (9.00%) high severe

	vec3_dot_f32            time:   [688.24 ps 689.05 ps 689.95 ps]
							change: [+225.49% +227.19% +229.16%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  5 (5.00%) high severe

	vec4_dot_f32            time:   [917.20 ps 921.23 ps 928.57 ps]
							change: [+338.59% +341.30% +344.17%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  8 (8.00%) high mild
	  5 (5.00%) high severe

	vec2_dot_f64            time:   [596.11 ps 597.51 ps 598.79 ps]
							change: [+177.79% +179.60% +182.13%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	vec3_dot_f64            time:   [749.32 ps 751.02 ps 752.81 ps]
							change: [+253.48% +257.12% +262.11%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) high mild
	  7 (7.00%) high severe

	vec4_dot_f64            time:   [1.0145 ns 1.0185 ns 1.0230 ns]
							change: [+376.34% +379.47% +383.46%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 5 outliers among 100 measurements (5.00%)
	  3 (3.00%) high mild
	  2 (2.00%) high severe

	vec3_cross              time:   [971.01 ps 971.87 ps 972.73 ps]
							change: [+122.34% +122.74% +123.17%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	vec2_norm               time:   [1.0612 ns 1.0623 ns 1.0637 ns]
							change: [−0.0722% +0.0499% +0.1765%] (p = 0.44 > 0.05)
							No change in performance detected.
	Found 6 outliers among 100 measurements (6.00%)
	  4 (4.00%) low mild
	  2 (2.00%) high severe

	vec3_norm               time:   [1.0649 ns 1.0665 ns 1.0694 ns]
							change: [−4.3787% −4.1856% −3.8679%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 4 outliers among 100 measurements (4.00%)
	  2 (2.00%) high mild
	  2 (2.00%) high severe

	vec4_norm               time:   [1.0733 ns 1.0739 ns 1.0746 ns]
							change: [−4.5616% −3.9738% −2.9157%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 19 outliers among 100 measurements (19.00%)
	  2 (2.00%) low severe
	  7 (7.00%) low mild
	  5 (5.00%) high mild
	  5 (5.00%) high severe

	vec2_normalize          time:   [2.5310 ns 2.5326 ns 2.5345 ns]
							change: [+3.5769% +3.6696% +3.7678%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 2 outliers among 100 measurements (2.00%)
	  1 (1.00%) high mild
	  1 (1.00%) high severe

	vec3_normalize          time:   [2.5389 ns 2.5409 ns 2.5424 ns]
							change: [+1.1411% +1.2860% +1.4910%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 2 outliers among 100 measurements (2.00%)
	  1 (1.00%) high mild
	  1 (1.00%) high severe

	vec4_normalize          time:   [1.8154 ns 1.8164 ns 1.8173 ns]
							change: [−1.1191% −0.9926% −0.8485%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 8 outliers among 100 measurements (8.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	vec10000_dot_f64        time:   [2.0296 µs 2.0337 µs 2.0383 µs]
							change: [+71.107% +72.619% +74.228%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  4 (4.00%) low severe
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	vec10000_dot_f32        time:   [1.1891 µs 1.1926 µs 1.1962 µs]
							change: [+6.3585% +7.1059% +7.9357%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  1 (1.00%) low severe
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  6 (6.00%) high severe

	vec10000_axpy_f64       time:   [2.0702 µs 2.0739 µs 2.0777 µs]
							change: [+39.373% +40.227% +41.210%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  2 (2.00%) high severe

	vec10000_axpy_beta_f64  time:   [2.0914 µs 2.0962 µs 2.1012 µs]
							change: [+31.958% +32.843% +33.467%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  4 (4.00%) low severe
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	vec10000_axpy_f64_slice time:   [2.0272 µs 2.0303 µs 2.0335 µs]
							change: [+35.880% +36.621% +37.307%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  3 (3.00%) low severe
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	vec10000_axpy_f64_static
							time:   [13.917 µs 13.965 µs 14.005 µs]
							change: [+859.61% +869.73% +879.35%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  1 (1.00%) low severe
	  3 (3.00%) high mild
	  2 (2.00%) high severe

	vec10000_axpy_f32       time:   [1.0402 µs 1.0421 µs 1.0437 µs]
							change: [+38.710% +39.603% +40.363%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 9 outliers among 100 measurements (9.00%)
	  5 (5.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	vec10000_axpy_beta_f32  time:   [1.0329 µs 1.0346 µs 1.0364 µs]
							change: [+30.705% +31.490% +32.040%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  4 (4.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	quaternion_add_q        time:   [642.58 ps 650.39 ps 662.45 ps]
							change: [−11.788% −10.934% −9.9463%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 14 outliers among 100 measurements (14.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  6 (6.00%) high severe

	quaternion_sub_q        time:   [641.16 ps 643.22 ps 645.88 ps]
							change: [−12.654% −11.822% −10.943%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  5 (5.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  4 (4.00%) high severe

	quaternion_mul_q        time:   [1.4252 ns 1.4271 ns 1.4294 ns]
							change: [+94.545% +95.022% +95.499%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  1 (1.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  5 (5.00%) high severe

	unit_quaternion_mul_v   time:   [1.4859 ns 1.4874 ns 1.4890 ns]
							change: [+242.77% +243.56% +244.31%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  3 (3.00%) high mild

	single_unit_quaternion_mul_v
							time:   [1.0422 ns 1.0457 ns 1.0504 ns]
	Found 9 outliers among 100 measurements (9.00%)
	  1 (1.00%) low severe
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	quaternion_mul_s        time:   [771.17 ps 772.18 ps 773.37 ps]
							change: [+6.1278% +6.4276% +6.7583%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 9 outliers among 100 measurements (9.00%)
	  3 (3.00%) low mild
	  3 (3.00%) high mild
	  3 (3.00%) high severe

	quaternion_div_s        time:   [798.54 ps 799.82 ps 801.43 ps]
							change: [+9.2123% +9.7287% +10.338%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  5 (5.00%) high severe

	quaternion_inv          time:   [1.2401 ns 1.2408 ns 1.2417 ns]
							change: [−43.660% −43.521% −43.317%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 13 outliers among 100 measurements (13.00%)
	  2 (2.00%) low severe
	  5 (5.00%) high mild
	  6 (6.00%) high severe

	unit_quaternion_inv     time:   [596.01 ps 598.93 ps 602.66 ps]
							change: [−49.707% −49.184% −48.445%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  6 (6.00%) high mild
	  9 (9.00%) high severe

	quaternion_conjugate    time:   [604.36 ps 608.60 ps 613.48 ps]
	Found 12 outliers among 100 measurements (12.00%)
	  3 (3.00%) high mild
	  9 (9.00%) high severe

	quaternion_normalize    time:   [1.8268 ns 1.8274 ns 1.8281 ns]
	Found 18 outliers among 100 measurements (18.00%)
	  4 (4.00%) low severe
	  4 (4.00%) low mild
	  7 (7.00%) high mild
	  3 (3.00%) high severe

	bidiagonalize_100x100   time:   [265.91 µs 266.00 µs 266.11 µs]
							change: [+0.7553% +0.8363% +0.9114%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 8 outliers among 100 measurements (8.00%)
	  5 (5.00%) high mild
	  3 (3.00%) high severe

	bidiagonalize_100x500   time:   [2.0053 ms 2.0060 ms 2.0065 ms]
							change: [+4.0325% +4.2372% +4.3938%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  5 (5.00%) low severe
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	bidiagonalize_4x4       time:   [266.92 ns 267.24 ns 267.62 ns]
							change: [+7.1063% +7.2057% +7.3231%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 23 outliers among 100 measurements (23.00%)
	  1 (1.00%) low severe
	  5 (5.00%) low mild
	  13 (13.00%) high mild
	  4 (4.00%) high severe

	Benchmarking bidiagonalize_500x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
	bidiagonalize_500x100   time:   [1.6781 ms 1.6793 ms 1.6804 ms]
							change: [+1.3944% +1.5312% +1.6400%] (p = 0.00 < 0.05)
							Performance has regressed.

	bidiagonalize_unpack_100x100
							time:   [522.13 µs 522.36 µs 522.63 µs]
							change: [−0.5318% −0.4044% −0.2627%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 12 outliers among 100 measurements (12.00%)
	  1 (1.00%) low mild
	  4 (4.00%) high mild
	  7 (7.00%) high severe

	bidiagonalize_unpack_100x500
							time:   [2.9858 ms 2.9916 ms 2.9976 ms]
							change: [−0.7824% −0.3995% −0.0370%] (p = 0.04 < 0.05)
							Change within noise threshold.

	bidiagonalize_unpack_500x100
							time:   [2.5884 ms 2.5896 ms 2.5910 ms]
							change: [+0.0767% +0.1539% +0.2316%] (p = 0.00 < 0.05)
							Change within noise threshold.

	cholesky_100x100        time:   [31.084 µs 31.101 µs 31.122 µs]
							change: [−5.0365% −4.7949% −4.4205%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 16 outliers among 100 measurements (16.00%)
	  2 (2.00%) low severe
	  4 (4.00%) low mild
	  1 (1.00%) high mild
	  9 (9.00%) high severe

	cholesky_500x500        time:   [4.4799 ms 4.4849 ms 4.4903 ms]
							change: [−0.5985% −0.3685% −0.1374%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 3 outliers among 100 measurements (3.00%)
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	cholesky_decompose_unpack_100x100
							time:   [31.659 µs 31.685 µs 31.727 µs]
							change: [−4.9712% −4.7445% −4.3325%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 15 outliers among 100 measurements (15.00%)
	  4 (4.00%) low severe
	  4 (4.00%) low mild
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	cholesky_decompose_unpack_500x500
							time:   [4.4795 ms 4.4845 ms 4.4910 ms]
							change: [−1.9595% −1.7121% −1.4978%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 14 outliers among 100 measurements (14.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  7 (7.00%) high severe

	cholesky_solve_10x10    time:   [170.70 ns 170.76 ns 170.82 ns]
							change: [+8.0936% +8.1777% +8.2764%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) low mild
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	cholesky_solve_100x100  time:   [2.9071 µs 2.9117 µs 2.9174 µs]
							change: [+8.4770% +8.9956% +9.6254%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  1 (1.00%) low severe
	  3 (3.00%) low mild
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	cholesky_solve_500x500  time:   [54.193 µs 54.303 µs 54.417 µs]
							change: [+3.9332% +4.1755% +4.4477%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high mild

	cholesky_inverse_10x10  time:   [1.3189 µs 1.3195 µs 1.3201 µs]
							change: [+2.5360% +2.6238% +2.7131%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	cholesky_inverse_100x100
							time:   [270.85 µs 270.88 µs 270.92 µs]
							change: [−0.9726% −0.8524% −0.7319%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 9 outliers among 100 measurements (9.00%)
	  1 (1.00%) low severe
	  4 (4.00%) low mild
	  2 (2.00%) high mild
	  2 (2.00%) high severe

	cholesky_inverse_500x500
							time:   [26.673 ms 26.694 ms 26.714 ms]
							change: [+1.0784% +1.1816% +1.2794%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 23 outliers among 100 measurements (23.00%)
	  19 (19.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high severe

	full_piv_lu_decompose_10x10
							time:   [582.31 ns 582.48 ns 582.67 ns]
							change: [+19.583% +19.702% +19.795%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  6 (6.00%) high mild
	  2 (2.00%) high severe

	full_piv_lu_decompose_100x100
							time:   [218.73 µs 218.78 µs 218.84 µs]
							change: [+5.8729% +5.9828% +6.0904%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  2 (2.00%) low severe
	  5 (5.00%) low mild
	  1 (1.00%) high severe

	full_piv_lu_solve_10x10 time:   [124.88 ns 124.94 ns 125.02 ns]
							change: [+7.4724% +7.6252% +7.7787%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  3 (3.00%) low severe
	  6 (6.00%) high mild
	  4 (4.00%) high severe

	full_piv_lu_solve_100x100
							time:   [2.5202 µs 2.5244 µs 2.5289 µs]
							change: [+11.226% +11.847% +12.518%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 17 outliers among 100 measurements (17.00%)
	  14 (14.00%) low severe
	  2 (2.00%) low mild
	  1 (1.00%) high mild

	full_piv_lu_inverse_10x10
							time:   [869.61 ns 870.27 ns 871.19 ns]
							change: [+4.7996% +4.9224% +5.0608%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  2 (2.00%) low severe
	  1 (1.00%) high mild
	  4 (4.00%) high severe

	full_piv_lu_inverse_100x100
							time:   [212.68 µs 212.83 µs 213.05 µs]
							change: [−0.2835% −0.0351% +0.1310%] (p = 0.80 > 0.05)
							No change in performance detected.
	Found 13 outliers among 100 measurements (13.00%)
	  1 (1.00%) low severe
	  4 (4.00%) low mild
	  3 (3.00%) high mild
	  5 (5.00%) high severe

	full_piv_lu_determinant_10x10
							time:   [15.320 ns 15.338 ns 15.357 ns]
							change: [+410.70% +421.41% +430.41%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  9 (9.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild

	full_piv_lu_determinant_100x100
							time:   [137.44 ns 139.37 ns 141.00 ns]
							change: [+213.54% +227.75% +241.42%] (p = 0.00 < 0.05)
							Performance has regressed.

	hessenberg_decompose_4x4
							time:   [82.510 ns 82.538 ns 82.564 ns]
							change: [−27.950% −27.887% −27.830%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high mild

	hessenberg_decompose_100x100
							time:   [295.98 µs 296.16 µs 296.44 µs]
							change: [+3.3234% +3.5705% +3.7986%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  4 (4.00%) high severe

	hessenberg_decompose_200x200
							time:   [2.2647 ms 2.2681 ms 2.2714 ms]
							change: [+4.8426% +4.9983% +5.1646%] (p = 0.00 < 0.05)
							Performance has regressed.

	hessenberg_decompose_unpack_100x100
							time:   [435.30 µs 435.75 µs 436.12 µs]
							change: [+2.7479% +2.8420% +2.9424%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high severe

	hessenberg_decompose_unpack_200x200
							time:   [3.2667 ms 3.2678 ms 3.2690 ms]
							change: [+3.9624% +4.0021% +4.0423%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 22 outliers among 100 measurements (22.00%)
	  13 (13.00%) low severe
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  5 (5.00%) high severe

	lu_decompose_10x10      time:   [353.04 ns 353.16 ns 353.31 ns]
							change: [−5.0408% −4.9435% −4.8487%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 19 outliers among 100 measurements (19.00%)
	  4 (4.00%) low severe
	  4 (4.00%) low mild
	  6 (6.00%) high mild
	  5 (5.00%) high severe

	lu_decompose_100x100    time:   [71.544 µs 71.560 µs 71.579 µs]
							change: [−1.7176% −1.6430% −1.5721%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 9 outliers among 100 measurements (9.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  3 (3.00%) high severe

	lu_solve_10x10          time:   [115.42 ns 115.52 ns 115.61 ns]
							change: [+3.9363% +4.1024% +4.2557%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 15 outliers among 100 measurements (15.00%)
	  4 (4.00%) low severe
	  8 (8.00%) low mild
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	lu_solve_100x100        time:   [2.5152 µs 2.5190 µs 2.5225 µs]
							change: [+15.120% +15.625% +16.088%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  4 (4.00%) low severe
	  2 (2.00%) low mild
	  1 (1.00%) high mild

	lu_inverse_10x10        time:   [902.55 ns 903.32 ns 903.97 ns]
							change: [+0.7407% +0.8734% +1.0263%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 2 outliers among 100 measurements (2.00%)
	  1 (1.00%) low mild
	  1 (1.00%) high severe

	lu_inverse_100x100      time:   [216.21 µs 216.47 µs 216.80 µs]
							change: [−0.6663% −0.5584% −0.4316%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 18 outliers among 100 measurements (18.00%)
	  2 (2.00%) low severe
	  4 (4.00%) low mild
	  5 (5.00%) high mild
	  7 (7.00%) high severe

	lu_determinant_10x10    time:   [13.394 ns 13.481 ns 13.665 ns]
							change: [+508.98% +524.96% +543.53%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 14 outliers among 100 measurements (14.00%)
	  6 (6.00%) low severe
	  1 (1.00%) low mild
	  5 (5.00%) high mild
	  2 (2.00%) high severe

	lu_determinant_100x100  time:   [149.12 ns 150.16 ns 151.08 ns]
							change: [+265.69% +281.86% +296.23%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 14 outliers among 100 measurements (14.00%)
	  10 (10.00%) low severe
	  4 (4.00%) low mild

	qr_decompose_100x100    time:   [141.62 µs 141.65 µs 141.69 µs]
							change: [+0.6391% +0.8447% +0.9784%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 9 outliers among 100 measurements (9.00%)
	  5 (5.00%) low mild
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	Benchmarking qr_decompose_100x500: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
	qr_decompose_100x500    time:   [1.0071 ms 1.0082 ms 1.0097 ms]
							change: [+0.9031% +1.2358% +1.6126%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 16 outliers among 100 measurements (16.00%)
	  12 (12.00%) low mild
	  2 (2.00%) high mild
	  2 (2.00%) high severe

	qr_decompose_4x4        time:   [100.40 ns 100.43 ns 100.45 ns]
							change: [−19.315% −19.268% −19.224%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 7 outliers among 100 measurements (7.00%)
	  2 (2.00%) low mild
	  1 (1.00%) high mild
	  4 (4.00%) high severe

	Benchmarking qr_decompose_500x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
	qr_decompose_500x100    time:   [847.17 µs 847.68 µs 848.21 µs]
							change: [+2.1441% +2.3425% +2.5069%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 4 outliers among 100 measurements (4.00%)
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	qr_decompose_unpack_100x100
							time:   [283.22 µs 283.26 µs 283.30 µs]
							change: [−0.3591% −0.2383% −0.1147%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 23 outliers among 100 measurements (23.00%)
	  21 (21.00%) low severe
	  1 (1.00%) low mild
	  1 (1.00%) high severe

	Benchmarking qr_decompose_unpack_100x500: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 60.
	qr_decompose_unpack_100x500
							time:   [1.1399 ms 1.1429 ms 1.1457 ms]
							change: [−1.9555% −1.8085% −1.6312%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high mild

	Benchmarking qr_decompose_unpack_500x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, enable flat sampling, or reduce sample count to 50.
	qr_decompose_unpack_500x100
							time:   [1.6633 ms 1.6640 ms 1.6648 ms]
							change: [+1.4516% +1.5245% +1.5969%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  2 (2.00%) low severe
	  5 (5.00%) low mild
	  4 (4.00%) high severe

	qr_solve_10x10          time:   [156.51 ns 156.56 ns 156.61 ns]
							change: [+3.7415% +3.8709% +3.9947%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  6 (6.00%) low severe
	  5 (5.00%) low mild
	  1 (1.00%) high mild

	qr_solve_100x100        time:   [3.5393 µs 3.5454 µs 3.5511 µs]
							change: [+6.0908% +6.5747% +6.9798%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  6 (6.00%) low mild

	qr_inverse_10x10        time:   [806.75 ns 807.99 ns 809.61 ns]
							change: [+0.6973% +0.8242% +0.9558%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high severe

	qr_inverse_100x100      time:   [330.65 µs 330.74 µs 330.85 µs]
							change: [+1.2238% +1.3244% +1.4518%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  3 (3.00%) low mild
	  4 (4.00%) high mild
	  5 (5.00%) high severe

	schur_decompose_4x4     time:   [969.14 ns 969.71 ns 970.18 ns]
							change: [−12.293% −12.223% −12.149%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 10 outliers among 100 measurements (10.00%)
	  3 (3.00%) low severe
	  1 (1.00%) low mild
	  2 (2.00%) high mild
	  4 (4.00%) high severe

	schur_decompose_10x10   time:   [7.3226 µs 7.3237 µs 7.3247 µs]
							change: [+0.3785% +0.4095% +0.4394%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 9 outliers among 100 measurements (9.00%)
	  2 (2.00%) low mild
	  4 (4.00%) high mild
	  3 (3.00%) high severe

	schur_decompose_100x100 time:   [2.5760 ms 2.5763 ms 2.5768 ms]
							change: [+0.7992% +0.8504% +0.8935%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 4 outliers among 100 measurements (4.00%)
	  3 (3.00%) high mild
	  1 (1.00%) high severe

	schur_decompose_200x200 time:   [18.285 ms 18.296 ms 18.308 ms]
							change: [+1.9360% +2.0941% +2.2427%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  2 (2.00%) high severe

	eigenvalues_4x4         time:   [937.94 ns 938.15 ns 938.38 ns]
							change: [+25.764% +25.898% +26.023%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 6 outliers among 100 measurements (6.00%)
	  2 (2.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high mild

	eigenvalues_10x10       time:   [5.9066 µs 5.9088 µs 5.9117 µs]
							change: [+0.1208% +0.1938% +0.2740%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 8 outliers among 100 measurements (8.00%)
	  1 (1.00%) low mild
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	Benchmarking eigenvalues_100x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
	eigenvalues_100x100     time:   [1.5870 ms 1.5873 ms 1.5876 ms]
							change: [−0.8569% −0.8247% −0.7914%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 5 outliers among 100 measurements (5.00%)
	  3 (3.00%) high mild
	  2 (2.00%) high severe

	eigenvalues_200x200     time:   [11.081 ms 11.088 ms 11.102 ms]
							change: [+0.0054% +0.2956% +0.4946%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 4 outliers among 100 measurements (4.00%)
	  1 (1.00%) low mild
	  1 (1.00%) high mild
	  2 (2.00%) high severe

	solve_l_triangular_100x100
							time:   [1.3250 µs 1.3651 µs 1.4012 µs]
							change: [+22.932% +24.999% +27.087%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 12 outliers among 100 measurements (12.00%)
	  10 (10.00%) high mild
	  2 (2.00%) high severe

	solve_l_triangular_1000x1000
							time:   [101.52 µs 102.04 µs 102.85 µs]
							change: [+1.5784% +2.0953% +2.8471%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 15 outliers among 100 measurements (15.00%)
	  9 (9.00%) high mild
	  6 (6.00%) high severe

	tr_solve_l_triangular_100x100
							time:   [2.0144 µs 2.0537 µs 2.0902 µs]
							change: [+13.600% +14.669% +15.998%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 16 outliers among 100 measurements (16.00%)
	  5 (5.00%) high mild
	  11 (11.00%) high severe

	tr_solve_l_triangular_1000x1000
							time:   [93.569 µs 94.056 µs 94.857 µs]
							change: [+1.2474% +1.7955% +2.5979%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  3 (3.00%) high mild
	  4 (4.00%) high severe

	solve_u_triangular_100x100
							time:   [1.5878 µs 1.6615 µs 1.7405 µs]
							change: [+31.200% +34.370% +38.132%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  10 (10.00%) high mild
	  3 (3.00%) high severe

	solve_u_triangular_1000x1000
							time:   [105.07 µs 105.46 µs 106.12 µs]
							change: [+6.6559% +7.0936% +7.8401%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 2 outliers among 100 measurements (2.00%)
	  2 (2.00%) high severe

	tr_solve_u_triangular_100x100
							time:   [1.4369 µs 1.4697 µs 1.4986 µs]
							change: [+17.195% +18.687% +20.307%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 13 outliers among 100 measurements (13.00%)
	  11 (11.00%) high mild
	  2 (2.00%) high severe

	tr_solve_u_triangular_1000x1000
							time:   [88.868 µs 89.303 µs 90.014 µs]
							change: [+4.2489% +4.7933% +5.6045%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 11 outliers among 100 measurements (11.00%)
	  4 (4.00%) high mild
	  7 (7.00%) high severe

	svd_decompose_2x2       time:   [22.913 ns 22.958 ns 23.017 ns]
							change: [+9.3648% +9.7443% +10.253%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 7 outliers among 100 measurements (7.00%)
	  2 (2.00%) high mild
	  5 (5.00%) high severe

	svd_decompose_3x3       time:   [359.30 ns 359.72 ns 360.20 ns]
							change: [+9.0123% +9.1174% +9.2394%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high mild

	svd_decompose_4x4       time:   [896.28 ns 896.55 ns 896.85 ns]
							change: [−7.1192% −7.0496% −6.9853%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 10 outliers among 100 measurements (10.00%)
	  2 (2.00%) low severe
	  3 (3.00%) low mild
	  3 (3.00%) high mild
	  2 (2.00%) high severe

	svd_decompose_10x10     time:   [5.7680 µs 5.7708 µs 5.7739 µs]
							change: [+1.1933% +1.4155% +1.6347%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  1 (1.00%) high mild
	  2 (2.00%) high severe

	Benchmarking svd_decompose_100x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.2s, enable flat sampling, or reduce sample count to 50.
	svd_decompose_100x100   time:   [1.5704 ms 1.5709 ms 1.5715 ms]
							change: [+1.4465% +1.4891% +1.5357%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  2 (2.00%) high mild
	  1 (1.00%) high severe

	svd_decompose_200x200   time:   [11.845 ms 11.847 ms 11.850 ms]
							change: [+1.4378% +1.4794% +1.5225%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 4 outliers among 100 measurements (4.00%)
	  4 (4.00%) high severe

	rank_4x4                time:   [716.49 ns 716.62 ns 716.74 ns]
							change: [+4.9084% +4.9678% +5.0237%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) low mild

	rank_10x10              time:   [4.2304 µs 4.2341 µs 4.2377 µs]
							change: [+0.4993% +0.6056% +0.7271%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 1 outliers among 100 measurements (1.00%)
	  1 (1.00%) high mild

	rank_100x100            time:   [522.74 µs 522.85 µs 522.97 µs]
							change: [+0.2822% +0.3170% +0.3535%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 3 outliers among 100 measurements (3.00%)
	  1 (1.00%) low mild
	  2 (2.00%) high severe

	rank_200x200            time:   [3.0167 ms 3.0217 ms 3.0267 ms]
							change: [+0.3924% +0.5333% +0.6946%] (p = 0.00 < 0.05)
							Change within noise threshold.

	singular_values_4x4     time:   [735.97 ns 736.08 ns 736.21 ns]
							change: [−7.6736% −7.6163% −7.5596%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 5 outliers among 100 measurements (5.00%)
	  1 (1.00%) low severe
	  2 (2.00%) low mild
	  2 (2.00%) high severe

	singular_values_10x10   time:   [4.2987 µs 4.2997 µs 4.3010 µs]
							change: [+1.6193% +1.7215% +1.8186%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	singular_values_100x100 time:   [525.20 µs 525.36 µs 525.54 µs]
							change: [+0.4054% +0.4526% +0.4982%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 9 outliers among 100 measurements (9.00%)
	  6 (6.00%) low mild
	  1 (1.00%) high mild
	  2 (2.00%) high severe

	singular_values_200x200 time:   [3.0712 ms 3.0729 ms 3.0750 ms]
							change: [+2.1769% +2.2358% +2.3112%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 3 outliers among 100 measurements (3.00%)
	  1 (1.00%) low mild
	  1 (1.00%) high mild
	  1 (1.00%) high severe

	pseudo_inverse_4x4      time:   [877.64 ns 878.38 ns 879.12 ns]
							change: [−8.2828% −8.2216% −8.1662%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 13 outliers among 100 measurements (13.00%)
	  1 (1.00%) low severe
	  3 (3.00%) low mild
	  2 (2.00%) high mild
	  7 (7.00%) high severe

	pseudo_inverse_10x10    time:   [6.0008 µs 6.0034 µs 6.0064 µs]
							change: [+0.2665% +0.3678% +0.4766%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 8 outliers among 100 measurements (8.00%)
	  4 (4.00%) high mild
	  4 (4.00%) high severe

	Benchmarking pseudo_inverse_100x100: Warming up for 3.0000 s
	Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
	pseudo_inverse_100x100  time:   [1.6088 ms 1.6091 ms 1.6094 ms]
							change: [+0.1161% +0.2007% +0.2937%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 12 outliers among 100 measurements (12.00%)
	  2 (2.00%) high mild
	  10 (10.00%) high severe

	pseudo_inverse_200x200  time:   [12.038 ms 12.042 ms 12.047 ms]
							change: [−0.4351% −0.2531% −0.0699%] (p = 0.01 < 0.05)
							Change within noise threshold.
	Found 22 outliers among 100 measurements (22.00%)
	  16 (16.00%) low severe
	  2 (2.00%) low mild
	  1 (1.00%) high mild
	  3 (3.00%) high severe

	symmetric_eigen_decompose_4x4
							time:   [518.00 ns 518.07 ns 518.15 ns]
							change: [+4.7008% +4.7492% +4.8006%] (p = 0.00 < 0.05)
							Performance has regressed.
	Found 8 outliers among 100 measurements (8.00%)
	  2 (2.00%) low mild
	  2 (2.00%) high mild
	  4 (4.00%) high severe

	symmetric_eigen_decompose_10x10
							time:   [3.6417 µs 3.6428 µs 3.6440 µs]
							change: [−0.1549% −0.0998% −0.0483%] (p = 0.00 < 0.05)
							Change within noise threshold.
	Found 12 outliers among 100 measurements (12.00%)
	  6 (6.00%) high mild
	  6 (6.00%) high severe

	symmetric_eigen_decompose_100x100
							time:   [761.64 µs 762.66 µs 763.80 µs]
							change: [−5.8109% −5.7178% −5.6284%] (p = 0.00 < 0.05)
							Performance has improved.
	Found 19 outliers among 100 measurements (19.00%)
	  9 (9.00%) low severe
	  9 (9.00%) low mild
	  1 (1.00%) high severe

	symmetric_eigen_decompose_200x200
							time:   [5.1304 ms 5.1337 ms 5.1372 ms]
							change: [−9.4434% −9.3646% −9.2959%] (p = 0.00 < 0.05)
							Performance has improved.

Total run time of the full benchmark suite on my machine (AMD 5950X) has
not changed and is still around 30 minutes.

Some algorithms may not converge when used on completely random values
with the default value of epsilon and unlimited iterations.

`reproducible_dmatrix()` already exists to circumvent this for `DMatrix`,
so I implemented the same for `SMatrix`.

In my tests this problem manifested itself only on
`schur_decompose_4x4`, but I decided to apply a similar fix to all
benchmarks that also use `reproducible_dmatrix()` for `DMatrix`.

Random matrices may not be positive-definite, and the Cholesky decomposition
benchmarks panic because of that:

	Benchmarking cholesky_decompose_unpack_100x100: Warming up for 3.0000 s
	thread 'main' panicked at benches/linalg/cholesky.rs:38:45:
	called `Option::unwrap()` on a `None` value
@im-0 im-0 requested a review from geo-ant September 30, 2025 00:54

geo-ant commented Sep 30, 2025

Hey @im-0, sorry GitHub mobile is not letting me provide an actual review. Thanks for the effort, that's very exhaustive benchmarking now. The one thing I stumbled over is that sometimes matrices with constant values are generated ('from_slice', 'from_element') rather than random values. This seems to be a bit inconsistent and I think I'd prefer consistent random value generation. Other than that this looks great to me.

I've also asked the faer maintainer Sarah-ek to have a look at this. They might have some valuable input as well. But to me everything looks good, except the inconsistency with the constant values.

@geo-ant geo-ant left a comment

overall fantastic, I just had some questions on the use of reproducible matrix and some leftover constant vectors. Plus one remark on the cholesky test.

bh.bench_function("cholesky_100x100", |bh| {
bh.iter_batched(
|| {
let m = crate::reproducible_dmatrix(100, 100);

I have a suspicion why this calls the reproducible matrix, and I think there's a problem. Let me explain. For the Cholesky decomposition of a matrix A to be defined, A must be symmetric positive definite. That's why the line `let m = &m * m.transpose()` exists in the old test, but it's still wrong: A A^T is symmetric positive semidefinite, yet it may still be singular. A numerically stable way to get an actually positive definite matrix is to calculate A A^T + alpha * Id, with Id the identity matrix and alpha chosen for numerical stability. An alpha that works is e.g. `f64::EPSILON * A.norm_squared()`. I know this because I had to fix that exact problem in the nalgebra-lapack proptests recently; see https://github.com/dimforge/nalgebra/blob/main/nalgebra-lapack/tests/linalg/cholesky.rs, specifically the `positive_definite_dmatrix` function.
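
The construction described in this comment can be sketched without any dependencies. This is an illustration only: it uses plain `Vec<Vec<f64>>` storage instead of nalgebra types, and `positive_definite_from` and `cholesky` are hypothetical helper names, not functions from nalgebra or nalgebra-lapack. The `cholesky` helper is a textbook factorization included only to check the result.

```rust
/// Builds `A * A^T + alpha * I` from a matrix stored as rows, with
/// `alpha = f64::EPSILON * ||A||_F^2` (the shift suggested in the review).
/// Hypothetical helper for illustration; not part of nalgebra.
fn positive_definite_from(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = a.len();
    // Squared Frobenius norm: the sum of all squared entries.
    let norm_squared: f64 = a.iter().flatten().map(|x| x * x).sum();
    let alpha = f64::EPSILON * norm_squared;

    let mut m = vec![vec![0.0_f64; n]; n];
    for i in 0..n {
        for j in 0..n {
            // (A A^T)[i][j] is the dot product of rows i and j of A.
            m[i][j] = a[i].iter().zip(&a[j]).map(|(x, y)| x * y).sum::<f64>();
        }
        // Shift the diagonal to move the matrix away from singularity.
        m[i][i] += alpha;
    }
    m
}

/// Textbook Cholesky factorization. Returns `None` when a pivot is not
/// strictly positive, i.e. the input is not numerically positive definite.
fn cholesky(m: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = m.len();
    let mut l = vec![vec![0.0_f64; n]; n];
    for j in 0..n {
        let mut d = m[j][j];
        for k in 0..j {
            d -= l[j][k] * l[j][k];
        }
        // Written as a negation so that a NaN pivot is rejected as well.
        if !(d > 0.0) {
            return None;
        }
        l[j][j] = d.sqrt();
        for i in (j + 1)..n {
            let mut s = m[i][j];
            for k in 0..j {
                s -= l[i][k] * l[j][k];
            }
            l[i][j] = s / l[j][j];
        }
    }
    Some(l)
}
```

For instance, `cholesky(&positive_definite_from(&[vec![1.0, 2.0], vec![3.0, 4.0]]))` returns `Some`, whereas `cholesky` applied to the singular matrix with rows `[1, -1]` and `[-1, 1]` returns `None`, which corresponds to the `Option::unwrap()` panic quoted in the commit message.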

bh.bench_function("cholesky_500x500", |bh| {
bh.iter_batched(
|| {
let m = crate::reproducible_dmatrix(500, 500);

see the 100x100 test

bh.bench_function("cholesky_decompose_unpack_100x100", |bh| {
bh.iter_batched(
|| {
let m = crate::reproducible_dmatrix(100, 100);

see the 100x100 test

bh.bench_function("cholesky_decompose_unpack_500x500", |bh| {
bh.iter_batched(
|| {
let m = crate::reproducible_dmatrix(500, 500);

see the 100x100 test

bh.bench_function("cholesky_solve_10x10", |bh| {
bh.iter_batched_ref(
|| {
let m = crate::reproducible_dmatrix(10, 10);

see the 100x100 test

bh.iter_batched(
|| {
let m = DMatrix::<f64>::new_random(10, 10);
(QR::new(m), DVector::<f64>::from_element(10, 1.0))

non-random-vector

bh.iter_batched(
|| {
let m = DMatrix::<f64>::new_random(100, 100);
(QR::new(m), DVector::<f64>::from_element(100, 1.0))

non-random-vector

bh.iter(|| std::hint::black_box(Schur::new(m.clone())))
bh.bench_function("schur_decompose_4x4", |bh| {
bh.iter_batched(
|| crate::reproducible_smatrix::<f64, 4, 4>(),

Why is the reproducible matrix called here? I'm not so familiar with the Schur decomposition, but from a cursory glance at Wikipedia, any square real matrix should have one. Same question for the other instances of the test below.

bh.iter(|| std::hint::black_box(m.complex_eigenvalues()))
bh.bench_function("eigenvalues_4x4", |bh| {
bh.iter_batched_ref(
|| crate::reproducible_smatrix::<f64, 4, 4>(),

Same question as above: why call the reproducible matrix here instead of a random one?

bh.iter(|| std::hint::black_box(SVD::new_unordered(m.clone(), true, true)))
bh.bench_function("svd_decompose_2x2", |bh| {
bh.iter_batched(
|| crate::reproducible_smatrix::<f32, 2, 2>(),

why use the reproducible matrix here? Same for the instances below

geo-ant commented Oct 2, 2025

hey @im-0, I've implemented the changes myself, because I felt I was bothering you unduly. Please let me know if you agree with those and then I think we can get this merged.

im-0 commented Oct 6, 2025

Was busy with other things. I will check this later today or tomorrow.

I think that at least for some algorithms it will be better to use a predictable sequence of random matrices instead of completely random values on each benchmark run. But I am not completely sure about this and need to check the actual implementation...

geo-ant commented Oct 6, 2025

@im-0 please feel free to implement changes as you see fit. I think this will be the last iteration. The one thing I'm wondering is whether the 'reproducible_matrix' actually produces a random sequence of matrices or whether it seeds the rng on each call. I'm on mobile right now, so I don't have the code at hand.

geo-ant commented Oct 6, 2025

@im-0 UPDATE: I've looked at the code and each call to reproducible_dmatrix seeds the rng with 0 again. I've also written a little test program on my local PC to verify, just in case.

That means the sequence of random numbers will always be the same for each call, so two matrices of the same size created with reproducible_dmatrix will be identical. This makes me feel that using it in benchmarks is not very good, since it will test the behavior for exactly this one matrix. Assuming there even are differences depending on the contents of the matrices, this is not what we want, I don't think.
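
The difference between reseeding on every call and drawing from one shared seeded generator can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: `Lcg` is a tiny stand-in generator, not the RNG the benchmarks use, and `reseeded_matrix` and `sequenced_matrix` are hypothetical names that return flat `Vec<f64>` buffers rather than nalgebra matrices.

```rust
/// Tiny 64-bit linear congruential generator, used only as a stand-in for
/// a seeded RNG (hypothetical; not the generator used by the benchmarks).
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        // One LCG step with wrapping 64-bit arithmetic.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the top 53 bits to form a value in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Reseeds with 0 on every call, so every call returns the same entries.
fn reseeded_matrix(rows: usize, cols: usize) -> Vec<f64> {
    let mut rng = Lcg(0);
    (0..rows * cols).map(|_| rng.next_f64()).collect()
}

/// Draws from a caller-owned generator: the whole sequence is repeatable
/// across runs, but consecutive matrices differ from each other.
fn sequenced_matrix(rng: &mut Lcg, rows: usize, cols: usize) -> Vec<f64> {
    (0..rows * cols).map(|_| rng.next_f64()).collect()
}
```

With this sketch, two calls to `reseeded_matrix(2, 2)` compare equal, while two consecutive calls to `sequenced_matrix` on the same `Lcg(0)` produce different buffers that are nevertheless identical from run to run. The second pattern is one possible reading of the "predictable sequence of random matrices" mentioned earlier in the thread.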

geo-ant commented Oct 12, 2025

@im-0, not trying to rush you. I just know you had some thoughts about whether you are happy with this PR to get merged. Let me know if you are fine to proceed.
