Benchmarking bitpacking on an Apple M3 Max-powered laptop, it seems the handwritten NEON code is actually detrimental to performance.
Below is a run of cargo bench. The reference is current main; the results of this run are with anything "neon"- or "aarch"-specific removed. There is little impact on plain bitpacking, but the delta and strict-delta variants show huge improvements across the board.
It would be interesting if someone could reproduce this on a different ARM-powered device.
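For anyone not familiar with the three variants, here is a minimal scalar sketch of what the delta and strict-delta decoders have to do on top of plain unpacking. It is not the crate's code: the function names are hypothetical, and it assumes that delta stores the gap to the previous value while strict-delta stores the gap minus one. Either way, decoding ends in a running sum, which is the extra step plain bitpacking does not have.

```rust
/// Scalar reference for delta decoding (hypothetical helper, not the crate's API).
/// Each stored value is assumed to be the gap to the previous element,
/// so decoding is a running (prefix) sum starting from `initial`.
fn decode_delta(initial: u32, deltas: &[u32]) -> Vec<u32> {
    let mut prev = initial;
    deltas
        .iter()
        .map(|&d| {
            prev = prev.wrapping_add(d);
            prev
        })
        .collect()
}

/// Strict-delta variant (hypothetical helper). Assumes the input was strictly
/// increasing and that each gap was stored minus one, so one is added back.
fn decode_strict_delta(initial: u32, deltas: &[u32]) -> Vec<u32> {
    let mut prev = initial;
    deltas
        .iter()
        .map(|&d| {
            prev = prev.wrapping_add(d).wrapping_add(1);
            prev
        })
        .collect()
}
```

With an initial value of 10, the gaps 1, 0, 5 decode to 11, 11, 16 under delta, and the stored values 0, 0, 4 decode to 11, 12, 17 under strict-delta.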
bench results
BitPacker4x/decompress-1
time: [52.879 ns 52.905 ns 52.941 ns]
thrpt: [24.178 Gelem/s 24.194 Gelem/s 24.206 Gelem/s]
change:
time: [-1.0221% -0.8966% -0.7573%] (p = 0.00 < 0.05)
thrpt: [+0.7631% +0.9047% +1.0326%]
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
1 (1.00%) low severe
5 (5.00%) low mild
3 (3.00%) high mild
5 (5.00%) high severe
BitPacker4x/decompress-delta-1
time: [665.92 ns 666.13 ns 666.32 ns]
thrpt: [1.9210 Gelem/s 1.9215 Gelem/s 1.9222 Gelem/s]
change:
time: [-53.512% -53.470% -53.426%] (p = 0.00 < 0.05)
thrpt: [+114.71% +114.92% +115.11%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) low mild
4 (4.00%) high severe
BitPacker4x/decompress-strict-delta-1
time: [921.91 ns 923.87 ns 926.28 ns]
thrpt: [1.3819 Gelem/s 1.3855 Gelem/s 1.3884 Gelem/s]
change:
time: [-29.550% -29.334% -29.123%] (p = 0.00 < 0.05)
thrpt: [+41.090% +41.511% +41.945%]
Performance has improved.
BitPacker4x/compress-1 time: [104.32 ns 104.44 ns 104.59 ns]
thrpt: [12.239 Gelem/s 12.255 Gelem/s 12.270 Gelem/s]
change:
time: [-0.9624% -0.7354% -0.5054%] (p = 0.00 < 0.05)
thrpt: [+0.5080% +0.7408% +0.9718%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
BitPacker4x/compress-delta-1
time: [154.11 ns 154.40 ns 154.66 ns]
thrpt: [8.2765 Gelem/s 8.2900 Gelem/s 8.3057 Gelem/s]
change:
time: [-20.762% -20.633% -20.503%] (p = 0.00 < 0.05)
thrpt: [+25.791% +25.997% +26.202%]
Performance has improved.
BitPacker4x/compress-strict-delta-1
time: [176.85 ns 177.13 ns 177.49 ns]
thrpt: [7.2117 Gelem/s 7.2264 Gelem/s 7.2377 Gelem/s]
change:
time: [-7.9960% -7.7879% -7.5765%] (p = 0.00 < 0.05)
thrpt: [+8.1976% +8.4456% +8.6909%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
BitPacker4x/decompress-2
time: [52.683 ns 52.861 ns 53.024 ns]
thrpt: [24.140 Gelem/s 24.214 Gelem/s 24.296 Gelem/s]
change:
time: [-1.9552% -1.7086% -1.4512%] (p = 0.00 < 0.05)
thrpt: [+1.4726% +1.7383% +1.9942%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
BitPacker4x/decompress-delta-2
time: [661.34 ns 662.67 ns 664.36 ns]
thrpt: [1.9267 Gelem/s 1.9316 Gelem/s 1.9355 Gelem/s]
change:
time: [-47.969% -47.723% -47.464%] (p = 0.00 < 0.05)
thrpt: [+90.346% +91.289% +92.192%]
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) low mild
3 (3.00%) high mild
10 (10.00%) high severe
BitPacker4x/decompress-strict-delta-2
time: [917.22 ns 921.65 ns 926.67 ns]
thrpt: [1.3813 Gelem/s 1.3888 Gelem/s 1.3955 Gelem/s]
change:
time: [-25.226% -24.839% -24.468%] (p = 0.00 < 0.05)
thrpt: [+32.394% +33.048% +33.737%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
BitPacker4x/compress-2 time: [102.32 ns 102.80 ns 103.41 ns]
thrpt: [12.378 Gelem/s 12.452 Gelem/s 12.510 Gelem/s]
change:
time: [-2.3590% -1.9400% -1.4818%] (p = 0.00 < 0.05)
thrpt: [+1.5041% +1.9784% +2.4160%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
BitPacker4x/compress-delta-2
time: [155.74 ns 156.14 ns 156.57 ns]
thrpt: [8.1755 Gelem/s 8.1980 Gelem/s 8.2190 Gelem/s]
change:
time: [-8.8859% -8.6692% -8.4502%] (p = 0.00 < 0.05)
thrpt: [+9.2302% +9.4921% +9.7524%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
BitPacker4x/compress-strict-delta-2
time: [177.30 ns 177.97 ns 178.71 ns]
thrpt: [7.1623 Gelem/s 7.1923 Gelem/s 7.2196 Gelem/s]
change:
time: [-5.9149% -5.6137% -5.3176%] (p = 0.00 < 0.05)
thrpt: [+5.6163% +5.9476% +6.2867%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
BitPacker4x/decompress-24
time: [60.974 ns 61.069 ns 61.173 ns]
thrpt: [20.924 Gelem/s 20.960 Gelem/s 20.993 Gelem/s]
change:
time: [-0.6847% -0.4837% -0.2973%] (p = 0.00 < 0.05)
thrpt: [+0.2982% +0.4860% +0.6895%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
BitPacker4x/decompress-delta-24
time: [533.76 ns 534.65 ns 535.61 ns]
thrpt: [2.3898 Gelem/s 2.3941 Gelem/s 2.3981 Gelem/s]
change:
time: [-52.624% -52.456% -52.290%] (p = 0.00 < 0.05)
thrpt: [+109.60% +110.33% +111.08%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
BitPacker4x/decompress-strict-delta-24
time: [795.94 ns 799.04 ns 802.41 ns]
thrpt: [1.5952 Gelem/s 1.6019 Gelem/s 1.6082 Gelem/s]
change:
time: [-27.538% -27.277% -27.006%] (p = 0.00 < 0.05)
thrpt: [+36.997% +37.509% +38.004%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
3 (3.00%) high mild
10 (10.00%) high severe
BitPacker4x/compress-24 time: [105.41 ns 105.60 ns 105.81 ns]
thrpt: [12.097 Gelem/s 12.121 Gelem/s 12.143 Gelem/s]
change:
time: [+0.3718% +0.6612% +0.9934%] (p = 0.00 < 0.05)
thrpt: [-0.9836% -0.6568% -0.3705%]
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
BitPacker4x/compress-delta-24
time: [130.25 ns 130.53 ns 130.85 ns]
thrpt: [9.7819 Gelem/s 9.8061 Gelem/s 9.8272 Gelem/s]
change:
time: [-28.061% -27.890% -27.711%] (p = 0.00 < 0.05)
thrpt: [+38.333% +38.677% +39.006%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
BitPacker4x/compress-strict-delta-24
time: [147.13 ns 147.37 ns 147.60 ns]
thrpt: [8.6721 Gelem/s 8.6853 Gelem/s 8.6997 Gelem/s]
change:
time: [-21.304% -21.093% -20.890%] (p = 0.00 < 0.05)
thrpt: [+26.407% +26.731% +27.072%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
BitPacker4x/decompress-31
time: [71.861 ns 71.909 ns 71.960 ns]
thrpt: [17.788 Gelem/s 17.800 Gelem/s 17.812 Gelem/s]
change:
time: [-1.0571% -0.7500% -0.4758%] (p = 0.00 < 0.05)
thrpt: [+0.4781% +0.7556% +1.0684%]
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
BitPacker4x/decompress-delta-31
time: [635.70 ns 636.28 ns 636.88 ns]
thrpt: [2.0098 Gelem/s 2.0117 Gelem/s 2.0135 Gelem/s]
change:
time: [-49.275% -49.152% -49.047%] (p = 0.00 < 0.05)
thrpt: [+96.258% +96.664% +97.142%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
BitPacker4x/decompress-strict-delta-31
time: [941.63 ns 944.23 ns 947.10 ns]
thrpt: [1.3515 Gelem/s 1.3556 Gelem/s 1.3593 Gelem/s]
change:
time: [-24.421% -24.203% -23.991%] (p = 0.00 < 0.05)
thrpt: [+31.563% +31.932% +32.312%]
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
BitPacker4x/compress-31 time: [123.71 ns 124.11 ns 124.57 ns]
thrpt: [10.275 Gelem/s 10.313 Gelem/s 10.347 Gelem/s]
change:
time: [-1.2531% -0.9205% -0.5680%] (p = 0.00 < 0.05)
thrpt: [+0.5712% +0.9290% +1.2690%]
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
BitPacker4x/compress-delta-31
time: [112.19 ns 112.36 ns 112.53 ns]
thrpt: [11.374 Gelem/s 11.392 Gelem/s 11.409 Gelem/s]
change:
time: [-38.633% -38.431% -38.246%] (p = 0.00 < 0.05)
thrpt: [+61.932% +62.420% +62.955%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
BitPacker4x/compress-strict-delta-31
time: [128.39 ns 128.65 ns 128.94 ns]
thrpt: [9.9272 Gelem/s 9.9495 Gelem/s 9.9699 Gelem/s]
change:
time: [-30.464% -30.310% -30.148%] (p = 0.00 < 0.05)
thrpt: [+43.160% +43.494% +43.811%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
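As a sanity check when reading the numbers above, the time and thrpt changes are two views of the same measurement: a relative time change t corresponds to a throughput change of 1 / (1 + t) - 1. The sketch below uses hypothetical helper names and only values copied from the output above. It also recovers the number of elements per iteration from a time/thrpt pair.

```rust
/// Convert a relative change in time (e.g. -0.53470 for -53.470%) into the
/// corresponding relative change in throughput.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Elements processed per iteration, from a time in nanoseconds and a
/// throughput in Gelem/s (1 Gelem/s is exactly 1 element per nanosecond).
fn elements_per_iter(time_ns: f64, gelem_per_s: f64) -> f64 {
    time_ns * gelem_per_s
}
```

For decompress-delta-1, `throughput_change(-0.53470)` gives roughly +114.9%, matching the reported thrpt line, so the delta decoder runs a bit more than twice as fast without the NEON-specific code. `elements_per_iter(666.13, 1.9215)` comes out at about 1280, which would be ten 128-integer blocks per iteration if I read BitPacker4x's block length correctly.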