# Benchmarks

The values below are the mean times measured by running each implementation multiple times on the default matrix of the given size, as generated by the application.

Times are expressed in milliseconds.
Speedup is relative to the corresponding serial implementation.
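
As a minimal sketch of how the Speedup column can be read, the snippet below divides the serial total by the total of a parallel implementation and rounds to one decimal. The function name `speedup` is hypothetical and not taken from the project; the input values are TOT entries from the 1023 x 1024 tables.

```python
def speedup(serial_ms, parallel_ms):
    """Ratio of serial time to parallel time; values below 1 mean a slowdown."""
    return serial_ms / parallel_ms


# TOT values (ms) from the 1023 x 1024 tables, no-pivot variants.
serial_no_pivot = 940.37       # (serial) Intel i5-7600
gpu_texture_vec4 = 31.24       # Nvidia GTX 1060, texture vec 4
intel_texture = 1035.11        # (intel) Intel i5-7600, texture

print(f"x{speedup(serial_no_pivot, gpu_texture_vec4):.1f}")  # prints x30.1
print(f"x{speedup(serial_no_pivot, intel_texture):.1f}")     # prints x0.9
```

Both results match the Speedup entries reported for those rows, which suggests the column is computed from the TOT times rather than from the kernel times alone.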

## Matrix size: 1023 x 1024, for a total of 1047552 elements

| (serial) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot | / | / | 940.37 | / |
| Partial pivot | / | / | 1007.63 | / |

| (intel) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture | 1021.48 | 13.63 | 1035.11 | x0.9 |
| No pivot texture vec 4 | 268.06 | 15.97 | 284.03 | x3.3 |
| No pivot buffer | 305.05 | 12.39 | 317.44 | x3.0 |
| No pivot buffer vec 4 | 260.94 | 12.37 | 273.31 | x3.4 |
| Partial pivot texture | 5387.87 | 14.58 | 5402.45 | x0.2 |
| Partial pivot texture vec 4 | 1349.53 | 15.51 | 1365.04 | x0.7 |
| Partial pivot buffer | 4264.74 | 12.56 | 4277.30 | x0.2 |
| Partial pivot buffer vec 4 | 1194.86 | 12.45 | 1207.31 | x0.8 |

| (pocl) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture vec 4 | 1363.69 | 23.95 | 1387.64 | x0.7 |
| No pivot buffer | 296.43 | 19.55 | 315.98 | x2.9 |
| No pivot buffer vec 4 | 266.76 | 19.63 | 286.39 | x3.3 |
| Partial pivot texture vec 4 | 3002.90 | 23.81 | 3026.71 | x0.3 |
| Partial pivot buffer | 3546.58 | 18.58 | 3565.16 | x0.3 |
| Partial pivot buffer vec 4 | 1057.72 | 20.67 | 1078.39 | x0.9 |

| Nvidia GeForce GTX 1060 3GB | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture | 45.83 | 4.47 | 50.30 | x18.7 |
| No pivot texture vec 4 | 26.44 | 4.79 | 31.24 | x30.1 |
| No pivot buffer | 57.56 | 5.26 | 62.82 | x15.0 |
| No pivot buffer vec 4 | 39.64 | 5.16 | 44.80 | x21.0 |
| Partial pivot texture | 185.57 | 3.93 | 189.51 | x5.3 |
| Partial pivot texture vec 4 | 39.60 | 3.94 | 43.54 | x23.1 |
| Partial pivot buffer | 399.86 | 4.44 | 404.30 | x2.5 |
| Partial pivot buffer vec 4 | 57.83 | 4.45 | 62.27 | x16.2 |

## Matrix size: 2047 x 2048, for a total of 4192256 elements

| (serial) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot | / | / | 7462.43 | / |
| Partial pivot | / | / | 7442.95 | / |

| (intel) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture | 8166.98 | 58.23 | 8225.22 | x0.9 |
| No pivot texture vec 4 | 2367.32 | 60.65 | 2427.96 | x3.1 |
| No pivot buffer | 3364.11 | 52.83 | 3416.94 | x2.2 |
| No pivot buffer vec 4 | 2810.63 | 53.23 | 2863.86 | x2.6 |
| Partial pivot texture | 92721.58 | 63.11 | 92784.69 | x0.1 |
| Partial pivot texture vec 4 | 23922.29 | 66.85 | 23989.14 | x0.3 |
| Partial pivot buffer | 59043.69 | 109.10 | 59152.79 | x0.1 |
| Partial pivot buffer vec 4 | 12502.44 | 110.52 | 12612.96 | x0.6 |

| (pocl) Intel i5-7600 CPU @ 3.50GHz | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture vec 4 | 8748.75 | 139.74 | 8888.49 | x0.8 |
| No pivot buffer | 2891.91 | 96.99 | 2988.90 | x2.5 |
| No pivot buffer vec 4 | 2784.98 | 93.01 | 2877.99 | x2.6 |
| Partial pivot texture vec 4 | 30451.27 | 137.12 | 30588.39 | x0.3 |
| Partial pivot buffer | 58998.48 | 100.44 | 59098.92 | x0.1 |
| Partial pivot buffer vec 4 | 11437.74 | 94.08 | 11531.82 | x0.6 |

| Nvidia GeForce GTX 1060 3GB | Gaussian elimination kernel | Solution vector kernel | TOT | Speedup |
|---|---|---|---|---|
| No pivot texture | 301.65 | 8.95 | 310.60 | x24.0 |
| No pivot texture vec 4 | 168.69 | 9.62 | 178.32 | x41.8 |
| No pivot buffer | 374.71 | 10.26 | 384.96 | x19.4 |
| No pivot buffer vec 4 | 296.88 | 10.26 | 307.13 | x24.3 |
| Partial pivot texture | 2389.61 | 9.36 | 2398.97 | x3.1 |
| Partial pivot texture vec 4 | 356.11 | 9.54 | 365.64 | x20.4 |
| Partial pivot buffer | 5971.01 | 10.49 | 5981.49 | x1.2 |
| Partial pivot buffer vec 4 | 692.61 | 10.49 | 703.09 | x10.6 |

## Results

| Category | Implementation | Result |
|---|---|---|
| Greatest speedup (Nvidia) | No pivot texture vec 4 | x41.8 |
| Smallest positive speedup (Nvidia) | Partial pivot buffer | x1.2 |
| Biggest vectorization impact (Nvidia) | Partial pivot buffer vec 4 | x8.6 |
| Worst speedup (intel) | Partial pivot buffer | x0.1 |
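
The summary does not state how the x8.6 vectorization figure is derived. One reading that is consistent with the 2047 x 2048 Nvidia numbers, offered here only as an assumption, is the ratio of Gaussian elimination kernel times between the scalar and vec 4 partial pivot buffer variants. The sketch below compares that ratio with the one obtained from the TOT times; the helper name `ratio` is hypothetical.

```python
def ratio(scalar_ms, vector_ms):
    """How many times faster the vec 4 variant is than the scalar one."""
    return scalar_ms / vector_ms


# Nvidia GTX 1060, 2047 x 2048, partial pivot with buffers (ms, from the tables).
scalar = {"kernel": 5971.01, "total": 5981.49}
vec4 = {"kernel": 692.61, "total": 703.09}

kernel_gain = ratio(scalar["kernel"], vec4["kernel"])
total_gain = ratio(scalar["total"], vec4["total"])

print(f"kernel only: x{kernel_gain:.1f}")  # prints kernel only: x8.6
print(f"total time:  x{total_gain:.1f}")   # prints total time:  x8.5
```

Only the kernel-time ratio rounds to x8.6, so under this assumption the figure excludes the solution vector kernel.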
