|
4 | 4 | [](https://JuliaSIMD.github.io/TriangularSolve.jl/dev)
|
5 | 5 | [](https://github.com/JuliaSIMD/TriangularSolve.jl/actions)
|
6 | 6 | [](https://codecov.io/gh/JuliaSIMD/TriangularSolve.jl)
|
| 7 | + |
| 8 | + |
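| 9 | +TriangularSolve.jl performs in-place triangular solves (`rdiv!` and `ldiv!`), optionally multithreaded. Single-threaded benchmarks against `LinearAlgebra` backed by MKL, using `N = 100`: |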
| 10 | +```julia |
| 11 | +julia> using TriangularSolve, LinearAlgebra, MKL, BenchmarkTools; |
| 12 | + |
| 13 | +julia> BLAS.set_num_threads(1) |
| 14 | + |
| 15 | +julia> BLAS.get_config().loaded_libs |
| 16 | +1-element Vector{LinearAlgebra.BLAS.LBTLibraryInfo}: |
| 17 | + LBTLibraryInfo(libmkl_rt.so, ilp64) |
| 18 | + |
| 19 | +julia> N = 100; |
| 20 | + |
| 21 | +julia> A = rand(N,N); B = rand(N,N); C = similar(A); |
| 22 | + |
| 23 | +julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded |
| 24 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 25 | + Range (min … max): 15.909 μs … 41.524 μs ┊ GC (min … max): 0.00% … 0.00% |
| 26 | + Time (median): 17.916 μs ┊ GC (median): 0.00% |
| 27 | + Time (mean ± σ): 17.751 μs ± 697.786 ns ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 28 | + |
| 29 | + ▃▁ ▁ ▁ ▄▁ ▇▆ ▆█▃ ▂ |
| 30 | + ██▃▁▁██▁▁▁▁█▆▁▁▃▇██▄▃▁███▆▁▄▄███▄▄▅▅▆▇█▇▄▅▆▇██▇█▇▇▆▄▅▄▁▄▁▄▄▇ █ |
| 31 | + 15.9 μs Histogram: log(frequency) by time 19.9 μs < |
| 32 | + |
| 33 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 34 | + |
| 35 | +julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B)) |
| 36 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 37 | + Range (min … max): 17.578 μs … 75.835 μs ┊ GC (min … max): 0.00% … 0.00% |
| 38 | + Time (median): 19.852 μs ┊ GC (median): 0.00% |
| 39 | + Time (mean ± σ): 19.827 μs ± 1.342 μs ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 40 | + |
| 41 | + ▄▂ ▂ ▆▅ ▁█▇▂ ▅▃ ▂ ▂ |
| 42 | + ██▁▁▃█▇▁▁▁█▇▄▄▁██▇▄▄▄██▆▅▄████▅▄▆██▆▆▆▆▇██▇▇▆▆▇▆▅▆▄▅▅▆▄▅▄▅▅ █ |
| 43 | + 17.6 μs Histogram: log(frequency) by time 22.4 μs < |
| 44 | + |
| 45 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 46 | + |
| 47 | +julia> @benchmark ldiv!($C, LowerTriangular($B), $A) |
| 48 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 49 | + Range (min … max): 19.102 μs … 69.966 μs ┊ GC (min … max): 0.00% … 0.00% |
| 50 | + Time (median): 21.561 μs ┊ GC (median): 0.00% |
| 51 | + Time (mean ± σ): 21.565 μs ± 890.952 ns ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 52 | + |
| 53 | + ▂▂ ▂▃ ▄▄ ▆█▄ ▅▅ ▂ |
| 54 | + ██▃▁▁▁▇█▁▁▁▁▅█▁▁▁▁▁██▅▁▁▁▅██▆▁▁▁▆███▆▅▃▅████▃▄▅██▇▇▅▆▆▇▇█▇▆▆ █ |
| 55 | + 19.1 μs Histogram: log(frequency) by time 23.4 μs < |
| 56 | + |
| 57 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 58 | + |
| 59 | +julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded |
| 60 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 61 | + Range (min … max): 19.082 μs … 39.078 μs ┊ GC (min … max): 0.00% … 0.00% |
| 62 | + Time (median): 19.694 μs ┊ GC (median): 0.00% |
| 63 | + Time (mean ± σ): 19.765 μs ± 774.848 ns ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 64 | + |
| 65 | + ▃ ▄█ ▁ |
| 66 | + ▂▇██▄▂▁▁▂▂▃███▃▂▁▂▁▂▂▅█▇▃▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂ ▃ |
| 67 | + 19.1 μs Histogram: frequency by time 22.1 μs < |
| 68 | + |
| 69 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 70 | +``` |
| 71 | +Multithreaded benchmarks: |
| 72 | +```julia |
| 73 | +julia> BLAS.set_num_threads(TriangularSolve.VectorizationBase.num_cores()) |
| 74 | + |
| 75 | +julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B)) |
| 76 | +BenchmarkTools.Trial: 10000 samples with 3 evaluations. |
| 77 | + Range (min … max): 8.309 μs … 24.357 μs ┊ GC (min … max): 0.00% … 0.00% |
| 78 | + Time (median): 8.769 μs ┊ GC (median): 0.00% |
| 79 | + Time (mean ± σ): 8.812 μs ± 382.702 ns ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 80 | + |
| 81 | + ▁▃▄▆▆██▇▆▅▃▁ |
| 82 | + ▂▁▂▂▂▂▃▃▃▄▅▇██████████████▇▆▅▄▃▃▃▃▃▂▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▄ |
| 83 | + 8.31 μs Histogram: frequency by time 9.7 μs < |
| 84 | + |
| 85 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 86 | + |
| 87 | +julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B)) |
| 88 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 89 | + Range (min … max): 11.996 μs … 151.147 μs ┊ GC (min … max): 0.00% … 0.00% |
| 90 | + Time (median): 14.163 μs ┊ GC (median): 0.00% |
| 91 | + Time (mean ± σ): 14.281 μs ± 2.372 μs ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 92 | + |
| 93 | + ▂▄▇███▇▆▅▃▂ ▁ ▂▄▄▅▅▅▆▃▃ ▁ |
| 94 | + ▁▁▁▂▂▃▄▇██████████████████████████▇▆▅▄▅▆▇███▆▅▅▃▄▂▂▂▁▁▁▁▁▁▁▁ ▅ |
| 95 | + 12 μs Histogram: frequency by time 17.3 μs < |
| 96 | + |
| 97 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 98 | + |
| 99 | +julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A) |
| 100 | +BenchmarkTools.Trial: 10000 samples with 5 evaluations. |
| 101 | + Range (min … max): 7.903 μs … 22.442 μs ┊ GC (min … max): 0.00% … 0.00% |
| 102 | + Time (median): 9.871 μs ┊ GC (median): 0.00% |
| 103 | + Time (mean ± σ): 9.789 μs ± 864.957 ns ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 104 | + |
| 105 | + ▂▃ ▄▃ ▃▅ ▅▃ ▆▂ ▆▄ ▂▇▄ ▃█▅▂▂▁▁▄▆▃▁ ▁ ▂ |
| 106 | + ██▅▂██▆▅██▆▆▆██▇▇███▇▇▇████▇█████▆██████████████▇███▇▇▆▇▆▅▆ █ |
| 107 | + 7.9 μs Histogram: log(frequency) by time 11.8 μs < |
| 108 | + |
| 109 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 110 | + |
| 111 | +julia> @benchmark ldiv!($C, LowerTriangular($B), $A) |
| 112 | +BenchmarkTools.Trial: 10000 samples with 1 evaluation. |
| 113 | + Range (min … max): 13.507 μs … 142.574 μs ┊ GC (min … max): 0.00% … 0.00% |
| 114 | + Time (median): 15.258 μs ┊ GC (median): 0.00% |
| 115 | + Time (mean ± σ): 15.319 μs ± 2.045 μs ┊ GC (mean ± σ): 0.00% ± 0.00% |
| 116 | + |
| 117 | + ▁▃ ▁▂ ▁▃▅▁ ▁▄▄▁ ▂▆█▆▃ |
| 118 | + ▁▂▅███▆▇███▆▅████▆▅████▆▆█████▆▄▄▆▆▅▄▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁ ▄ |
| 119 | + 13.5 μs Histogram: frequency by time 18.5 μs < |
| 120 | + |
| 121 | + Memory estimate: 0 bytes, allocs estimate: 0. |
| 122 | + |
| 123 | +julia> versioninfo() |
| 124 | +Julia Version 1.8.0-DEV.438 |
| 125 | +Commit 88a6376e99* (2021-08-28 11:03 UTC) |
| 126 | +Platform Info: |
| 127 | + OS: Linux (x86_64-redhat-linux) |
| 128 | + CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz |
| 129 | + WORD_SIZE: 64 |
| 130 | + LIBM: libopenlibm |
| 131 | + LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake) |
| 132 | +Environment: |
| 133 | + JULIA_NUM_THREADS = 8 |
| 134 | +``` |
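
To make the comparison easier to read, the short sketch below turns the median times reported above into ratios. The numbers are copied from the benchmark output; the labels `triangularsolve` and `mkl` are names chosen here for illustration, and the ratios only describe this one machine and run.

```julia
# Median times in microseconds, copied from the benchmark output above.
medians = (
    rdiv_single   = (triangularsolve = 17.916, mkl = 19.852),
    ldiv_single   = (triangularsolve = 19.694, mkl = 21.561),
    rdiv_threaded = (triangularsolve = 8.769,  mkl = 14.163),
    ldiv_threaded = (triangularsolve = 9.871,  mkl = 15.258),
)

# A ratio above 1 means TriangularSolve had the lower median time.
speedups = map(t -> t.mkl / t.triangularsolve, medians)

for (name, ratio) in pairs(speedups)
    println(rpad(string(name), 14), round(ratio; digits = 2))
end
```

In this run the gap is modest for the single-threaded calls and larger for the threaded ones.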
| 135 | + |
| 136 | + |
| 137 | +For editing convenience, here are the same commands without prompts or output. Pasting the transcript above into a REPL automatically strips the `julia> ` prompts and the outputs, but this form is easier to modify if you want to change the benchmarks: |
| 138 | +```julia |
| 139 | +using TriangularSolve, LinearAlgebra, MKL, BenchmarkTools; |
| 140 | +BLAS.set_num_threads(1) # single-threaded BLAS for the first set of benchmarks, as in the transcript above |
| 141 | +BLAS.get_config().loaded_libs |
| 142 | +N = 100; |
| 143 | + |
| 144 | +A = rand(N,N); B = rand(N,N); C = similar(A); |
| 145 | + |
| 146 | +@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) |
| 147 | +@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B)) |
| 148 | + |
| 149 | +@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) |
| 150 | +@benchmark ldiv!($C, LowerTriangular($B), $A) |
| 151 | + |
| 152 | +BLAS.set_num_threads(TriangularSolve.VectorizationBase.num_cores()) |
| 153 | +@benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B)) |
| 154 | +@benchmark rdiv!(copyto!($C, $A), UpperTriangular($B)) |
| 155 | + |
| 156 | +@benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A) |
| 157 | +@benchmark ldiv!($C, LowerTriangular($B), $A) |
| 158 | + |
| 159 | +versioninfo() |
| 160 | +``` |
| 161 | + |
| 162 | +Currently, `rdiv!` with `UpperTriangular` and `ldiv!` with `LowerTriangular` matrices are the only supported configurations. |
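
As a minimal sketch of what these two configurations compute, the example below uses only the `LinearAlgebra` standard library to build reference results and checks them by residual. The diagonal shift `N * I` is an assumption added here to keep the random triangular matrices well conditioned; it is not part of the benchmarks. The commented lines at the end reuse the TriangularSolve calls from the benchmarks and assume the package is installed.

```julia
using LinearAlgebra

N = 8
A = rand(N, N)
# Shift the diagonal so both triangles of B are well conditioned (illustrative choice).
B = rand(N, N) + N * I
U = UpperTriangular(B)
L = LowerTriangular(B)

# Right division: X = A / U, so X * U should reproduce A.
X = rdiv!(copy(A), U)
rdiv_residual = norm(X * U - A) / norm(A)

# Left division: Y = L \ A, so L * Y should reproduce A.
Y = ldiv!(L, copy(A))
ldiv_residual = norm(L * Y - A) / norm(A)

println("rdiv! relative residual: ", rdiv_residual)
println("ldiv! relative residual: ", ldiv_residual)

# With TriangularSolve installed, the benchmarked calls can be compared the same way:
#   C = similar(A)
#   TriangularSolve.rdiv!(C, A, U, Val(false)); C ≈ X
#   TriangularSolve.ldiv!(C, L, A, Val(false)); C ≈ Y
```

Checking the residual rather than comparing element by element keeps the test meaningful even when two implementations round differently.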
| 163 | + |
| 164 | + |