Environment:
  JULIA_NUM_THREADS = 8
```

Single-threaded benchmarks on an M1 Mac:
```julia
julia> N = 100;

julia> A = rand(N,N); B = rand(N,N); C = similar(A);

julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  21.416 μs … 34.458 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     21.624 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.767 μs ± 491.788 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃ ▆██ ▆▄ ▁ ▃▄ ▄▂ ▁ ▂▃▁ ▂
  ▃▇█▁███▁██▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▁███▁▆▃▁▁▆▇▁██▁█▆▅▁▄▃▁▃▃▇▁███ █
  21.4 μs         Histogram: log(frequency) by time        23.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  39.124 μs … 57.749 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.166 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   46.274 μs ±   1.766 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▁▄▂▆▃█▅▇▄▇▅▃▃▁▃▁▂
  ▂▁▁▂▂▂▂▂▁▂▂▂▂▂▂▃▃▃▃▃▄▄▅▅▆▅▇▇████████████████████▆▇▆▆▅▆▅▅▄▃▃ ▅
  39.1 μs           Histogram: frequency by time           50.2 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  48.291 μs … 57.833 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     49.124 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.306 μs ± 802.143 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▃▅▆▇██▇██▇▇▆▅▄▂▂▁▁▁▂▁▁▁▁▁▁▁ ▁▁▁ ▃
  ▃████████████████████████████████████▇▆▄▂▄▃▂▃▃▄▄▃▆▅▇▇▇██▇█▇▇ █
  48.3 μs         Histogram: log(frequency) by time          53 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  34.249 μs … 40.208 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     34.375 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   34.748 μs ± 774.675 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆██▆▃▄▅▃ ▁▁▄▅▅▃▂▁ ▂▃▂ ▁▂ ▂
  ████████▁▁▃▁▁▁▁▁▃▄▃▁▁▃██████████▇▅▄▅▅▆▄▄▄▄▄▅▄▄▃▅▃▄▃▅█████▇██ █
  34.2 μs         Histogram: log(frequency) by time        37.1 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
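
For reference, the calls being timed compute `C = A / U` (the `rdiv!` pair, with `U = UpperTriangular(B)`) and `C = L \ A` (the `ldiv!` pair, with `L = LowerTriangular(B)`), which is why each `TriangularSolve` call is benchmarked against the matching `LinearAlgebra` call. The sketch below spells out those two operations as plain substitution loops and checks them against `LinearAlgebra`. It is a hypothetical, unoptimized reference (the names `naive_rdiv!` and `naive_ldiv!` are made up for illustration), not how TriangularSolve.jl implements its solves. It also assumes a diagonally shifted `B`: the benchmark's plain `rand(N,N)` triangles may be poorly conditioned, which would make a tolerance-based comparison unreliable.

```julia
using LinearAlgebra

# Hypothetical reference implementations: written for clarity, not speed.

# Overwrite C with A / U, i.e. solve X * U = A for X (U upper triangular).
# Column j of X * U only involves columns 1:j of X, so solve columns left to right.
function naive_rdiv!(C::AbstractMatrix, A::AbstractMatrix, U::UpperTriangular)
    m, n = size(A)
    @assert size(U) == (n, n) && size(C) == (m, n)
    for j in 1:n, i in 1:m
        s = A[i, j]
        for k in 1:j-1
            s -= C[i, k] * U[k, j]
        end
        C[i, j] = s / U[j, j]
    end
    return C
end

# Overwrite C with L \ A, i.e. solve L * X = A by forward substitution
# (L lower triangular), one column of A at a time.
function naive_ldiv!(C::AbstractMatrix, L::LowerTriangular, A::AbstractMatrix)
    m, n = size(A)
    @assert size(L) == (m, m) && size(C) == (m, n)
    for j in 1:n, i in 1:m
        s = A[i, j]
        for k in 1:i-1
            s -= L[i, k] * C[k, j]
        end
        C[i, j] = s / L[i, i]
    end
    return C
end

N = 100
A = rand(N, N)
B = rand(N, N) + N * I  # assumption: shift the diagonal so both triangles are well conditioned
C = similar(A)

naive_rdiv!(C, A, UpperTriangular(B))
@assert C ≈ A / UpperTriangular(B)

naive_ldiv!(C, LowerTriangular(B), A)
@assert C ≈ LowerTriangular(B) \ A
```

Writing into a preallocated `C` mirrors the in-place `rdiv!`/`ldiv!` calls above, so neither sketch allocates a result matrix.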

Or, from another run of the same `TriangularSolve.ldiv!` command:
```julia
julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  23.750 μs … 30.541 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.875 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   23.948 μs ± 316.293 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▁▆ █ ▇▆▆ ▄ ▁ ▁ ▁ ▁ ▁ ▂
  ▅███▆█▁███▄█▁██▇▁▄▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▆▁▇▆█▁█▁▇▆▅▁▅▁▇▆█▁█ █
  23.8 μs         Histogram: log(frequency) by time          25 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
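
To summarize the single-threaded M1 comparison, one can take the ratio of median times (baseline `LinearAlgebra` median divided by the `TriangularSolve` median). The snippet below only redoes that division on the medians copied from the runs on this page; it is not a new measurement, and the variable names are illustrative.

```julia
# Median times in μs, copied from the single-threaded M1 runs on this page.
rdiv_triangularsolve = 21.624           # TriangularSolve.rdiv!(C, A, UpperTriangular(B), Val(false))
rdiv_linearalgebra   = 46.166           # rdiv!(copyto!(C, A), UpperTriangular(B))
ldiv_linearalgebra   = 49.124           # ldiv!(C, LowerTriangular(B), A)
ldiv_triangularsolve = (34.375, 23.875) # the two TriangularSolve.ldiv!(..., Val(false)) runs

# Speedup of TriangularSolve over LinearAlgebra: baseline median / TriangularSolve median.
rdiv_speedup  = rdiv_linearalgebra / rdiv_triangularsolve
ldiv_speedups = ldiv_linearalgebra ./ ldiv_triangularsolve  # broadcasts over the tuple

println("rdiv! speedup:  ", round(rdiv_speedup; digits = 2))
println("ldiv! speedups: ", round.(ldiv_speedups; digits = 2))
```

With these medians, `rdiv_speedup` is about 2.13 and `ldiv_speedups` is about (1.43, 2.06), so the `ldiv!` ratio depends noticeably on which run is used. Medians are used here because they are less sensitive to the occasional slow samples visible in the ranges; using minimums instead would be an equally reasonable choice.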

For editing convenience (you can copy/paste the REPL sessions above into a REPL, which should automatically strip the `julia>` prompts and outputs, but they are less convenient to edit if you want to try changing the benchmarks):

```julia