Commit 64f1b07

authored Apr 15, 2024
Merge pull request #39 from JuliaSIMD/staticallycompileableldiv
Improve static compilation, reduce uses of `lubuffer`
2 parents (292f660 + b7e9640), commit 64f1b07

4 files changed, +628 -237 lines changed

 

‎Project.toml

+1 -1

@@ -1,7 +1,7 @@
 name = "TriangularSolve"
 uuid = "d5829a12-d9aa-46ab-831f-fb7c9ab06edf"
 authors = ["chriselrod <elrodc@gmail.com> and contributors"]
-version = "0.1.21"
+version = "0.2.0"
 
 [deps]
 CloseOpenIntervals = "fb6a15b2-703c-40df-9091-08a04967cfa9"

‎README.md

+67

@@ -132,7 +132,74 @@ Platform Info:
 Environment:
 JULIA_NUM_THREADS = 8
 ```
+Single-threaded benchmarks on an M1 mac:
+```julia
+julia> N = 100;
+
+julia> A = rand(N,N); B = rand(N,N); C = similar(A);
+
+julia> @benchmark TriangularSolve.rdiv!($C, $A, UpperTriangular($B), Val(false)) # false means single threaded
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+Range (min … max): 21.416 μs … 34.458 μs ┊ GC (min … max): 0.00% … 0.00%
+Time (median): 21.624 μs ┊ GC (median): 0.00%
+Time (mean ± σ): 21.767 μs ± 491.788 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+▃ ▆██ ▆▄ ▁ ▃▄ ▄▂ ▁ ▂▃▁ ▂
+▃▇█▁███▁██▁█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▃█▁██▁███▁▆▃▁▁▆▇▁██▁█▆▅▁▄▃▁▃▃▇▁███ █
+21.4 μs Histogram: log(frequency) by time 23.2 μs <
+
+Memory estimate: 0 bytes, allocs estimate: 0.
+
+julia> @benchmark rdiv!(copyto!($C, $A), UpperTriangular($B))
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+Range (min … max): 39.124 μs … 57.749 μs ┊ GC (min … max): 0.00% … 0.00%
+Time (median): 46.166 μs ┊ GC (median): 0.00%
+Time (mean ± σ): 46.274 μs ± 1.766 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+▁▁▄▂▆▃█▅▇▄▇▅▃▃▁▃▁▂
+▂▁▁▂▂▂▂▂▁▂▂▂▂▂▂▃▃▃▃▃▄▄▅▅▆▅▇▇████████████████████▆▇▆▆▅▆▅▅▄▃▃ ▅
+39.1 μs Histogram: frequency by time 50.2 μs <
 
+Memory estimate: 0 bytes, allocs estimate: 0.
+
+julia> @benchmark ldiv!($C, LowerTriangular($B), $A)
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+Range (min … max): 48.291 μs … 57.833 μs ┊ GC (min … max): 0.00% … 0.00%
+Time (median): 49.124 μs ┊ GC (median): 0.00%
+Time (mean ± σ): 49.306 μs ± 802.143 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+▁▃▅▆▇██▇██▇▇▆▅▄▂▂▁▁▁▂▁▁▁▁▁▁▁ ▁▁▁ ▃
+▃████████████████████████████████████▇▆▄▂▄▃▂▃▃▄▄▃▆▅▇▇▇██▇█▇▇ █
+48.3 μs Histogram: log(frequency) by time 53 μs <
+
+Memory estimate: 0 bytes, allocs estimate: 0.
+
+julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+Range (min … max): 34.249 μs … 40.208 μs ┊ GC (min … max): 0.00% … 0.00%
+Time (median): 34.375 μs ┊ GC (median): 0.00%
+Time (mean ± σ): 34.748 μs ± 774.675 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+▆██▆▃▄▅▃ ▁▁▄▅▅▃▂▁ ▂▃▂ ▁▂ ▂
+████████▁▁▃▁▁▁▁▁▃▄▃▁▁▃██████████▇▅▄▅▅▆▄▄▄▄▄▅▄▄▃▅▃▄▃▅█████▇██ █
+34.2 μs Histogram: log(frequency) by time 37.1 μs <
+
+Memory estimate: 0 bytes, allocs estimate: 0.
+```
+Or
+```julia
+julia> @benchmark TriangularSolve.ldiv!($C, LowerTriangular($B), $A, Val(false)) # false means single threaded
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+Range (min … max): 23.750 μs … 30.541 μs ┊ GC (min … max): 0.00% … 0.00%
+Time (median): 23.875 μs ┊ GC (median): 0.00%
+Time (mean ± σ): 23.948 μs ± 316.293 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
+
+▃▁▆ █ ▇▆▆ ▄ ▁ ▁ ▁ ▁ ▁ ▂
+▅███▆█▁███▄█▁██▇▁▄▁▁▁▁▁▃▁▁▁▁▁▁▁▃▁▁▁▃▁▁▁▁▁▆▁▇▆█▁█▁▇▆▅▁▅▁▇▆█▁█ █
+23.8 μs Histogram: log(frequency) by time 25 μs <
+
+Memory estimate: 0 bytes, allocs estimate: 0.
+```
 
 For editing convenience (you can copy/paste the above into a REPL and it should automatically strip `julia> `s and outputs, but the above is less convenient to edit if you want to try changing the benchmarks):
 ```julia
