@cgeoga I was thinking some more about how we compute `besselk` in the region between the asymptotic expansions (power series, uniform expansions, and large argument), which corresponds to roughly `0 < nu < 25` and `2 < x < 30` ...
Right now, we are using the Wronskian relation to `besseli`. The reason we can do this is that the power series of `besseli` is convergent in this zone, so using it along with the continued fraction and the Wronskian we can get `besselk` pretty accurately. Now there are a couple of drawbacks to that:

- We need `besseli(nu, x)` and `besseli(nu + 1, x)`, and then also need to compute `besselk(nu+1,x)/besselk(nu,x)`.
- Power series are slower to converge as `x` gets bigger.
- The power series of `besseli(nu, x)` suffers from cancellation for complex arguments, so it can't be used as uniformly.
The first point was taken care of by using SIMD instructions to compute both `besseli(nu, x)` and `besseli(nu + 1, x)`, so that part is pretty optimized (though we pay an extra penalty for complex numbers). But points 2 and 3 are pretty much unsolvable: power series will always be slow for larger arguments and can't be used for much of the complex plane.
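
For reference, the Wronskian relation being used here is the standard identity (a rearrangement of DLMF 10.28.2):

```math
I_{\nu}(x)\,K_{\nu+1}(x) + I_{\nu+1}(x)\,K_{\nu}(x) = \frac{1}{x},
```

so with the continued-fraction ratio $H = K_{\nu+1}(x)/K_{\nu}(x)$, dividing through by $K_{\nu}(x)$ gives

```math
K_{\nu}(x) = \frac{1}{x \left( I_{\nu}(x)\,H + I_{\nu+1}(x) \right)}.
```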
I was toying with an alternative which links a few ideas. The continued fraction approach to compute `besselk(nu+1,x)/besselk(nu,x)` converges more quickly as `x` gets bigger, so it has an inverse relation to the power series (i.e., the power series converges faster for small `x` while the continued fraction converges faster for big `x`). So what if there was a good way to use the uniform expansions to link with the smaller orders? Now, Miller's algorithm uses stable backward recurrence from higher orders with a normalization factor calculated at low orders. That is fine, but usually we do not know the lower orders except for integer arguments (a minimal sketch of that classical scheme is below).
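
For concreteness, here is a minimal sketch of that classical Miller scheme for `besseli` at integer orders; the function name `miller_besseli`, the start-order offset `n + 20`, and the choice to normalize against `besseli0` are mine for illustration, and this is not how Bessels.jl actually implements `besseli`:

```julia
using Bessels

# Hypothetical sketch of Miller's algorithm for besseli(n, x) at integer n:
# recur downward from arbitrary trial values at a high order, then rescale
# using the known value at order 0. No overflow guards; illustration only.
function miller_besseli(n::Integer, x; nstart = n + 20)
    ip1, i = 0.0, 1e-30          # trial start values at orders nstart + 1 and nstart
    inu = 0.0
    for m in nstart:-1:1
        # backward recurrence: I_{m-1}(x) = (2m / x) * I_m(x) + I_{m+1}(x)
        ip1, i = i, (2m / x) * i + ip1
        m - 1 == n && (inu = i)  # save the (unnormalized) value at the target order
    end
    # i now holds the unnormalized I_0(x); rescale with the true besseli0
    return inu * Bessels.besseli0(x) / i
end
```

The normalization step is exactly where the non-integer case breaks down: there is no cheap, known low-order value to rescale against, which motivates flipping the scheme around for `besselk`.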
I'm sure this idea has been described before, but I can't seem to find a description of it. There appears to be a way to link stable forward recurrence with a normalization factor calculated at higher orders to compute `besselk` at lower orders. Instead of considering trial start values (0 and epsilon) as in the Miller backwards scheme, it uses the continued fraction approach to generate start values in the forward scheme. So the algorithm would look something like:

- Generate start values `1.0` and `knup1/knu`.
- Use upward recurrence up to an order where the asymptotic expansion can be used (`nu_ceil`).
- Compute `besselk(nu_ceil, x)` using the asymptotic expansion.
- Normalize the starting value (see the note just after this list).
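
To spell out why the normalization works (my notation, not from the original discussion): a solution of the three-term recurrence

```math
\tilde{k}_{\mu+1} = \frac{2\mu}{x}\,\tilde{k}_{\mu} + \tilde{k}_{\mu-1}
```

is determined by two consecutive values, and the start values $(1, H)$ with $H = K_{\nu+1}(x)/K_{\nu}(x)$ are exactly $(K_{\nu}(x), K_{\nu+1}(x))$ scaled by $1/K_{\nu}(x)$, so every generated term satisfies $\tilde{k}_{\mu} = K_{\mu}(x)/K_{\nu}(x)$. Since $K_{\mu}$ is the dominant solution in the direction of increasing order, this forward recurrence is the stable one. Recurring up to $\nu_{\mathrm{ceil}}$ then lets us solve for the target value:

```math
K_{\nu}(x) = \frac{K_{\nu_{\mathrm{ceil}}}(x)}{\tilde{k}_{\nu_{\mathrm{ceil}}}}.
```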
It may seem complicated, but instead of having to compute `besseli(nu, x)`, `besseli(nu + 1, x)`, and `besselk(nu+1,x)/besselk(nu,x)`, we just need `besselk(nu_ceil, x)` and `besselk(nu+1,x)/besselk(nu,x)`. There are a few extra steps of course to generate the recurrence, but that cost is pretty small.
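
For context on the constant-time claim: the uniform large-order expansion (DLMF 10.41.4, which I believe is what `besselk_large_orders` evaluates, or something close to it) uses a fixed number of terms independent of `x`:

```math
K_{\nu}(\nu z) \sim \sqrt{\frac{\pi}{2\nu}}\;\frac{e^{-\nu \eta}}{(1+z^2)^{1/4}} \sum_{k=0}^{\infty} (-1)^k \frac{U_k(p)}{\nu^{k}},
\qquad
p = \frac{1}{\sqrt{1+z^2}}, \quad
\eta = \sqrt{1+z^2} + \log\frac{z}{1+\sqrt{1+z^2}},
```

where the $U_k$ are fixed polynomials in $p$, so evaluating it at `nu_ceil` costs roughly the same wherever `x` falls in this region.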
So now we have an algorithm that computes `besselk(nu_ceil, x)` in constant time using the uniform asymptotic expansion, and an algorithm to compute `besselk(nu+1,x)/besselk(nu,x)` which converges faster as `x` increases. A sample implementation of the algorithm would look like:
```julia
using Bessels

function test(nu, x)
    # target a high order with the same fractional part as nu so that
    # upward recurrence by integer steps lands exactly on nu_ceil
    fpart, _ = modf(nu)
    nu_ceil = 25.0 + fpart
    # constant-time uniform asymptotic expansion at the high order
    knu_ceil = Bessels.besselk_large_orders(nu_ceil, x)
    # continued fraction for the ratio besselk(nu+1, x) / besselk(nu, x)
    knup1 = Bessels.besselk_ratio_knu_knup1(nu, x)
    # recur the scaled sequence (1.0, knup1) up to nu_ceil, then normalize
    return knu_ceil / Bessels.besselk_up_recurrence(x, knup1, 1.0, nu + 1, nu_ceil)[1]
end
```
Speed tests compared to the other approach:
```julia
## high precision calculation
julia> 2.308550810254649099632E-14
2.308550810254649e-14

julia> test(2.2, 30.0)
2.3085508102546522e-14

julia> Bessels.besselk_continued_fraction(2.2, 30.0)
2.3085508102546462e-14

julia> @benchmark test(2.2, x) setup = (x = 30.0 + rand())
BenchmarkTools.Trial: 10000 samples with 968 evaluations.
 Range (min … max):  79.846 ns … 120.610 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     79.976 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.832 ns ±   2.255 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▃ ▂▃ ▃▅▁                                                 ▁
  ███▆▅▅▄▃▄▄▄▄▆▄▄▃▁▁▄▃▄▁▆██▆▅▃▃▃▄▄▄▅▄▆███▅▄▅▃▃▁▃▁▄▃▁▄▃▃▁▃▃▁▁▃█ █
  79.8 ns       Histogram: log(frequency) by time      87.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 30.0 + rand())
BenchmarkTools.Trial: 10000 samples with 898 evaluations.
 Range (min … max):  127.180 ns … 178.360 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     129.315 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   128.840 ns ±   1.969 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▇▂             █▇
  ▃███▃▂▂▂▂▂▂▂▂▁▂▂▂███▄▃▂▂▂▂▂▃▃▃▄▃▂▂▂▂▂▂▁▂▂▂▂▃▃▂▂▂▂▂▁▁▂▂▂▂▂▂▂▂▂ ▃
  127 ns          Histogram: frequency by time          134 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
That is about 40% faster for bigger arguments. For smallish arguments it's about 10% faster:
```julia
julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 10.0 + rand())
BenchmarkTools.Trial: 10000 samples with 940 evaluations.
 Range (min … max):  102.305 ns … 145.612 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     104.122 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   104.488 ns ±   2.483 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅▆ ▄█▅▂ ▁  ▅▄▃▁ ▁                                            ▂
  ██▇▁▁▁▁████▆▄▄█▇▅▅▄████▇▅▅▄▇█▇▆▆▇▇▅▄▁▃▄▅▅▆▅▆▅▆▆▅▄▅▄▄▃▄▄▅▅▃▄▄▄ █
  102 ns        Histogram: log(frequency) by time        115 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark test(2.2, x) setup = (x = 10.0 + rand())
BenchmarkTools.Trial: 10000 samples with 954 evaluations.
 Range (min … max):  93.203 ns … 134.565 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     97.615 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   96.402 ns ±   2.687 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▅ ▁   █▅ ▁              ▁▃                                 ▂
  ██▇▅▄▅▄▅▅▄▄▁▁▃▄▅█▆▅▄▁▁▁███▇▅▅▅▅▇▆▆▅▁▄▁▃▇█▇▆▅▅▄▃▅▁▅██▆▃▄▁▄▁▄▃ █
  93.2 ns       Histogram: log(frequency) by time       104 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
And for much smaller arguments it's hard to beat the power series, so this approach is slower:
```julia
julia> @benchmark test(2.2, x) setup = (x = 4.0 + rand())
BenchmarkTools.Trial: 10000 samples with 918 evaluations.
 Range (min … max):  115.423 ns … 157.588 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     121.642 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   122.518 ns ±   4.119 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆    █        ▂█    ▆      ▃                                 ▂
  █▇▃▄▄▆▃▃▁▁▃▁▁▁▇██▅▃▆█▆▄▄▄▃▃▄▁███▄▁▅█▆▄▃▄▁▃▃▄▅██▃▄▄▅▅▃▁▃▁▄▁▃▁█ █
  115 ns        Histogram: log(frequency) by time        133 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 4.0 + rand())
BenchmarkTools.Trial: 10000 samples with 942 evaluations.
 Range (min … max):  100.849 ns … 158.882 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     108.678 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   108.736 ns ±   4.641 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄ ▅ █▂ ▅▂ ▂▆ ▇▄ ▃ ▂▅ ▅▃ ▁  ▃  ▃                              ▂
  █▄▁▄▇█▅▁▆▃▃▄██▆▆██▄▄███▅██▆▆██▆▄██▇▆██▅▆▅█▆▅██▇▅▆█▅▅▄▇▄▄▄██▁█ █
  101 ns        Histogram: log(frequency) by time        123 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
So there are of course things to consider, but this approach has the advantage of being applicable to a larger parameter set (including complex numbers) and has the potential for large speed gains over a fairly big domain. My initial implementation could probably be improved to speed things up a bit, but it does serve as a good way to link directly to the asymptotic expansions.