A different approach to compute besselk in intermediate zones #74

Open
@heltonmc

Description

@cgeoga I was thinking some more about how we compute the region between the expansions (power series, uniform asymptotic expansions, and large-argument expansion), which corresponds to roughly `0 < nu < 25` and `2 < x < 30`.

Right now, we are using the Wronskian relation to `besseli`. We can do this because the power series of `besseli` converges in this zone, so combining it with the continued fraction and the Wronskian gives `besselk` pretty accurately. There are a couple of drawbacks to that:

  1. We need `besseli(nu, x)` and `besseli(nu + 1, x)`, and then also `besselk(nu+1,x)/besselk(nu,x)`.
  2. The power series is slower to converge as `x` gets bigger.
  3. The power series of `besseli(nu, x)` suffers from cancellation for complex arguments, so it can't be used as uniformly.

The first point was taken care of by using SIMD instructions to compute `besseli(nu, x)` and `besseli(nu + 1, x)` together, so that part is pretty optimized (though we pay an extra penalty for complex numbers). But points 2 and 3 are essentially unsolvable: the power series will always be slow for larger arguments and is of limited use for complex numbers.
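For reference, the Wronskian relation in question is `besseli(nu, x)*besselk(nu+1, x) + besseli(nu+1, x)*besselk(nu, x) = 1/x` (DLMF 10.28.2), so given the ratio `H = besselk(nu+1,x)/besselk(nu,x)` from the continued fraction, `besselk(nu, x) = 1/(x*(besseli(nu, x)*H + besseli(nu+1, x)))`. A minimal pure-Python sketch of that identity (not the package's implementation; a quadrature reference stands in for the continued-fraction ratio here):

```python
import math

def besseli_series(nu, x, tol=1e-16):
    """Power series for I_nu(x); all terms positive for real x > 0."""
    term = math.exp(nu * math.log(x / 2) - math.lgamma(nu + 1))  # k = 0 term
    total = term
    k = 0
    while abs(term) > tol * abs(total):
        k += 1
        term *= (x * x / 4) / (k * (nu + k))  # ratio of successive terms
        total += term
    return total

def besselk_quad(nu, x, tmax=5.0, n=10000):
    """Reference K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt by Simpson.
    Used only to supply the ratio H; the real scheme uses a continued fraction."""
    h = tmax / n
    def f(t):
        e = -x * math.cosh(t)
        return math.exp(e) * math.cosh(nu * t) if e > -700.0 else 0.0
    s = f(0.0) + f(tmax)
    for i in range(1, n):
        s += f(i * h) * (4 if i % 2 else 2)
    return s * h / 3

nu, x = 2.2, 30.0
H = besselk_quad(nu + 1, x) / besselk_quad(nu, x)  # stand-in for the CF ratio
knu = 1.0 / (x * (besseli_series(nu, x) * H + besseli_series(nu + 1, x)))
print(knu)  # should agree with besselk(2.2, 30.0)
```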

I was toying with an alternative that links a few ideas. The continued fraction for `besselk(nu+1,x)/besselk(nu,x)` converges more quickly as `x` gets bigger, so it has an inverse relation to the power series (i.e., the power series converges faster for small `x` while the continued fraction converges faster for big `x`). So what if there were a good way to use the uniform expansions to link back to the smaller orders? Miller's algorithm links stable backward recurrence from higher orders with a normalization factor calculated at low orders. That is fine, but we usually do not know the lower orders except for integer orders.
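The power-series half of that inverse relationship is easy to check by counting how many terms the `besseli` series needs before it converges at double precision. A small pure-Python illustration (the cutoff `tol` and the function name are illustrative, not the package's criteria):

```python
import math

def besseli_terms(nu, x, tol=1e-16):
    """Return (I_nu(x), number of power-series terms used)."""
    term = math.exp(nu * math.log(x / 2) - math.lgamma(nu + 1))  # k = 0 term
    total = term
    k = 0
    while abs(term) > tol * abs(total):
        k += 1
        term *= (x * x / 4) / (k * (nu + k))  # ratio of successive terms
        total += term
    return total, k + 1

# Term count grows steadily with x across the intermediate zone.
for x in (2.0, 10.0, 30.0):
    _, n = besseli_terms(2.2, x)
    print(f"x = {x:4.1f}: {n} terms")
```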

I'm sure this idea has been described before, but I can't seem to find a description of it. There appears to be a way to link stable forward recurrence with a normalization factor calculated at higher orders to compute `besselk` at lower orders. Instead of the trial start values (0 and epsilon) of Miller's backward scheme, it uses the continued fraction to generate start values for the forward scheme. The algorithm would look something like:

  1. Generate the start values `1.0` and `knup1/knu` (from the continued fraction).
  2. Use upward recurrence up to an order where the uniform asymptotic expansion applies (`nu_ceil`).
  3. Compute `besselk(nu_ceil, x)` using the asymptotic expansion.
  4. Normalize: divide the value from step 3 by the recurrence output to recover `besselk(nu, x)`.
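The steps above can be sketched end-to-end in pure Python (not the package code): a quadrature reference `besselk_ref` stands in for both the continued-fraction ratio of step 1 and the uniform asymptotic expansion of step 3, and `nsteps` is an arbitrary illustrative choice, so only the recurrence and normalization mechanics of steps 2 and 4 are the point here:

```python
import math

def besselk_ref(nu, x, tmax=5.0, n=10000):
    """Reference K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt by Simpson.
    Stand-in for the fast building blocks (CF ratio, uniform expansion)."""
    h = tmax / n
    def f(t):
        e = -x * math.cosh(t)
        return math.exp(e) * math.cosh(nu * t) if e > -700.0 else 0.0
    s = f(0.0) + f(tmax)
    for i in range(1, n):
        s += f(i * h) * (4 if i % 2 else 2)
    return s * h / 3

def besselk_forward_normalized(nu, x, nsteps=25):
    # Step 1: start values 1.0 and the ratio K_{nu+1}/K_nu
    # (continued fraction in the real scheme; quadrature stand-in here).
    ratio = besselk_ref(nu + 1, x) / besselk_ref(nu, x)
    p_prev, p = 1.0, ratio
    # Step 2: stable upward recurrence K_{v+1} = (2v/x) K_v + K_{v-1}.
    for k in range(1, nsteps):
        p_prev, p = p, 2 * (nu + k) / x * p + p_prev
    # Step 3: one high-order value (uniform asymptotic expansion in the real scheme).
    k_high = besselk_ref(nu + nsteps, x)
    # Step 4: normalize the recurrence output to recover K_nu.
    return k_high / p

print(besselk_forward_normalized(2.2, 30.0))
```

Because `besselk` is the dominant solution of the three-term recurrence in increasing order, the upward pass is stable, and any error in the seed ratio stays at roughly the same relative size after normalization.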

It may seem complicated, but instead of having to compute `besseli(nu, x)`, `besseli(nu + 1, x)`, and `besselk(nu+1,x)/besselk(nu,x)`, we just need `besselk(nu_ceil, x)` and `besselk(nu+1,x)/besselk(nu,x)`. There are, of course, a few extra steps to run the recurrence, but that cost is pretty small.

So now we have an algorithm that computes `besselk(nu_ceil, x)` in constant time using the uniform asymptotic expansion, and an algorithm to compute `besselk(nu+1,x)/besselk(nu,x)` which converges faster as `x` increases. A sample implementation of the algorithm would look like:

```julia
using Bessels

function test(nu, x)
    fpart, _ = modf(nu)
    nu_ceil = 25.0 + fpart   # high order with the same fractional part
    knu_ceil = Bessels.besselk_large_orders(nu_ceil, x)      # uniform asymptotic expansion
    knup1 = Bessels.besselk_ratio_knu_knup1(nu, x)           # continued fraction for K_{nu+1}/K_nu
    return knu_ceil / Bessels.besselk_up_recurrence(x, knup1, 1.0, nu + 1, nu_ceil)[1]
end
```

Speed tests against the current approach:

```julia
## high precision calculation
julia> 2.308550810254649099632E-14
2.308550810254649e-14

julia> test(2.2, 30.0)
2.3085508102546522e-14

julia> Bessels.besselk_continued_fraction(2.2, 30.0)
2.3085508102546462e-14

julia> @benchmark test(2.2, x) setup = (x = 30.0 + rand())
BenchmarkTools.Trial: 10000 samples with 968 evaluations.
 Range (min … max):  79.846 ns … 120.610 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     79.976 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   80.832 ns ±   2.255 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▃                    ▂▃           ▃▅▁                      ▁
  ███▆▅▅▄▃▄▄▄▄▆▄▄▃▁▁▄▃▄▁▆██▆▅▃▃▃▄▄▄▅▄▆███▅▄▅▃▃▁▃▁▄▃▁▄▃▃▁▃▃▁▁▃█ █
  79.8 ns       Histogram: log(frequency) by time      87.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 30.0 + rand())
BenchmarkTools.Trial: 10000 samples with 898 evaluations.
 Range (min … max):  127.180 ns … 178.360 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     129.315 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   128.840 ns ±   1.969 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▇▂              █▇
  ▃███▃▂▂▂▂▂▂▂▂▁▂▂▂███▄▃▂▂▂▂▂▃▃▃▄▃▂▂▂▂▂▂▁▂▂▂▂▃▃▂▂▂▂▂▁▁▂▂▂▂▂▂▂▂▂ ▃
  127 ns           Histogram: frequency by time          134 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

That is about 40% faster for bigger arguments. For smallish arguments it's about 10% faster.

```julia
julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 10.0 + rand())
BenchmarkTools.Trial: 10000 samples with 940 evaluations.
 Range (min … max):  102.305 ns … 145.612 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     104.122 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   104.488 ns ±   2.483 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅▆     ▄█▅▂   ▁    ▅▄▃▁     ▁                                 ▂
  ██▇▁▁▁▁████▆▄▄█▇▅▅▄████▇▅▅▄▇█▇▆▆▇▇▅▄▁▃▄▅▅▆▅▆▅▆▆▅▄▅▄▄▃▄▄▅▅▃▄▄▄ █
  102 ns        Histogram: log(frequency) by time        115 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark test(2.2, x) setup = (x = 10.0 + rand())
BenchmarkTools.Trial: 10000 samples with 954 evaluations.
 Range (min … max):  93.203 ns … 134.565 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     97.615 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   96.402 ns ±   2.687 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▅              ▁       █▅              ▁         ▁▃         ▂
  ██▇▅▄▅▄▅▅▄▄▁▁▃▄▅█▆▅▄▁▁▁███▇▅▅▅▅▇▆▆▅▁▄▁▃▇█▇▆▅▅▄▃▅▁▅██▆▃▄▁▄▁▄▃ █
  93.2 ns       Histogram: log(frequency) by time       104 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

And for much smaller arguments it's hard to beat the power series, so the new approach is slower there.

```julia
julia> @benchmark test(2.2, x) setup = (x = 4.0 + rand())
BenchmarkTools.Trial: 10000 samples with 918 evaluations.
 Range (min … max):  115.423 ns … 157.588 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     121.642 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   122.518 ns ±   4.119 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆              █             ▂█              ▆              ▃ ▂
  █▇▃▄▄▆▃▃▁▁▃▁▁▁▇██▅▃▆█▆▄▄▄▃▃▄▁███▄▁▅█▆▄▃▄▁▃▃▄▅██▃▄▄▅▅▃▁▃▁▄▁▃▁█ █
  115 ns        Histogram: log(frequency) by time        133 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselk_continued_fraction(2.2, x) setup = (x = 4.0 + rand())
BenchmarkTools.Trial: 10000 samples with 942 evaluations.
 Range (min … max):  100.849 ns … 158.882 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     108.678 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   108.736 ns ±   4.641 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄    ▅      █▂  ▅▂  ▂▆  ▇▄   ▃  ▂▅  ▅▃   ▁   ▃   ▃            ▂
  █▄▁▄▇█▅▁▆▃▃▄██▆▆██▄▄███▅██▆▆██▆▄██▇▆██▅▆▅█▆▅██▇▅▆█▅▅▄▇▄▄▄██▁█ █
  101 ns        Histogram: log(frequency) by time        123 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

There are of course things to consider, but this approach has the advantage of being applicable to a larger parameter set (including complex numbers) and has the potential for large speed gains over a fairly big domain. My initial implementation could probably also be sped up a bit, but it serves as a good way to link directly to the asymptotic expansions.
