Skip to content

[big feature] Implement core.math.pow() for ^^ operator #14297

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ibuclaw
Copy link
Member

@ibuclaw ibuclaw commented Jul 12, 2022

Adds two new functions for the power operator:

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Int!Y);

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Float!Y);

Lowers x ^^ y operator to core.math.pow(x, y).

Self assessment score:

  • 2/10 readability

  • 0/10 maintainability

  • 10/10 speed (so long as you're using 64-bit doubles)

  • 42/10 compiler+druntime changed together in one PR.

Implementation

The integral power function is pretty much taken from the std.math template. Where it differs is rather than having two separate templates for pow(int, int) and pow(float, int), these have been merged into one pow(int_or_float, int) function.

The floating point power function similarly merges the pow(int, float) and pow(float, float) templates into one pow(int_or_float, float) function. It is otherwise a complete reworking, as the std.math implementation fails to deliver on two fronts:

  1. It computes at real precision only.
  2. It implements pow as sign * exp2(y * log2(x)) or (sign * exp2(fyl2x(x, y))), which is not accurate in practice.

The new implementation is split up into two (or three) different algorithms:

  • The 64-bit double and real (32-bit floats are forced at double precision) path is a port of the widely used algorithm described in "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991 (LGPL).
  • The 80-bit and 128-bit real path is a port of (another widely used) algorithm in Cephes by Stephen L. Moshier, 1984-1992.
  • The 128-bit IBM real path is a variant on the Cephes path, but with different overflow constraints and coefficients to work with the weird format. (This is still TODO/WIP)

Unittesting

The unittests in std.math are OK, though have an affinity towards real precision testing. Most can be hoisted into core.math and tweaked to validate all types, i.e: foreach (T; AliasSeq!(float, double, real)).

Note: some tests in std.math are just plain wrong, thankfully they only concern the special case handling, i.e pow(1, nan) == 1 (std.math asserts the result is nan).

Benchmarking

Naive test only confirms what we already know about x87 speeds (note, std.math.pow only computes at real precision)

{
    import core.math;
    pragma(inline, false)
    void f1() { cast(void)pow(123, 789); }
    pragma(inline, false)
    void f2() { cast(void)pow(123.456, 789); }
    pragma(inline, false)
    void f3() { cast(void)pow(123, 789.012); }
    pragma(inline, false)
    void f4() { cast(void)pow(123.456, 789.012); }
    pragma(inline, false)
    void f5() { cast(void)pow(123.456L, 789); }
    pragma(inline, false)
    void f6() { cast(void)pow(123, 789.012L); }
    pragma(inline, false)
    void f7() { cast(void)pow(123.456L, 789.012L); }
    import std.math : stdpow = pow;
    pragma(inline, false)
    void f8() { cast(void)stdpow(123, 789); }
    pragma(inline, false)
    void f9() { cast(void)stdpow(123.456, 789); }
    pragma(inline, false)
    void f10() { cast(void)stdpow(123, 789.012); }
    pragma(inline, false)
    void f11() { cast(void)stdpow(123.456, 789.012); }

    import std.datetime.stopwatch;
    auto r = benchmark!(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11)(100_000_000);
    import std.stdio;
    writeln("corepow(int, int)      : ", r[0]);
    writeln("corepow(double, int)   : ", r[1]);
    writeln("corepow(int, double)   : ", r[2]);
    writeln("corepow(double, double): ", r[3]);
    writeln("corepow(real, int)     : ", r[4]);
    writeln("corepow(int, real)     : ", r[5]);
    writeln("corepow(real, real)    : ", r[6]);
    writeln("stdpow(int, int)       : ", r[7]);
    writeln("stdpow(double, int)    : ", r[8]);
    writeln("stdpow(int, double)    : ", r[9]);
    writeln("stdpow(double, double) : ", r[10]);
}
__EOF__
corepow(int, int)      : 656 ms, 103 μs, and 7 hnsecs
corepow(double, int)   : 636 ms, 773 μs, and 3 hnsecs
corepow(int, double)   : 963 ms, 588 μs, and 2 hnsecs
corepow(double, double): 960 ms, 462 μs, and 2 hnsecs
corepow(real, int)     : 769 ms and 272 μs
corepow(int, real)     : 19 secs, 487 ms, 782 μs, and 4 hnsecs
corepow(real, real)    : 19 secs, 665 ms, 617 μs, and 1 hnsec
stdpow(int, int)       : 754 ms, 980 μs, and 8 hnsecs
stdpow(double, int)    : 785 ms, 29 μs, and 1 hnsec
stdpow(int, double)    : 16 secs, 560 ms, 262 μs, and 7 hnsecs
stdpow(double, double) : 20 secs, 161 ms, 376 μs, and 3 hnsecs

Tested on:

  • x86_64-Linux-gnu (LE, 80-bit)
  • powerpc64-linux-gnu (BE, ibm)
  • aarch64-linux-gnu (LE, 128-bit)
  • i686-freebsd13 (LE, 80-bit rounded)
  • sparc64-sun-solaris2.11 (BE, 128-bit)
  • mips-linux-gnu (LE, 64-bit)

@ibuclaw ibuclaw added Review:Needs Changelog A changelog entry needs to be added to /changelog Review:Needs Tests labels Jul 12, 2022
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#14297"

@ibuclaw ibuclaw changed the title [druntime] core.math: Implement pow() for ^^ operator [big feature] Implement core.math.pow() for ^^ operator Jul 12, 2022
@ibuclaw ibuclaw force-pushed the corepow branch 3 times, most recently from 50efc14 to eb78d88 Compare July 13, 2022 00:01
immutable t2 = t1 + r;
immutable lo1 = k * ln2lo + logctail;
immutable lo2 = t1 - t2 + r;
// Evaluation is optimized assuming superscalar pipelined execution.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple issue? All processors are pipelined in some way and basically anything is superscalar now.

@maxhaton
Copy link
Member

Adds two new functions for the power operator:

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Int!Y);

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Float!Y);

Lowers x ^^ y operator to core.math.pow(x, y).

Self assessment score:

* **2/10** readability

* **0/10** maintainability

* **10/10** speed (so long as you're using 64-bit doubles)
corepow(int, int)      : 656 ms, 103 μs, and 7 hnsecs
corepow(double, int)   : 636 ms, 773 μs, and 3 hnsecs
corepow(int, double)   : 963 ms, 588 μs, and 2 hnsecs
corepow(double, double): 960 ms, 462 μs, and 2 hnsecs
corepow(real, int)     : 769 ms and 272 μs
corepow(int, real)     : 19 secs, 487 ms, 782 μs, and 4 hnsecs
corepow(real, real)    : 19 secs, 665 ms, 617 μs, and 1 hnsec
stdpow(int, int)       : 754 ms, 980 μs, and 8 hnsecs
stdpow(double, int)    : 785 ms, 29 μs, and 1 hnsec
stdpow(int, double)    : 16 secs, 560 ms, 262 μs, and 7 hnsecs
stdpow(double, double) : 20 secs, 161 ms, 376 μs, and 3 hnsecs
* **42/10** compiler+druntime changed together in one PR.

over what test program/benchmark + compiler settings etc.

// x = the value to evaluate
// A = array of coefficients
pragma(inline, true)
static real poly_inline(alias A)(real x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these functions could probably be put inside a template somewhere else so there's no risk of having a bunch of them in the binary N times over.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm just making it work first, then making it pretty later. :-)

Yet to decide on whether to (A) make them private, or shunt them off to core.internal.math, and (B) keep them as inline templates or (possibly inline) plain functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: all these _inline functions do not handle non-finite (and in some cases subnormal) input values because those are assumed to be taken care of by the "special cases" handling of outer pow() function.

ibuclaw added 2 commits August 4, 2022 10:00
Adds two new functions for the power operator:

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Int!Y);

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Float!Y);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants