fas provides floating-point arithmetic for arbitrary mantissa and exponent types as a modern, header-only C++ library. It lets you construct different float types using template parameters for the mantissa, exponent and base.
The constructed float types look and feel like a native float/double for arithmetic operations. Furthermore, all operations are performed on the stack and do not require any heap space.
fas is header-only.
fas::Float<int16_t, int8_t> f;
This will result in a float using a signed 16 bit mantissa and a signed 8 bit exponent.
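To make the roles of the two template parameters concrete, here is a minimal sketch of a value stored as an integer mantissa and an integer exponent. `ToyFloat` and `toy_float_examples` are hypothetical names used only for illustration; this is not the fas implementation, and it assumes the value is interpreted as mantissa * base^exponent.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical toy type for illustration only (not the fas implementation).
template <typename Mantissa, typename Exponent, int Base = 2>
struct ToyFloat {
    Mantissa mantissa;
    Exponent exponent;

    // Interpreted value: mantissa * Base^exponent.
    double to_double() const {
        return static_cast<double>(mantissa) *
               std::pow(static_cast<double>(Base), static_cast<double>(exponent));
    }
};

// Two sample values with a 16 bit mantissa and an 8 bit exponent.
inline void toy_float_examples() {
    const ToyFloat<std::int16_t, std::int8_t> ten{5, 1};    // 5 * 2^1
    const ToyFloat<std::int16_t, std::int8_t> half{1, -1};  // 1 * 2^-1
    assert(ten.to_double() == 10.0);
    assert(half.to_double() == 0.5);
}
```

Calling `toy_float_examples()` from a `main` function runs both checks. Wider mantissa types give more precision, wider exponent types give more range.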
The arithmetic operations are supported through their corresponding operators:
+ - * / += -= *= /= ++ --
fas::Float<int16_t, int8_t> f1 = 10;
auto f2 = f1;
f2 = -f1; // => -10
f2 = f1 + 10; // => 20
f2 = f1 - f1; // => 0
f2 = f1 * f1; // => 100
f2 = f1 / 20; // => 0.5
f2 += 1; // => 1.5
f2 -= 1; // => 0.5
f2 *= 2; // => 1
f2 /= 2; // => 0.5
f1++; // => 11
++f1; // => 12
f1--; // => 11
--f1; // => 10
Each type knows its boundaries:
- MAX() returns the largest value
- MIN() returns the smallest positive value
- LOWEST() returns the smallest (most negative) value
The naming is the same as with the native float/double.
Specializations of std::numeric_limits are available:
fas::Float<int8_t, int8_t>::MAX()
returns the same as
std::numeric_limits<fas::Float<int8_t, int8_t>>::MAX()
which is approximately 2.16079e+40.
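The quoted figure can be reproduced with a short calculation. The sketch below assumes (the source does not confirm this) that the largest value of a base-2 float is the largest mantissa multiplied by 2 raised to the largest exponent; `assumed_max` and `check_assumed_max` are hypothetical helpers, not part of fas.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption (not confirmed by the source): the largest value is the largest
// mantissa scaled by 2 raised to the largest exponent.
template <typename Mantissa, typename Exponent>
double assumed_max() {
    const double mantissa = std::numeric_limits<Mantissa>::max();
    const int exponent = std::numeric_limits<Exponent>::max();
    return std::ldexp(mantissa, exponent);  // mantissa * 2^exponent
}

inline void check_assumed_max() {
    // For an 8 bit mantissa and exponent: 127 * 2^127, about 2.16079e+40.
    const double m = assumed_max<std::int8_t, std::int8_t>();
    assert(std::fabs(m / 2.16079e40 - 1.0) < 1e-5);
}
```

Under this assumption, the value 2.16079e+40 corresponds to an 8 bit mantissa and an 8 bit exponent.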
fas supports type traits:
is_fundamental<fas::Float<int16_t, int16_t>>::value // => true
is_floating_point<fas::Float<int16_t, int16_t>>::value // => true
is_arithmetic<fas::Float<int16_t, int16_t>>::value // => true
is_scalar<fas::Float<int16_t, int16_t>>::value // => true
is_object<fas::Float<int16_t, int16_t>>::value // => true
Values can be declared constexpr:
constexpr auto c = fas::Float<std::int16_t, std::int8_t>(1);
Even the results of operations can be constexpr:
constexpr auto c1 = fas::Float<std::int16_t, std::int8_t>(1);
constexpr auto c2 = fas::Float<std::int16_t, std::int8_t>(2);
constexpr auto c3 = c1 + c2; // => 3
fas offers an output stream overload, which can be used with std::cout:
#include "fas/stream.hpp"
...
fas::Float<int8_t, int8_t, 7> fas_float(1);
fas_float /= 3;
std::cout << fas_float << "\n"; // => 70 * 7 ^ fffffffd ≈ 0.32653061224489793313
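The printed form `70 * 7 ^ fffffffd` can be checked by hand. The sketch below assumes that mantissa and exponent are printed in hexadecimal and that the exponent appears as a 32 bit two's complement pattern; this reading is inferred from the numbers in the example, not from the fas documentation. `decode_output` and `check_decode_output` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumption: both fields are hexadecimal and the exponent is shown as a
// 32 bit two's complement pattern (inferred from the example output).
inline double decode_output(std::uint32_t mantissa, int base,
                            std::uint32_t exponent_bits) {
    // Map the unsigned bit pattern back to a signed exponent.
    const std::int64_t exponent =
        exponent_bits >= 0x80000000u
            ? static_cast<std::int64_t>(exponent_bits) - 0x100000000LL
            : static_cast<std::int64_t>(exponent_bits);
    return static_cast<double>(mantissa) *
           std::pow(static_cast<double>(base), static_cast<double>(exponent));
}

inline void check_decode_output() {
    // 0x70 = 112 and 0xfffffffd = -3, so the value is 112 / 343.
    const double value = decode_output(0x70u, 7, 0xfffffffdu);
    assert(std::fabs(value - 112.0 / 343.0) < 1e-12);
}
```

With this reading, the output is 112 * 7^-3 = 112/343, which matches the approximate decimal value printed next to it.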
Typical floats such as float or double use base 2. fas allows constructing floats with any base:
fas::Float<int8_t, int8_t, 7> fas_float(0);
double cpp_float = 0;
for(auto i=0; i<7; ++i) {
fas_float += fas::Float<int8_t, int8_t, 7>(1)/7;
cpp_float += 1.0/7;
}
std::cout << std::setprecision(20);
std::cout << fas_float << "\n"; // => exactly 1
std::cout << cpp_float << "\n"; // => 0.99999999999999977796
Choosing a different base makes it possible to represent specific fractions exactly.
In this case the base is 7, so any fraction whose denominator is a power of 7 is represented exactly, as long as it fits the mantissa and exponent types.
Compare this to the native double, which always uses base 2 and thus cannot represent 1/7 exactly.
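The reason base 7 is exact here can be shown with plain integers. In the following sketch, `Base7`, `add_same_exponent`, `normalize` and `seven_sevenths` are hypothetical names for illustration, not the fas implementation: 1/7 is stored as mantissa 1 with exponent -1, so adding it seven times only involves integer additions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical base-7 toy value: mantissa * 7^exponent (not the fas type).
struct Base7 {
    std::int64_t mantissa;
    int exponent;
};

// Adding two values that share an exponent is plain integer addition.
inline Base7 add_same_exponent(Base7 a, Base7 b) {
    return Base7{a.mantissa + b.mantissa, a.exponent};
}

// Move factors of 7 from the mantissa into the exponent.
inline Base7 normalize(Base7 v) {
    while (v.mantissa != 0 && v.mantissa % 7 == 0) {
        v.mantissa /= 7;
        v.exponent += 1;
    }
    return v;
}

// Sum 1/7 seven times: 1/7 is stored exactly as 1 * 7^-1.
inline Base7 seven_sevenths() {
    Base7 sum{0, -1};
    for (int i = 0; i < 7; ++i) {
        sum = add_same_exponent(sum, Base7{1, -1});
    }
    return normalize(sum);  // 7 * 7^-1 becomes 1 * 7^0
}

inline void check_seven_sevenths() {
    const Base7 one = seven_sevenths();
    assert(one.mantissa == 1 && one.exponent == 0);
}
```

No rounding happens at any step, which is why the result is exactly 1. A base-2 double has to round 1/7 before the first addition.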
Download float.hpp and include it in your source. Don't forget to add its location to your include path.
To build and run the unit tests, type:
mkdir build; cd build; cmake ..; make && ./tests/tests
- The STL's std::numeric_limits is required to determine the limits of the specified mantissa and exponent types.
- Catch2 is required to build the unit tests.
- (configurable) rounding support
- exp, sqrt, pow and other math.h functions
- string constructor
- string representation
- value constructor for all ints and other types
- a version without stl