From 5aa597d9e0c8b2aba947742ebccf3fa674b64f6d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ingo=20M=C3=BCller?= Date: Mon, 6 Jan 2025 14:09:49 +0000 Subject: [PATCH] xxx-remove --- .../extensions/functions_aggregate_approx.md | 18 + .../functions_aggregate_decimal_output.md | 39 + .../extensions/functions_aggregate_generic.md | 47 + site/docs/extensions/functions_arithmetic.md | 885 ++++++++++++++++++ .../functions_arithmetic_decimal.md | 271 ++++++ site/docs/extensions/functions_boolean.md | 122 +++ site/docs/extensions/functions_comparison.md | 206 ++++ site/docs/extensions/functions_datetime.md | 352 +++++++ site/docs/extensions/functions_geometry.md | 195 ++++ site/docs/extensions/functions_logarithmic.md | 114 +++ site/docs/extensions/functions_rounding.md | 56 ++ site/docs/extensions/functions_set.md | 30 + site/docs/extensions/functions_string.md | 686 ++++++++++++++ 13 files changed, 3021 insertions(+) create mode 100644 site/docs/extensions/functions_aggregate_approx.md create mode 100644 site/docs/extensions/functions_aggregate_decimal_output.md create mode 100644 site/docs/extensions/functions_aggregate_generic.md create mode 100644 site/docs/extensions/functions_arithmetic.md create mode 100644 site/docs/extensions/functions_arithmetic_decimal.md create mode 100644 site/docs/extensions/functions_boolean.md create mode 100644 site/docs/extensions/functions_comparison.md create mode 100644 site/docs/extensions/functions_datetime.md create mode 100644 site/docs/extensions/functions_geometry.md create mode 100644 site/docs/extensions/functions_logarithmic.md create mode 100644 site/docs/extensions/functions_rounding.md create mode 100644 site/docs/extensions/functions_set.md create mode 100644 site/docs/extensions/functions_string.md diff --git a/site/docs/extensions/functions_aggregate_approx.md b/site/docs/extensions/functions_aggregate_approx.md new file mode 100644 index 000000000..adf11c9c7 --- /dev/null +++ 
b/site/docs/extensions/functions_aggregate_approx.md @@ -0,0 +1,18 @@ + + + + +# functions_aggregate_approx.yaml + + +This document file is generated for [functions_aggregate_approx.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_aggregate_approx.yaml) +## Aggregate Functions + +### approx_count_distinct + + +Implementations: +approx_count_distinct(`x`): -> `return_type` +0. approx_count_distinct(`any`): -> `i64` + +*Calculates the approximate number of rows that contain distinct values of the expression argument using HyperLogLog. This function provides an alternative to the COUNT (DISTINCT expression) function, which returns the exact number of rows that contain distinct values of an expression. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result.* \ No newline at end of file diff --git a/site/docs/extensions/functions_aggregate_decimal_output.md b/site/docs/extensions/functions_aggregate_decimal_output.md new file mode 100644 index 000000000..9277fd0b2 --- /dev/null +++ b/site/docs/extensions/functions_aggregate_decimal_output.md @@ -0,0 +1,39 @@ + + + + +# functions_aggregate_decimal_output.yaml + + +This document file is generated for [functions_aggregate_decimal_output.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_aggregate_decimal_output.yaml) +## Aggregate Functions + +### count + + +Implementations: +count(`x`, `option:overflow`): -> `return_type` +0. count(`any`, `option:overflow`): -> `decimal<38,0>` + +*Count a set of values. Result is returned as a decimal instead of i64.* + +
Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### count + + +Implementations: + +*Count a set of records (not field referenced). Result is returned as a decimal instead of i64.* +### approx_count_distinct + + +Implementations: +approx_count_distinct(`x`): -> `return_type` +0. approx_count_distinct(`any`): -> `decimal<38,0>` + +*Calculates the approximate number of rows that contain distinct values of the expression argument using HyperLogLog. This function provides an alternative to the COUNT (DISTINCT expression) function, which returns the exact number of rows that contain distinct values of an expression. APPROX_COUNT_DISTINCT processes large amounts of data significantly faster than COUNT, with negligible deviation from the exact result. Result is returned as a decimal instead of i64.* \ No newline at end of file diff --git a/site/docs/extensions/functions_aggregate_generic.md b/site/docs/extensions/functions_aggregate_generic.md new file mode 100644 index 000000000..b06552d04 --- /dev/null +++ b/site/docs/extensions/functions_aggregate_generic.md @@ -0,0 +1,47 @@ + + + + +# functions_aggregate_generic.yaml + + +This document file is generated for [functions_aggregate_generic.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_aggregate_generic.yaml) +## Aggregate Functions + +### count + + +Implementations: +count(`x`, `option:overflow`): -> `return_type` +0. count(`any`, `option:overflow`): -> `i64` + +*Count a set of values* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### count + + +Implementations: + +*Count a set of records (not field referenced)* +### any_value + + +Implementations: +any_value(`x`, `option:ignore_nulls`): -> `return_type` +0. any_value(`any1`, `option:ignore_nulls`): -> `any1?` + +*Selects an arbitrary value from a group of values. +If the input is empty, the function returns null. +* + +
    Options: + +
  • ignore_nulls ['TRUE', 'FALSE']
  • + +
    diff --git a/site/docs/extensions/functions_arithmetic.md b/site/docs/extensions/functions_arithmetic.md new file mode 100644 index 000000000..911242c72 --- /dev/null +++ b/site/docs/extensions/functions_arithmetic.md @@ -0,0 +1,885 @@ + + + + +# functions_arithmetic.yaml + + +This document file is generated for [functions_arithmetic.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_arithmetic.yaml) +## Scalar Functions + +### add + + +Implementations: +add(`x`, `y`, `option:overflow`): -> `return_type` +0. add(`i8`, `i8`, `option:overflow`): -> `i8` +1. add(`i16`, `i16`, `option:overflow`): -> `i16` +2. add(`i32`, `i32`, `option:overflow`): -> `i32` +3. add(`i64`, `i64`, `option:overflow`): -> `i64` +4. add(`fp32`, `fp32`, `option:rounding`): -> `fp32` +5. add(`fp64`, `fp64`, `option:rounding`): -> `fp64` + +*Add two values.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### subtract + + +Implementations: +subtract(`x`, `y`, `option:overflow`): -> `return_type` +0. subtract(`i8`, `i8`, `option:overflow`): -> `i8` +1. subtract(`i16`, `i16`, `option:overflow`): -> `i16` +2. subtract(`i32`, `i32`, `option:overflow`): -> `i32` +3. subtract(`i64`, `i64`, `option:overflow`): -> `i64` +4. subtract(`fp32`, `fp32`, `option:rounding`): -> `fp32` +5. subtract(`fp64`, `fp64`, `option:rounding`): -> `fp64` + +*Subtract one value from another.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### multiply + + +Implementations: +multiply(`x`, `y`, `option:overflow`): -> `return_type` +0. multiply(`i8`, `i8`, `option:overflow`): -> `i8` +1. multiply(`i16`, `i16`, `option:overflow`): -> `i16` +2. multiply(`i32`, `i32`, `option:overflow`): -> `i32` +3. multiply(`i64`, `i64`, `option:overflow`): -> `i64` +4. multiply(`fp32`, `fp32`, `option:rounding`): -> `fp32` +5. multiply(`fp64`, `fp64`, `option:rounding`): -> `fp64` + +*Multiply two values.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### divide + + +Implementations: +divide(`x`, `y`, `option:overflow`, `option:on_domain_error`, `option:on_division_by_zero`): -> `return_type` +0. divide(`i8`, `i8`, `option:overflow`, `option:on_domain_error`, `option:on_division_by_zero`): -> `i8` +1. divide(`i16`, `i16`, `option:overflow`, `option:on_domain_error`, `option:on_division_by_zero`): -> `i16` +2. divide(`i32`, `i32`, `option:overflow`, `option:on_domain_error`, `option:on_division_by_zero`): -> `i32` +3. divide(`i64`, `i64`, `option:overflow`, `option:on_domain_error`, `option:on_division_by_zero`): -> `i64` +4. divide(`fp32`, `fp32`, `option:rounding`, `option:on_domain_error`, `option:on_division_by_zero`): -> `fp32` +5. divide(`fp64`, `fp64`, `option:rounding`, `option:on_domain_error`, `option:on_division_by_zero`): -> `fp64` + +*Divide x by y. In the case of integer division, partial values are truncated (i.e. rounded towards 0). The `on_division_by_zero` option governs behavior in cases where y is 0. If the option is IEEE then the IEEE754 standard is followed: all values except +/-infinity return NaN and +/-infinity are unchanged. If the option is LIMIT then the result is +/-infinity in all cases. If either x or y are NaN then behavior will be governed by `on_domain_error`. If x and y are both +/-infinity, behavior will be governed by `on_domain_error`. +* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • on_domain_error ['NULL', 'ERROR']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_division_by_zero ['IEEE', 'LIMIT', 'NULL', 'ERROR']
  • + +
    + +### negate + + +Implementations: +negate(`x`, `option:overflow`): -> `return_type` +0. negate(`i8`, `option:overflow`): -> `i8` +1. negate(`i16`, `option:overflow`): -> `i16` +2. negate(`i32`, `option:overflow`): -> `i32` +3. negate(`i64`, `option:overflow`): -> `i64` +4. negate(`fp32`): -> `fp32` +5. negate(`fp64`): -> `fp64` + +*Negation of the value* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### modulus + + +Implementations: +modulus(`x`, `y`, `option:division_type`, `option:overflow`, `option:on_domain_error`): -> `return_type` +0. modulus(`i8`, `i8`, `option:division_type`, `option:overflow`, `option:on_domain_error`): -> `i8` +1. modulus(`i16`, `i16`, `option:division_type`, `option:overflow`, `option:on_domain_error`): -> `i16` +2. modulus(`i32`, `i32`, `option:division_type`, `option:overflow`, `option:on_domain_error`): -> `i32` +3. modulus(`i64`, `i64`, `option:division_type`, `option:overflow`, `option:on_domain_error`): -> `i64` + +*Calculate the remainder (r) when dividing dividend (x) by divisor (y). +In mathematics, many conventions for the modulus (mod) operation exist. The result of a mod operation depends on the software implementation and underlying hardware. Substrait is a format for describing compute operations on structured data and designed for interoperability. Therefore the user is responsible for determining a definition of division as defined by the quotient (q). +The following basic conditions of division are satisfied: (1) q ∈ ℤ (the quotient is an integer) (2) x = y * q + r (division rule) (3) abs(r) < abs(y) where q is the quotient. +The `division_type` option determines the mathematical definition of quotient to use in the above definition of division. +When `division_type`=TRUNCATE, q = trunc(x/y). When `division_type`=FLOOR, q = floor(x/y). +In the cases of TRUNCATE and FLOOR division: remainder r = x - y * round_func(x/y) +The `on_domain_error` option governs behavior in cases where y is 0, y is +/-inf, or x is +/-inf. In these cases the mod is undefined. The `overflow` option governs behavior when integer overflow occurs. If x and y are both 0 or both +/-infinity, behavior will be governed by `on_domain_error`. +* + +
    Options: + +
  • division_type ['TRUNCATE', 'FLOOR']
  • +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • on_domain_error ['NULL', 'ERROR']
  • + +
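The two `division_type` conventions only differ when x and y have opposite signs. A minimal Python sketch of the definition above, assuming a hypothetical helper named `modulus` (not part of Substrait or any library) and treating a zero divisor as an error, which corresponds to only one of the `on_domain_error` choices:

```python
def modulus(x: int, y: int, division_type: str) -> int:
    """Remainder r such that x = y * q + r, with q chosen by division_type.

    Hypothetical illustration only; overflow handling is not modelled because
    Python integers are unbounded.
    """
    if y == 0:
        # Assumption: behave like on_domain_error = ERROR.
        raise ZeroDivisionError("modulus is undefined for y == 0")
    if division_type == "TRUNCATE":
        # q = trunc(x / y): divide the magnitudes, then restore the sign.
        q = abs(x) // abs(y)
        if (x < 0) != (y < 0):
            q = -q
    elif division_type == "FLOOR":
        # q = floor(x / y): Python's // already floors.
        q = x // y
    else:
        raise ValueError(f"unknown division_type: {division_type}")
    return x - y * q


print(modulus(-7, 2, "TRUNCATE"))  # remainder takes the sign of the dividend
print(modulus(-7, 2, "FLOOR"))     # remainder takes the sign of the divisor
```

With TRUNCATE the remainder follows the sign of x, and with FLOOR it follows the sign of y; both satisfy abs(r) < abs(y).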
    + +### power + + +Implementations: +power(`x`, `y`, `option:overflow`): -> `return_type` +0. power(`i64`, `i64`, `option:overflow`): -> `i64` +1. power(`fp32`, `fp32`): -> `fp32` +2. power(`fp64`, `fp64`): -> `fp64` + +*Take the power with x as the base and y as exponent.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### sqrt + + +Implementations: +sqrt(`x`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. sqrt(`i64`, `option:rounding`, `option:on_domain_error`): -> `fp64` +1. sqrt(`fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +2. sqrt(`fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Square root of the value* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### exp + + +Implementations: +exp(`x`, `option:rounding`): -> `return_type` +0. exp(`i64`, `option:rounding`): -> `fp64` +1. exp(`fp32`, `option:rounding`): -> `fp32` +2. exp(`fp64`, `option:rounding`): -> `fp64` + +*The mathematical constant e, raised to the power of the value.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### cos + + +Implementations: +cos(`x`, `option:rounding`): -> `return_type` +0. cos(`fp32`, `option:rounding`): -> `fp32` +1. cos(`fp64`, `option:rounding`): -> `fp64` + +*Get the cosine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### sin + + +Implementations: +sin(`x`, `option:rounding`): -> `return_type` +0. sin(`fp32`, `option:rounding`): -> `fp32` +1. sin(`fp64`, `option:rounding`): -> `fp64` + +*Get the sine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### tan + + +Implementations: +tan(`x`, `option:rounding`): -> `return_type` +0. tan(`fp32`, `option:rounding`): -> `fp32` +1. tan(`fp64`, `option:rounding`): -> `fp64` + +*Get the tangent of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### cosh + + +Implementations: +cosh(`x`, `option:rounding`): -> `return_type` +0. cosh(`fp32`, `option:rounding`): -> `fp32` +1. cosh(`fp64`, `option:rounding`): -> `fp64` + +*Get the hyperbolic cosine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### sinh + + +Implementations: +sinh(`x`, `option:rounding`): -> `return_type` +0. sinh(`fp32`, `option:rounding`): -> `fp32` +1. sinh(`fp64`, `option:rounding`): -> `fp64` + +*Get the hyperbolic sine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### tanh + + +Implementations: +tanh(`x`, `option:rounding`): -> `return_type` +0. tanh(`fp32`, `option:rounding`): -> `fp32` +1. tanh(`fp64`, `option:rounding`): -> `fp64` + +*Get the hyperbolic tangent of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### acos + + +Implementations: +acos(`x`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. acos(`fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +1. acos(`fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Get the arccosine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### asin + + +Implementations: +asin(`x`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. asin(`fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +1. asin(`fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Get the arcsine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### atan + + +Implementations: +atan(`x`, `option:rounding`): -> `return_type` +0. atan(`fp32`, `option:rounding`): -> `fp32` +1. atan(`fp64`, `option:rounding`): -> `fp64` + +*Get the arctangent of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### acosh + + +Implementations: +acosh(`x`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. acosh(`fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +1. acosh(`fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Get the hyperbolic arccosine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### asinh + + +Implementations: +asinh(`x`, `option:rounding`): -> `return_type` +0. asinh(`fp32`, `option:rounding`): -> `fp32` +1. asinh(`fp64`, `option:rounding`): -> `fp64` + +*Get the hyperbolic arcsine of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### atanh + + +Implementations: +atanh(`x`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. atanh(`fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +1. atanh(`fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Get the hyperbolic arctangent of a value in radians.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### atan2 + + +Implementations: +atan2(`x`, `y`, `option:rounding`, `option:on_domain_error`): -> `return_type` +0. atan2(`fp32`, `fp32`, `option:rounding`, `option:on_domain_error`): -> `fp32` +1. atan2(`fp64`, `fp64`, `option:rounding`, `option:on_domain_error`): -> `fp64` + +*Get the arctangent of values given as x/y pairs.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### radians + + +Implementations: +radians(`x`, `option:rounding`): -> `return_type` +0. radians(`fp32`, `option:rounding`): -> `fp32` +1. radians(`fp64`, `option:rounding`): -> `fp64` + +*Converts angle `x` in degrees to radians. +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### degrees + + +Implementations: +degrees(`x`, `option:rounding`): -> `return_type` +0. degrees(`fp32`, `option:rounding`): -> `fp32` +1. degrees(`fp64`, `option:rounding`): -> `fp64` + +*Converts angle `x` in radians to degrees. +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### abs + + +Implementations: +abs(`x`, `option:overflow`): -> `return_type` +0. abs(`i8`, `option:overflow`): -> `i8` +1. abs(`i16`, `option:overflow`): -> `i16` +2. abs(`i32`, `option:overflow`): -> `i32` +3. abs(`i64`, `option:overflow`): -> `i64` +4. abs(`fp32`): -> `fp32` +5. abs(`fp64`): -> `fp64` + +*Calculate the absolute value of the argument. +Integer values allow the specification of overflow behavior to handle the unevenness of the twos complement, e.g. Int8 range [-128 : 127]. +* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
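The overflow case for `abs` arises only at the most negative value, because the two's complement range is asymmetric. A rough Python sketch for the `i8` case, assuming a hypothetical helper `abs_i8`; the text above lists the option names but does not define them, so the SATURATE (clamp), ERROR (raise), and especially SILENT (two's complement wraparound) behaviors shown here are assumptions:

```python
I8_MIN, I8_MAX = -128, 127


def abs_i8(x: int, overflow: str) -> int:
    """Absolute value restricted to the i8 range (hypothetical illustration)."""
    if not I8_MIN <= x <= I8_MAX:
        raise ValueError("input is not a valid i8 value")
    result = -x if x < 0 else x
    if result <= I8_MAX:
        return result
    # Only reachable for x == -128, whose magnitude does not fit in i8.
    if overflow == "SATURATE":
        return I8_MAX  # assumption: clamp to the largest representable value
    if overflow == "ERROR":
        raise OverflowError("abs(-128) does not fit in i8")
    if overflow == "SILENT":
        return I8_MIN  # assumption: wraparound, one possible silent result
    raise ValueError(f"unknown overflow option: {overflow}")


print(abs_i8(-5, "ERROR"))
print(abs_i8(-128, "SATURATE"))
```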
    + +### sign + + +Implementations: +sign(`x`): -> `return_type` +0. sign(`i8`): -> `i8` +1. sign(`i16`): -> `i16` +2. sign(`i32`): -> `i32` +3. sign(`i64`): -> `i64` +4. sign(`fp32`): -> `fp32` +5. sign(`fp64`): -> `fp64` + +*Return the signedness of the argument. +Integer values return signedness with the same type as the input. Possible return values are [-1, 0, 1] +Floating point values return signedness with the same type as the input. Possible return values are [-1.0, -0.0, 0.0, 1.0, NaN] +* +### factorial + + +Implementations: +factorial(`n`, `option:overflow`): -> `return_type` +0. factorial(`i32`, `option:overflow`): -> `i32` +1. factorial(`i64`, `option:overflow`): -> `i64` + +*Return the factorial of a given integer input. +The factorial of 0! is 1 by convention. +Negative inputs will raise an error. +* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### bitwise_not + + +Implementations: +bitwise_not(`x`): -> `return_type` +0. bitwise_not(`i8`): -> `i8` +1. bitwise_not(`i16`): -> `i16` +2. bitwise_not(`i32`): -> `i32` +3. bitwise_not(`i64`): -> `i64` + +*Return the bitwise NOT result for one integer input. +* +### bitwise_and + + +Implementations: +bitwise_and(`x`, `y`): -> `return_type` +0. bitwise_and(`i8`, `i8`): -> `i8` +1. bitwise_and(`i16`, `i16`): -> `i16` +2. bitwise_and(`i32`, `i32`): -> `i32` +3. bitwise_and(`i64`, `i64`): -> `i64` + +*Return the bitwise AND result for two integer inputs. +* +### bitwise_or + + +Implementations: +bitwise_or(`x`, `y`): -> `return_type` +0. bitwise_or(`i8`, `i8`): -> `i8` +1. bitwise_or(`i16`, `i16`): -> `i16` +2. bitwise_or(`i32`, `i32`): -> `i32` +3. bitwise_or(`i64`, `i64`): -> `i64` + +*Return the bitwise OR result for two given integer inputs. +* +### bitwise_xor + + +Implementations: +bitwise_xor(`x`, `y`): -> `return_type` +0. bitwise_xor(`i8`, `i8`): -> `i8` +1. bitwise_xor(`i16`, `i16`): -> `i16` +2. bitwise_xor(`i32`, `i32`): -> `i32` +3. bitwise_xor(`i64`, `i64`): -> `i64` + +*Return the bitwise XOR result for two integer inputs. +* +## Aggregate Functions + +### sum + + +Implementations: +sum(`x`, `option:overflow`): -> `return_type` +0. sum(`i8`, `option:overflow`): -> `i64?` +1. sum(`i16`, `option:overflow`): -> `i64?` +2. sum(`i32`, `option:overflow`): -> `i64?` +3. sum(`i64`, `option:overflow`): -> `i64?` +4. sum(`fp32`, `option:overflow`): -> `fp64?` +5. sum(`fp64`, `option:overflow`): -> `fp64?` + +*Sum a set of values. The sum of zero elements yields null.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### sum0 + + +Implementations: +sum0(`x`, `option:overflow`): -> `return_type` +0. sum0(`i8`, `option:overflow`): -> `i64` +1. sum0(`i16`, `option:overflow`): -> `i64` +2. sum0(`i32`, `option:overflow`): -> `i64` +3. sum0(`i64`, `option:overflow`): -> `i64` +4. sum0(`fp32`, `option:overflow`): -> `fp64` +5. sum0(`fp64`, `option:overflow`): -> `fp64` + +*Sum a set of values. The sum of zero elements yields zero. +Null values are ignored. +* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
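The difference between `sum` and `sum0` shows up only for empty input. A small Python sketch, assuming hypothetical helpers `sum_nullable` and `sum0` that use `None` for null; skipping nulls in `sum_nullable` is an assumption borrowed from common SQL behavior, since the text above only states it for `sum0`:

```python
from typing import Iterable, Optional


def sum_nullable(values: Iterable[Optional[int]]) -> Optional[int]:
    """Like `sum`: zero (non-null) elements yield null, modelled as None."""
    present = [v for v in values if v is not None]  # assumption: nulls skipped
    return sum(present) if present else None


def sum0(values: Iterable[Optional[int]]) -> int:
    """Like `sum0`: nulls are ignored and zero elements yield zero."""
    return sum(v for v in values if v is not None)


print(sum_nullable([]), sum0([]))
print(sum_nullable([1, None, 2]), sum0([1, None, 2]))
```

This is why the `sum` implementations return nullable types (`i64?`, `fp64?`) while the `sum0` implementations do not.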
    + +### avg + + +Implementations: +avg(`x`, `option:overflow`): -> `return_type` +0. avg(`i8`, `option:overflow`): -> `i8?` +1. avg(`i16`, `option:overflow`): -> `i16?` +2. avg(`i32`, `option:overflow`): -> `i32?` +3. avg(`i64`, `option:overflow`): -> `i64?` +4. avg(`fp32`, `option:overflow`): -> `fp32?` +5. avg(`fp64`, `option:overflow`): -> `fp64?` + +*Average a set of values. For integral types, this truncates partial values.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### min + + +Implementations: +min(`x`): -> `return_type` +0. min(`i8`): -> `i8?` +1. min(`i16`): -> `i16?` +2. min(`i32`): -> `i32?` +3. min(`i64`): -> `i64?` +4. min(`fp32`): -> `fp32?` +5. min(`fp64`): -> `fp64?` + +*Min a set of values.* +### max + + +Implementations: +max(`x`): -> `return_type` +0. max(`i8`): -> `i8?` +1. max(`i16`): -> `i16?` +2. max(`i32`): -> `i32?` +3. max(`i64`): -> `i64?` +4. max(`fp32`): -> `fp32?` +5. max(`fp64`): -> `fp64?` + +*Max a set of values.* +### product + + +Implementations: +product(`x`, `option:overflow`): -> `return_type` +0. product(`i8`, `option:overflow`): -> `i8` +1. product(`i16`, `option:overflow`): -> `i16` +2. product(`i32`, `option:overflow`): -> `i32` +3. product(`i64`, `option:overflow`): -> `i64` +4. product(`fp32`, `option:rounding`): -> `fp32` +5. product(`fp64`, `option:rounding`): -> `fp64` + +*Product of a set of values. Returns 1 for empty input.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### std_dev + + +Implementations: +std_dev(`x`, `option:rounding`, `option:distribution`): -> `return_type` +0. std_dev(`fp32`, `option:rounding`, `option:distribution`): -> `fp32?` +1. std_dev(`fp64`, `option:rounding`, `option:distribution`): -> `fp64?` + +*Calculates standard-deviation for a set of values.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • distribution ['SAMPLE', 'POPULATION']
  • + +
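The `distribution` option selects the divisor used under the square root. A short Python sketch using the standard `statistics` module, assuming that SAMPLE means the n - 1 (Bessel-corrected) estimator and POPULATION means the n estimator, and that too-small inputs yield null (`None`); the helper name `std_dev` and these edge-case choices are illustrative, not taken from the specification:

```python
import statistics
from typing import Optional, Sequence


def std_dev(values: Sequence[float], distribution: str) -> Optional[float]:
    """Standard deviation under the chosen distribution (hypothetical sketch)."""
    if distribution == "POPULATION":
        # Divides the squared deviations by n.
        return statistics.pstdev(values) if len(values) >= 1 else None
    if distribution == "SAMPLE":
        # Divides the squared deviations by n - 1, so it needs two values.
        return statistics.stdev(values) if len(values) >= 2 else None
    raise ValueError(f"unknown distribution: {distribution}")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std_dev(data, "POPULATION"))
print(std_dev(data, "SAMPLE"))
```

For the same data the SAMPLE result is always at least as large as the POPULATION result.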
    + +### variance + + +Implementations: +variance(`x`, `option:rounding`, `option:distribution`): -> `return_type` +0. variance(`fp32`, `option:rounding`, `option:distribution`): -> `fp32?` +1. variance(`fp64`, `option:rounding`, `option:distribution`): -> `fp64?` + +*Calculates variance for a set of values.* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • distribution ['SAMPLE', 'POPULATION']
  • + +
    + +### corr + + +Implementations: +corr(`x`, `y`, `option:rounding`): -> `return_type` +0. corr(`fp32`, `fp32`, `option:rounding`): -> `fp32?` +1. corr(`fp64`, `fp64`, `option:rounding`): -> `fp64?` + +*Calculates the value of Pearson's correlation coefficient between `x` and `y`. If there is no input, null is returned. +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### mode + + +Implementations: +mode(`x`): -> `return_type` +0. mode(`i8`): -> `i8?` +1. mode(`i16`): -> `i16?` +2. mode(`i32`): -> `i32?` +3. mode(`i64`): -> `i64?` +4. mode(`fp32`): -> `fp32?` +5. mode(`fp64`): -> `fp64?` + +*Calculates mode for a set of values. If there is no input, null is returned. +* +### median + + +Implementations: +median(`precision`, `x`, `option:rounding`): -> `return_type` +0. median(`precision`, `i8`, `option:rounding`): -> `i8?` +1. median(`precision`, `i16`, `option:rounding`): -> `i16?` +2. median(`precision`, `i32`, `option:rounding`): -> `i32?` +3. median(`precision`, `i64`, `option:rounding`): -> `i64?` +4. median(`precision`, `fp32`, `option:rounding`): -> `fp32?` +5. median(`precision`, `fp64`, `option:rounding`): -> `fp64?` + +*Calculate the median for a set of values. +Returns null if applied to zero records. For the integer implementations, the rounding option determines how the median should be rounded if it ends up midway between two values. For the floating point implementations, they specify the usual floating point rounding mode. +* + +
    Options: + +
  • precision ['EXACT', 'APPROXIMATE']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +### quantile + + +Implementations: +quantile(`boundaries`, `precision`, `n`, `distribution`, `option:rounding`): -> `return_type` +
  • n: A positive integer which defines the number of quantile partitions. +
  • +
  • distribution: The data for which the quantiles should be computed. +
  • +0. quantile(`boundaries`, `precision`, `i64`, `any`, `option:rounding`): -> `LIST?` + +*Calculates quantiles for a set of values. +This function will divide the aggregated values (passed via the distribution argument) over N equally-sized bins, where N is passed via a constant argument. It will then return the values at the boundaries of these bins in list form. If the input is appropriately sorted, this computes the quantiles of the distribution. +The function can optionally return the first and/or last element of the input, as specified by the `boundaries` argument. If the input is appropriately sorted, this will thus be the minimum and/or maximum values of the distribution. +When the boundaries do not lie exactly on elements of the incoming distribution, the function will interpolate between the two nearby elements. If the interpolated value cannot be represented exactly, the `rounding` option controls how the value should be selected or computed. +The function fails and returns null in the following cases: + - `n` is null or less than one; + - any value in `distribution` is null. + +The function returns an empty list if `n` equals 1 and `boundaries` is set to `NEITHER`. +* + +
    Options: + +
  • boundaries ['NEITHER', 'MINIMUM', 'MAXIMUM', 'BOTH']
  • +
  • precision ['EXACT', 'APPROXIMATE']
  • +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • + +
    + +## Window Functions + +### row_number + + +Implementations: +0. row_number(): -> `i64?` + +*The number of the current row within its partition, starting at 1.* +### rank + + +Implementations: +0. rank(): -> `i64?` + +*The rank of the current row, with gaps.* +### dense_rank + + +Implementations: +0. dense_rank(): -> `i64?` + +*The rank of the current row, without gaps.* +### percent_rank + + +Implementations: +0. percent_rank(): -> `fp64?` + +*The relative rank of the current row.* +### cume_dist + + +Implementations: +0. cume_dist(): -> `fp64?` + +*The cumulative distribution.* +### ntile + + +Implementations: +ntile(`x`): -> `return_type` +0. ntile(`i32`): -> `i32?` +1. ntile(`i64`): -> `i64?` + +*Return an integer ranging from 1 to the argument value, dividing the partition as equally as possible.* +### first_value + + +Implementations: +first_value(`expression`): -> `return_type` +0. first_value(`any1`): -> `any1` + +*Returns the first value in the window. +* +### last_value + + +Implementations: +last_value(`expression`): -> `return_type` +0. last_value(`any1`): -> `any1` + +*Returns the last value in the window. +* +### nth_value + + +Implementations: +nth_value(`expression`, `window_offset`, `option:on_domain_error`): -> `return_type` +0. nth_value(`any1`, `i32`, `option:on_domain_error`): -> `any1?` + +*Returns a value from the nth row based on the `window_offset`. `window_offset` should be a positive integer. If the value of the `window_offset` is outside the range of the window, `null` is returned. +The `on_domain_error` option governs behavior in cases where `window_offset` is not a positive integer or `null`. +* + +
    Options: + +
  • on_domain_error ['NAN', 'ERROR']
  • + +
    + +### lead + + +Implementations: +lead(`expression`): -> `return_type` +0. lead(`any1`): -> `any1?` +1. lead(`any1`, `i32`): -> `any1?` +2. lead(`any1`, `i32`, `any1`): -> `any1?` + +*Return a value from a following row based on a specified physical offset. This allows you to compare a value in the current row against a following row. +The `expression` is evaluated against a row that comes after the current row based on the `row_offset`. The `row_offset` should be a positive integer and is set to 1 if not specified explicitly. If the `row_offset` is negative, the expression will be evaluated against a row coming before the current row, similar to the `lag` function. A `row_offset` of `null` will return `null`. The function returns the `default` input value if `row_offset` goes beyond the scope of the window. If a `default` value is not specified, it is set to `null`. +Example comparing the sales of the current year to the following year. `row_offset` of 1. | year | sales | next_year_sales | | 2019 | 20.50 | 30.00 | | 2020 | 30.00 | 45.99 | | 2021 | 45.99 | null | +* +### lag + + +Implementations: +lag(`expression`): -> `return_type` +0. lag(`any1`): -> `any1?` +1. lag(`any1`, `i32`): -> `any1?` +2. lag(`any1`, `i32`, `any1`): -> `any1?` + +*Return a column value from a previous row based on a specified physical offset. This allows you to compare a value in the current row against a previous row. +The `expression` is evaluated against a row that comes before the current row based on the `row_offset`. The `expression` can be a column, expression or subquery that evaluates to a single value. The `row_offset` should be a positive integer and is set to 1 if not specified explicitly. If the `row_offset` is negative, the expression will be evaluated against a row coming after the current row, similar to the `lead` function. A `row_offset` of `null` will return `null`. 
The function returns the `default` input value if `row_offset` goes beyond the scope of the partition. If a `default` value is not specified, it is set to `null`.
+Example comparing the sales of the current year to the previous year. `row_offset` of 1. | year | sales | previous_year_sales | | 2019 | 20.50 | null | | 2020 | 30.00 | 20.50 | | 2021 | 45.99 | 30.00 |
+*
\ No newline at end of file
diff --git a/site/docs/extensions/functions_arithmetic_decimal.md b/site/docs/extensions/functions_arithmetic_decimal.md
new file mode 100644
index 000000000..c5c2e4953
--- /dev/null
+++ b/site/docs/extensions/functions_arithmetic_decimal.md
@@ -0,0 +1,271 @@
+
+
+
+
+# functions_arithmetic_decimal.yaml
+
+
+This document file is generated for [functions_arithmetic_decimal.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_arithmetic_decimal.yaml)
+## Scalar Functions
+
+### add
+
+
+Implementations:
+add(`x`, `y`, `option:overflow`): -> `return_type`
+0. add(`decimal<P1,S1>`, `decimal<P2,S2>`, `option:overflow`): ->
+ ```
+ init_scale = max(S1,S2)
+ init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
+ min_scale = min(init_scale, 6)
+ delta = init_prec - 38
+ prec = min(init_prec, 38)
+ scale_after_borrow = max(init_scale - delta, min_scale)
+ scale = init_prec > 38 ? scale_after_borrow : init_scale
+ DECIMAL<prec,scale>
+ ```
+
+*Add two decimal values.*
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
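The result-type rules listed for `add` can be traced step by step with a small Python sketch. The function name `decimal_add_result_type` and the two constants are hypothetical names chosen for illustration; only the arithmetic is taken from the rules above, where `P1`/`S1` and `P2`/`S2` are the precision and scale of the two inputs.

```python
from typing import Tuple

MAX_PRECISION = 38  # cap on precision used in the rules above
MIN_SCALE_CAP = 6   # the scale is never reduced below min(init_scale, 6)


def decimal_add_result_type(p1: int, s1: int, p2: int, s2: int) -> Tuple[int, int]:
    """Return (precision, scale) of add() for inputs with (p1, s1) and (p2, s2).

    Hypothetical helper: a line-by-line transcription of the documented rules.
    """
    init_scale = max(s1, s2)
    init_prec = init_scale + max(p1 - s1, p2 - s2) + 1
    min_scale = min(init_scale, MIN_SCALE_CAP)
    delta = init_prec - MAX_PRECISION
    prec = min(init_prec, MAX_PRECISION)
    # When the precision overflows 38, digits are "borrowed" from the scale,
    # but never below min_scale.
    scale_after_borrow = max(init_scale - delta, min_scale)
    scale = scale_after_borrow if init_prec > MAX_PRECISION else init_scale
    return prec, scale


print(decimal_add_result_type(10, 2, 10, 2))    # no overflow: one extra integer digit
print(decimal_add_result_type(38, 10, 38, 10))  # overflow: one digit borrowed from the scale
```

The same skeleton should carry over to `subtract`, `multiply`, `divide`, and `modulus` by swapping in their `init_scale` and `init_prec` lines; the remaining steps are identical in the rules as documented.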
+
+### subtract
+
+
+Implementations:
+subtract(`x`, `y`, `option:overflow`): -> `return_type`
+0. subtract(`decimal<P1,S1>`, `decimal<P2,S2>`, `option:overflow`): ->
+ ```
+ init_scale = max(S1,S2)
+ init_prec = init_scale + max(P1 - S1, P2 - S2) + 1
+ min_scale = min(init_scale, 6)
+ delta = init_prec - 38
+ prec = min(init_prec, 38)
+ scale_after_borrow = max(init_scale - delta, min_scale)
+ scale = init_prec > 38 ? scale_after_borrow : init_scale
+ DECIMAL<prec,scale>
+ ```
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
+
+### multiply
+
+
+Implementations:
+multiply(`x`, `y`, `option:overflow`): -> `return_type`
+0. multiply(`decimal<P1,S1>`, `decimal<P2,S2>`, `option:overflow`): ->
+ ```
+ init_scale = S1 + S2
+ init_prec = P1 + P2 + 1
+ min_scale = min(init_scale, 6)
+ delta = init_prec - 38
+ prec = min(init_prec, 38)
+ scale_after_borrow = max(init_scale - delta, min_scale)
+ scale = init_prec > 38 ? scale_after_borrow : init_scale
+ DECIMAL<prec,scale>
+ ```
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
+
+### divide
+
+
+Implementations:
+divide(`x`, `y`, `option:overflow`): -> `return_type`
+0. divide(`decimal<P1,S1>`, `decimal<P2,S2>`, `option:overflow`): ->
+ ```
+ init_scale = max(6, S1 + P2 + 1)
+ init_prec = P1 - S1 + P2 + init_scale
+ min_scale = min(init_scale, 6)
+ delta = init_prec - 38
+ prec = min(init_prec, 38)
+ scale_after_borrow = max(init_scale - delta, min_scale)
+ scale = init_prec > 38 ? scale_after_borrow : init_scale
+ DECIMAL<prec,scale>
+ ```
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
+
+### modulus
+
+
+Implementations:
+modulus(`x`, `y`, `option:overflow`): -> `return_type`
+0. modulus(`decimal<P1,S1>`, `decimal<P2,S2>`, `option:overflow`): ->
+ ```
+ init_scale = max(S1,S2)
+ init_prec = min(P1 - S1, P2 - S2) + init_scale
+ min_scale = min(init_scale, 6)
+ delta = init_prec - 38
+ prec = min(init_prec, 38)
+ scale_after_borrow = max(init_scale - delta, min_scale)
+ scale = init_prec > 38 ? scale_after_borrow : init_scale
+ DECIMAL<prec,scale>
+ ```
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
+
+### abs
+
+
+Implementations:
+abs(`x`): -> `return_type`
+0. abs(`decimal`): -> `decimal`
+
+*Calculate the absolute value of the argument.*
+### bitwise_and
+
+
+Implementations:
+bitwise_and(`x`, `y`): -> `return_type`
+0. bitwise_and(`DECIMAL<P1,0>`, `DECIMAL<P2,0>`): ->
+ ```
+ max_precision = max(P1, P2)
+ DECIMAL<max_precision,0>
+ ```
+
+*Return the bitwise AND result for two decimal inputs. The scale of both inputs must be 0 (i.e. only integer values are allowed).
+*
+### bitwise_or
+
+
+Implementations:
+bitwise_or(`x`, `y`): -> `return_type`
+0. bitwise_or(`DECIMAL<P1,0>`, `DECIMAL<P2,0>`): ->
+ ```
+ max_precision = max(P1, P2)
+ DECIMAL<max_precision,0>
+ ```
+
+*Return the bitwise OR result for two given decimal inputs. The scale of both inputs must be 0 (i.e. only integer values are allowed).
+*
+### bitwise_xor
+
+
+Implementations:
+bitwise_xor(`x`, `y`): -> `return_type`
+0. bitwise_xor(`DECIMAL<P1,0>`, `DECIMAL<P2,0>`): ->
+ ```
+ max_precision = max(P1, P2)
+ DECIMAL<max_precision,0>
+ ```
+
+*Return the bitwise XOR result for two given decimal inputs. The scale of both inputs must be 0 (i.e. only integer values are allowed).
+*
+### sqrt
+
+
+Implementations:
+sqrt(`x`): -> `return_type`
+0. sqrt(`DECIMAL`): -> `fp64`
+
+*Square root of the value. Sqrt of 0 is 0 and sqrt of negative values will raise an error.*
+### factorial
+
+
+Implementations:
+factorial(`n`): -> `return_type`
+0. factorial(`DECIMAL`): -> `DECIMAL<38,0>`
+
+*Return the factorial of a given decimal input. The scale of the input should be 0. By convention, 0! is 1. Negative inputs will raise an error. Inputs that cause the result to overflow will raise an error.
+*
+### power
+
+
+Implementations:
+power(`x`, `y`, `option:overflow`, `option:complex_number_result`): -> `return_type`
+0. power(`DECIMAL`, `DECIMAL`, `option:overflow`, `option:complex_number_result`): -> `fp64`
+
+*Take the power with x as the base and y as the exponent. Behavior when the result is a complex number is governed by the `complex_number_result` option.*
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • +
  • complex_number_result ['NAN', 'ERROR']
  • + +
    + +## Aggregate Functions + +### sum + + +Implementations: +sum(`x`, `option:overflow`): -> `return_type` +0. sum(`DECIMAL`, `option:overflow`): -> `DECIMAL?<38,S>` + +*Sum a set of values.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    + +### avg + + +Implementations: +avg(`x`, `option:overflow`): -> `return_type` +0. avg(`DECIMAL`, `option:overflow`): -> `DECIMAL<38,S>` + +*Average a set of values.* + +
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
+
+### min
+
+
+Implementations:
+min(`x`): -> `return_type`
+0. min(`DECIMAL`): -> `DECIMAL?`
+
+*Return the minimum of a set of values.*
+### max
+
+
+Implementations:
+max(`x`): -> `return_type`
+0. max(`DECIMAL`): -> `DECIMAL?`
+
+*Return the maximum of a set of values.*
+### sum0
+
+
+Implementations:
+sum0(`x`, `option:overflow`): -> `return_type`
+0. sum0(`DECIMAL`, `option:overflow`): -> `DECIMAL<38,S>`
+
+*Sum a set of values. The sum of zero elements yields zero.
+Null values are ignored.
+*
+
+
    Options: + +
  • overflow ['SILENT', 'SATURATE', 'ERROR']
  • + +
    diff --git a/site/docs/extensions/functions_boolean.md b/site/docs/extensions/functions_boolean.md new file mode 100644 index 000000000..3d84dd359 --- /dev/null +++ b/site/docs/extensions/functions_boolean.md @@ -0,0 +1,122 @@ + + + + +# functions_boolean.yaml + + +This document file is generated for [functions_boolean.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_boolean.yaml) +## Scalar Functions + +### or + + +Implementations: +or(`a`): -> `return_type` +0. or(`boolean?`): -> `boolean?` + +*The boolean `or` using Kleene logic. +This function behaves as follows with nulls: + + true or null = true + + null or true = true + + false or null = null + + null or false = null + + null or null = null + +In other words, in this context a null value really means "unknown", and an unknown value `or` true is always true. +Behavior for 0 or 1 inputs is as follows: + or() -> false + or(x) -> x +* +### and + + +Implementations: +and(`a`): -> `return_type` +0. and(`boolean?`): -> `boolean?` + +*The boolean `and` using Kleene logic. +This function behaves as follows with nulls: + + true and null = null + + null and true = null + + false and null = false + + null and false = false + + null and null = null + +In other words, in this context a null value really means "unknown", and an unknown value `and` false is always false. +Behavior for 0 or 1 inputs is as follows: + and() -> true + and(x) -> x +* +### and_not + + +Implementations: +and_not(`a`, `b`): -> `return_type` +0. and_not(`boolean?`, `boolean?`): -> `boolean?` + +*The boolean `and` of one value and the negation of the other using Kleene logic. 
+This function behaves as follows with nulls: + + true and not null = null + + null and not false = null + + false and not null = false + + null and not true = false + + null and not null = null + +In other words, in this context a null value really means "unknown", and an unknown value `and not` true is always false, as is false `and not` an unknown value. +* +### xor + + +Implementations: +xor(`a`, `b`): -> `return_type` +0. xor(`boolean?`, `boolean?`): -> `boolean?` + +*The boolean `xor` of two values using Kleene logic. +When a null is encountered in either input, a null is output. +* +### not + + +Implementations: +not(`a`): -> `return_type` +0. not(`boolean?`): -> `boolean?` + +*The `not` of a boolean value. +When a null is input, a null is output. +* +## Aggregate Functions + +### bool_and + + +Implementations: +bool_and(`a`): -> `return_type` +0. bool_and(`boolean`): -> `boolean?` + +*If any value in the input is false, false is returned. If the input is empty or only contains nulls, null is returned. Otherwise, true is returned. +* +### bool_or + + +Implementations: +bool_or(`a`): -> `return_type` +0. bool_or(`boolean`): -> `boolean?` + +*If any value in the input is true, true is returned. If the input is empty or only contains nulls, null is returned. Otherwise, false is returned. +* \ No newline at end of file diff --git a/site/docs/extensions/functions_comparison.md b/site/docs/extensions/functions_comparison.md new file mode 100644 index 000000000..343033d7e --- /dev/null +++ b/site/docs/extensions/functions_comparison.md @@ -0,0 +1,206 @@ + + + + +# functions_comparison.yaml + + +This document file is generated for [functions_comparison.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_comparison.yaml) +## Scalar Functions + +### not_equal + + +Implementations: +not_equal(`x`, `y`): -> `return_type` +0. not_equal(`any1`, `any1`): -> `boolean` + +*Whether two values are not_equal. 
+`not_equal(x, y) := (x != y)` +If either/both of `x` and `y` are `null`, `null` is returned. +* +### equal + + +Implementations: +equal(`x`, `y`): -> `return_type` +0. equal(`any1`, `any1`): -> `boolean` + +*Whether two values are equal. +`equal(x, y) := (x == y)` +If either/both of `x` and `y` are `null`, `null` is returned. +* +### is_not_distinct_from + + +Implementations: +is_not_distinct_from(`x`, `y`): -> `return_type` +0. is_not_distinct_from(`any1`, `any1`): -> `boolean` + +*Whether two values are equal. +This function treats `null` values as comparable, so +`is_not_distinct_from(null, null) == True` +This is in contrast to `equal`, in which `null` values do not compare. +* +### is_distinct_from + + +Implementations: +is_distinct_from(`x`, `y`): -> `return_type` +0. is_distinct_from(`any1`, `any1`): -> `boolean` + +*Whether two values are not equal. +This function treats `null` values as comparable, so +`is_distinct_from(null, null) == False` +This is in contrast to `equal`, in which `null` values do not compare. +* +### lt + + +Implementations: +lt(`x`, `y`): -> `return_type` +0. lt(`any1`, `any1`): -> `boolean` + +*Less than. +lt(x, y) := (x < y) +If either/both of `x` and `y` are `null`, `null` is returned. +* +### gt + + +Implementations: +gt(`x`, `y`): -> `return_type` +0. gt(`any1`, `any1`): -> `boolean` + +*Greater than. +gt(x, y) := (x > y) +If either/both of `x` and `y` are `null`, `null` is returned. +* +### lte + + +Implementations: +lte(`x`, `y`): -> `return_type` +0. lte(`any1`, `any1`): -> `boolean` + +*Less than or equal to. +lte(x, y) := (x <= y) +If either/both of `x` and `y` are `null`, `null` is returned. +* +### gte + + +Implementations: +gte(`x`, `y`): -> `return_type` +0. gte(`any1`, `any1`): -> `boolean` + +*Greater than or equal to. +gte(x, y) := (x >= y) +If either/both of `x` and `y` are `null`, `null` is returned. +* +### between + + +Implementations: +between(`expression`, `low`, `high`): -> `return_type` +
  • expression: The expression to test for in the range defined by `low` and `high`.
  • +
  • low: The value to check if greater than or equal to.
  • +
  • high: The value to check if less than or equal to.
  • +0. between(`any1`, `any1`, `any1`): -> `boolean` + +*Whether the `expression` is greater than or equal to `low` and less than or equal to `high`. +`expression` BETWEEN `low` AND `high` +If `low`, `high`, or `expression` are `null`, `null` is returned.* +### is_null + + +Implementations: +is_null(`x`): -> `return_type` +0. is_null(`any1`): -> `boolean` + +*Whether a value is null. NaN is not null.* +### is_not_null + + +Implementations: +is_not_null(`x`): -> `return_type` +0. is_not_null(`any1`): -> `boolean` + +*Whether a value is not null. NaN is not null.* +### is_nan + + +Implementations: +is_nan(`x`): -> `return_type` +0. is_nan(`fp32`): -> `boolean` +1. is_nan(`fp64`): -> `boolean` + +*Whether a value is not a number. +If `x` is `null`, `null` is returned. +* +### is_finite + + +Implementations: +is_finite(`x`): -> `return_type` +0. is_finite(`fp32`): -> `boolean` +1. is_finite(`fp64`): -> `boolean` + +*Whether a value is finite (neither infinite nor NaN). +If `x` is `null`, `null` is returned. +* +### is_infinite + + +Implementations: +is_infinite(`x`): -> `return_type` +0. is_infinite(`fp32`): -> `boolean` +1. is_infinite(`fp64`): -> `boolean` + +*Whether a value is infinite. +If `x` is `null`, `null` is returned. +* +### nullif + + +Implementations: +nullif(`x`, `y`): -> `return_type` +0. nullif(`any1`, `any1`): -> `any1` + +*If two values are equal, return null. Otherwise, return the first value.* +### coalesce + + +Implementations: +0. coalesce(`any1`, `any1`): -> `any1` + +*Evaluate arguments from left to right and return the first argument that is not null. Once a non-null argument is found, the remaining arguments are not evaluated. +If all arguments are null, return null.* +### least + + +Implementations: +0. least(`any1`, `any1`): -> `any1` + +*Evaluates each argument and returns the smallest one. The function will return null if any argument evaluates to null.* +### least_skip_null + + +Implementations: +0. 
least_skip_null(`any1`, `any1`): -> `any1` + +*Evaluates each argument and returns the smallest one. The function will return null only if all arguments evaluate to null.* +### greatest + + +Implementations: +0. greatest(`any1`, `any1`): -> `any1` + +*Evaluates each argument and returns the largest one. The function will return null if any argument evaluates to null.* +### greatest_skip_null + + +Implementations: +0. greatest_skip_null(`any1`, `any1`): -> `any1` + +*Evaluates each argument and returns the largest one. The function will return null only if all arguments evaluate to null.* \ No newline at end of file diff --git a/site/docs/extensions/functions_datetime.md b/site/docs/extensions/functions_datetime.md new file mode 100644 index 000000000..bc7b82b8c --- /dev/null +++ b/site/docs/extensions/functions_datetime.md @@ -0,0 +1,352 @@ + + + + +# functions_datetime.yaml + + +This document file is generated for [functions_datetime.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_datetime.yaml) +## Scalar Functions + +### extract + + +Implementations: +extract(`component`, `x`, `timezone`): -> `return_type` +
  • timezone: Timezone string from IANA tzdb.
+0. extract(`component`, `timestamp_tz`, `string`): -> `i64`
+1. extract(`component`, `precision_timestamp_tz<P>`, `string`): -> `i64`
+2. extract(`component`, `timestamp`): -> `i64`
+3. extract(`component`, `precision_timestamp<P>`): -> `i64`
+4. extract(`component`, `date`): -> `i64`
+5. extract(`component`, `time`): -> `i64`
+6. extract(`component`, `indexing`, `timestamp_tz`, `string`): -> `i64`
+7. extract(`component`, `indexing`, `precision_timestamp_tz<P>`, `string`): -> `i64`
+8. extract(`component`, `indexing`, `timestamp`): -> `i64`
+9. extract(`component`, `indexing`, `precision_timestamp<P>`): -> `i64`
+10. extract(`component`, `indexing`, `date`): -> `i64`
+
+*Extract portion of a date/time value. * YEAR Return the year. * ISO_YEAR Return the ISO 8601 week-numbering year. First week of an ISO year has the majority (4 or more) of
+  its days in January.
+* US_YEAR Return the US epidemiological year. First week of US epidemiological year has the majority (4 or more)
+  of its days in January. Last week of US epidemiological year has the year's last Wednesday in it. US
+  epidemiological week starts on Sunday.
+* QUARTER Return the number of the quarter within the year. January 1 through March 31 map to the first quarter,
+  April 1 through June 30 map to the second quarter, etc.
+* MONTH Return the number of the month within the year. * DAY Return the number of the day within the month. * DAY_OF_YEAR Return the number of the day within the year. January 1 maps to the first day, February 1 maps to
+  the thirty-second day, etc.
+* MONDAY_DAY_OF_WEEK Return the number of the day within the week, from Monday (first day) to Sunday (seventh
+  day).
+* SUNDAY_DAY_OF_WEEK Return the number of the day within the week, from Sunday (first day) to Saturday (seventh
+  day).
+* MONDAY_WEEK Return the number of the week within the year. First week starts on first Monday of January. * SUNDAY_WEEK Return the number of the week within the year. First week starts on first Sunday of January. * ISO_WEEK Return the number of the ISO week within the ISO year. First ISO week has the majority (4 or more)
+  of its days in January. ISO week starts on Monday.
+* US_WEEK Return the number of the US week within the US year. First US week has the majority (4 or more) of
+  its days in January. US week starts on Sunday.
+* HOUR Return the hour (0-23). * MINUTE Return the minute (0-59). * SECOND Return the second (0-59). * MILLISECOND Return number of milliseconds since the last full second. * MICROSECOND Return number of microseconds since the last full millisecond.
* NANOSECOND Return number of nanoseconds since the last full microsecond. * SUBSECOND Return number of microseconds since the last full second of the given timestamp. * UNIX_TIME Return number of seconds that have elapsed since 1970-01-01 00:00:00 UTC, ignoring leap seconds. * TIMEZONE_OFFSET Return number of seconds of timezone offset to UTC. +The range of values returned for QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK depends on whether counting starts at 1 or 0. This is governed by the indexing option. +When indexing is ONE: * QUARTER returns values in range 1-4 * MONTH returns values in range 1-12 * DAY returns values in range 1-31 * DAY_OF_YEAR returns values in range 1-366 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 1-7 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 1-53 +When indexing is ZERO: * QUARTER returns values in range 0-3 * MONTH returns values in range 0-11 * DAY returns values in range 0-30 * DAY_OF_YEAR returns values in range 0-365 * MONDAY_DAY_OF_WEEK and SUNDAY_DAY_OF_WEEK return values in range 0-6 * MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, and US_WEEK return values in range 0-52 +The indexing option must be specified when the component is QUARTER, MONTH, DAY, DAY_OF_YEAR, MONDAY_DAY_OF_WEEK, SUNDAY_DAY_OF_WEEK, MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, or US_WEEK. The indexing option cannot be specified when the component is YEAR, ISO_YEAR, US_YEAR, HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND, SUBSECOND, UNIX_TIME, or TIMEZONE_OFFSET. +Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.* + +

    Options: + +
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • +
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME', 'TIMEZONE_OFFSET']
  • +
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND', 'UNIX_TIME']
  • +
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'NANOSECOND', 'SUBSECOND', 'UNIX_TIME']
  • +
  • component ['YEAR', 'ISO_YEAR', 'US_YEAR', 'UNIX_TIME']
  • +
  • component ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND', 'SUBSECOND']
  • +
  • component ['QUARTER', 'MONTH', 'DAY', 'DAY_OF_YEAR', 'MONDAY_DAY_OF_WEEK', 'SUNDAY_DAY_OF_WEEK', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK']
  • +
  • indexing ['ONE', 'ZERO']
  • + +
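The effect of the `indexing` option on `extract` can be illustrated with a short Python sketch for `date` inputs. The helper `extract_indexed` is a hypothetical name, not part of Substrait or of any library; it covers only the non-week components that require `indexing`, and it assumes the proleptic Gregorian calendar used by Python's `datetime.date`. The week-numbering components (MONDAY_WEEK, SUNDAY_WEEK, ISO_WEEK, US_WEEK) are deliberately left out.

```python
from datetime import date


def extract_indexed(component: str, indexing: str, value: date) -> int:
    """Hypothetical sketch of extract(component, indexing, date)."""
    if indexing not in ("ONE", "ZERO"):
        raise ValueError(f"unknown indexing option: {indexing}")
    # Compute every supported component with counting starting at 1.
    one_based = {
        "QUARTER": (value.month - 1) // 3 + 1,
        "MONTH": value.month,
        "DAY": value.day,
        "DAY_OF_YEAR": value.timetuple().tm_yday,
        # isoweekday(): Monday is 1, Sunday is 7.
        "MONDAY_DAY_OF_WEEK": value.isoweekday(),
        # Shift so that Sunday is 1 and Saturday is 7.
        "SUNDAY_DAY_OF_WEEK": value.isoweekday() % 7 + 1,
    }
    if component not in one_based:
        raise ValueError(f"component not covered by this sketch: {component}")
    # ZERO indexing shifts the whole range down by one.
    return one_based[component] - (0 if indexing == "ONE" else 1)


d = date(2024, 2, 1)  # a Thursday
print(extract_indexed("DAY_OF_YEAR", "ONE", d))   # the thirty-second day, as in the text
print(extract_indexed("DAY_OF_YEAR", "ZERO", d))
```

Computing the one-based value first and subtracting afterwards keeps the two indexing modes consistent with the ranges listed in the description (for example 1-12 versus 0-11 for MONTH).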
+
+### extract_boolean
+
+
+Implementations:
+extract_boolean(`component`, `x`): -> `return_type`
+0. extract_boolean(`component`, `timestamp`): -> `boolean`
+1. extract_boolean(`component`, `precision_timestamp<P>`): -> `boolean`
+2. extract_boolean(`component`, `timestamp_tz`, `string`): -> `boolean`
+3. extract_boolean(`component`, `precision_timestamp_tz<P>`, `string`): -> `boolean`
+4. extract_boolean(`component`, `date`): -> `boolean`
+
+*Extract boolean values of a date/time value. * IS_LEAP_YEAR Return true if year of the given value is a leap year and false otherwise. * IS_DST Return true if DST (Daylight Savings Time) is observed at the given value
+  in the given timezone.
+
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+
+

    Options: + +
  • component ['IS_LEAP_YEAR']
  • +
  • component ['IS_LEAP_YEAR', 'IS_DST']
  • + +
+
+### add
+
+
+Implementations:
+add(`x`, `y`): -> `return_type`
+0. add(`timestamp`, `interval_year`): -> `timestamp`
+1. add(`precision_timestamp<P>`, `interval_year`): -> `precision_timestamp<P>`
+2. add(`timestamp_tz`, `interval_year`, `string`): -> `timestamp_tz`
+3. add(`precision_timestamp_tz<P>`, `interval_year`, `string`): -> `precision_timestamp_tz<P>`
+4. add(`date`, `interval_year`): -> `timestamp`
+5. add(`timestamp`, `interval_day<P>`): -> `timestamp`
+6. add(`precision_timestamp<P>`, `interval_day<P>`): -> `precision_timestamp<P>`
+7. add(`timestamp_tz`, `interval_day<P>`): -> `timestamp_tz`
+8. add(`precision_timestamp_tz<P>`, `interval_day<P>`): -> `precision_timestamp_tz<P>`
+9. add(`date`, `interval_day<P>`): -> `timestamp`
+
+*Add an interval to a date/time type.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+### multiply
+
+
+Implementations:
+multiply(`x`, `y`): -> `return_type`
+0. multiply(`i8`, `interval_day<P>`): -> `interval_day<P>`
+1. multiply(`i16`, `interval_day<P>`): -> `interval_day<P>`
+2. multiply(`i32`, `interval_day<P>`): -> `interval_day<P>`
+3. multiply(`i64`, `interval_day<P>`): -> `interval_day<P>`
+4. multiply(`i8`, `interval_year`): -> `interval_year`
+5. multiply(`i16`, `interval_year`): -> `interval_year`
+6. multiply(`i32`, `interval_year`): -> `interval_year`
+7. multiply(`i64`, `interval_year`): -> `interval_year`
+
+*Multiply an interval by an integral number.*
+### add_intervals
+
+
+Implementations:
+add_intervals(`x`, `y`): -> `return_type`
+0. add_intervals(`interval_day<P>`, `interval_day<P>`): -> `interval_day<P>`
+1. add_intervals(`interval_year`, `interval_year`): -> `interval_year`
+
+*Add two intervals together.*
+### subtract
+
+
+Implementations:
+subtract(`x`, `y`): -> `return_type`
+0. subtract(`timestamp`, `interval_year`): -> `timestamp`
+1. subtract(`precision_timestamp<P>`, `interval_year`): -> `precision_timestamp<P>`
+2. subtract(`timestamp_tz`, `interval_year`): -> `timestamp_tz`
+3. subtract(`precision_timestamp_tz<P>`, `interval_year`): -> `precision_timestamp_tz<P>`
+4. subtract(`timestamp_tz`, `interval_year`, `string`): -> `timestamp_tz`
+5. subtract(`precision_timestamp_tz<P>`, `interval_year`, `string`): -> `precision_timestamp_tz<P>`
+6. subtract(`date`, `interval_year`): -> `date`
+7. subtract(`timestamp`, `interval_day<P>`): -> `timestamp`
+8. subtract(`precision_timestamp<P>`, `interval_day<P>`): -> `precision_timestamp<P>`
+9. subtract(`timestamp_tz`, `interval_day<P>`): -> `timestamp_tz`
+10. subtract(`precision_timestamp_tz<P>`, `interval_day<P>`): -> `precision_timestamp_tz<P>`
+11. subtract(`date`, `interval_day<P>`): -> `date`
+
+*Subtract an interval from a date/time type.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+### lte
+
+
+Implementations:
+lte(`x`, `y`): -> `return_type`
+0. lte(`timestamp`, `timestamp`): -> `boolean`
+1. lte(`precision_timestamp<P>`, `precision_timestamp<P>`): -> `boolean`
+2. lte(`timestamp_tz`, `timestamp_tz`): -> `boolean`
+3. lte(`precision_timestamp_tz<P>`, `precision_timestamp_tz<P>`): -> `boolean`
+4. lte(`date`, `date`): -> `boolean`
+5. lte(`interval_day<P>`, `interval_day<P>`): -> `boolean`
+6. lte(`interval_year`, `interval_year`): -> `boolean`
+
+*less than or equal to*
+### lt
+
+
+Implementations:
+lt(`x`, `y`): -> `return_type`
+0. lt(`timestamp`, `timestamp`): -> `boolean`
+1. lt(`precision_timestamp<P>`, `precision_timestamp<P>`): -> `boolean`
+2. lt(`timestamp_tz`, `timestamp_tz`): -> `boolean`
+3. lt(`precision_timestamp_tz<P>`, `precision_timestamp_tz<P>`): -> `boolean`
+4. lt(`date`, `date`): -> `boolean`
+5. lt(`interval_day<P>`, `interval_day<P>`): -> `boolean`
+6. lt(`interval_year`, `interval_year`): -> `boolean`
+
+*less than*
+### gte
+
+
+Implementations:
+gte(`x`, `y`): -> `return_type`
+0. gte(`timestamp`, `timestamp`): -> `boolean`
+1. gte(`precision_timestamp<P>`, `precision_timestamp<P>`): -> `boolean`
+2. gte(`timestamp_tz`, `timestamp_tz`): -> `boolean`
+3. gte(`precision_timestamp_tz<P>`, `precision_timestamp_tz<P>`): -> `boolean`
+4. gte(`date`, `date`): -> `boolean`
+5. gte(`interval_day<P>`, `interval_day<P>`): -> `boolean`
+6. gte(`interval_year`, `interval_year`): -> `boolean`
+
+*greater than or equal to*
+### gt
+
+
+Implementations:
+gt(`x`, `y`): -> `return_type`
+0. gt(`timestamp`, `timestamp`): -> `boolean`
+1. gt(`precision_timestamp<P>`, `precision_timestamp<P>`): -> `boolean`
+2. gt(`timestamp_tz`, `timestamp_tz`): -> `boolean`
+3. gt(`precision_timestamp_tz<P>`, `precision_timestamp_tz<P>`): -> `boolean`
+4. gt(`date`, `date`): -> `boolean`
+5. gt(`interval_day<P>`, `interval_day<P>`): -> `boolean`
+6. gt(`interval_year`, `interval_year`): -> `boolean`
+
+*greater than*
+### assume_timezone
+
+
+Implementations:
+assume_timezone(`x`, `timezone`): -> `return_type`

  • timezone: Timezone string from IANA tzdb.
+0. assume_timezone(`timestamp`, `string`): -> `timestamp_tz`
+1. assume_timezone(`precision_timestamp<P>`, `string`): -> `precision_timestamp_tz<P>`
+2. assume_timezone(`date`, `string`): -> `timestamp_tz`
+
+*Convert local timestamp to UTC-relative timestamp_tz using given local time's timezone.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+### local_timestamp
+
+
+Implementations:
+local_timestamp(`x`, `timezone`): -> `return_type`

  • timezone: Timezone string from IANA tzdb.
+0. local_timestamp(`timestamp_tz`, `string`): -> `timestamp`
+1. local_timestamp(`precision_timestamp_tz<P>`, `string`): -> `precision_timestamp<P>`
+
+*Convert UTC-relative timestamp_tz to local timestamp using given local time's timezone.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+### strptime_time
+
+
+Implementations:
+strptime_time(`time_string`, `format`): -> `return_type`
+0. strptime_time(`string`, `string`): -> `time`
+
+*Parse string into time using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.*
+### strptime_date
+
+
+Implementations:
+strptime_date(`date_string`, `format`): -> `return_type`
+0. strptime_date(`string`, `string`): -> `date`
+
+*Parse string into date using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference.*
+### strptime_timestamp
+
+
+Implementations:
+strptime_timestamp(`timestamp_string`, `format`, `timezone`): -> `return_type`

  • timezone: Timezone string from IANA tzdb.
+0. strptime_timestamp(`string`, `string`, `string`): -> `timestamp_tz`
+1. strptime_timestamp(`string`, `string`): -> `timestamp_tz`
+
+*Parse string into timestamp using provided format, see https://man7.org/linux/man-pages/man3/strptime.3.html for reference. If timezone is present in timestamp and provided as parameter an error is thrown.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is supplied as parameter and present in the parsed string the parsed timezone is used. If parameter supplied timezone is invalid an error is thrown.*
+### strftime
+
+
+Implementations:
+strftime(`x`, `format`): -> `return_type`
+0. strftime(`timestamp`, `string`): -> `string`
+1. strftime(`precision_timestamp<P>`, `string`): -> `string`
+2. strftime(`timestamp_tz`, `string`, `string`): -> `string`
+3. strftime(`precision_timestamp_tz<P>`, `string`, `string`): -> `string`
+4. strftime(`date`, `string`): -> `string`
+5. strftime(`time`, `string`): -> `string`
+
+*Convert timestamp/date/time to string using provided format, see https://man7.org/linux/man-pages/man3/strftime.3.html for reference.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+### round_temporal
+
+
+Implementations:
+round_temporal(`x`, `rounding`, `unit`, `multiple`, `origin`): -> `return_type`
+0. round_temporal(`timestamp`, `rounding`, `unit`, `i64`, `timestamp`): -> `timestamp`
+1. round_temporal(`precision_timestamp<P>`, `rounding`, `unit`, `i64`, `precision_timestamp<P>`): -> `precision_timestamp<P>`
+2. round_temporal(`timestamp_tz`, `rounding`, `unit`, `i64`, `string`, `timestamp_tz`): -> `timestamp_tz`
+3. round_temporal(`precision_timestamp_tz<P>`, `rounding`, `unit`, `i64`, `string`, `precision_timestamp_tz<P>`): -> `precision_timestamp_tz<P>`
+4. round_temporal(`date`, `rounding`, `unit`, `i64`, `date`): -> `date`
+5. round_temporal(`time`, `rounding`, `unit`, `i64`, `time`): -> `time`
+
+*Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the origin in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant.
+Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.*
+
+

    Options: + +
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • +
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • +
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • +
  • unit ['HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • + +
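The four `rounding` modes described for `round_temporal` can be sketched in Python as below. This is only an illustration under stated assumptions, not the Substrait reference behavior: it uses naive `datetime` values (no timezone handling), and the hypothetical `step` parameter stands for `multiple` times a fixed-length `unit` such as `MINUTE` or `DAY`, so calendar units like `MONTH` and `YEAR` are not covered.

```python
from datetime import datetime, timedelta

ROUNDING_MODES = ("FLOOR", "CEIL", "ROUND_TIE_DOWN", "ROUND_TIE_UP")


def round_temporal(x: datetime, rounding: str, step: timedelta, origin: datetime) -> datetime:
    """Round x to a multiple of `step` counted from `origin` (illustrative sketch)."""
    if rounding not in ROUNDING_MODES:
        raise ValueError(f"unknown rounding mode: {rounding}")
    if step <= timedelta(0):
        raise ValueError("step must be positive")

    # timedelta % timedelta is always in [0, step), also when x is before origin.
    remainder = (x - origin) % step
    if remainder == timedelta(0):
        return x  # already an exact multiple from the origin

    earlier = x - remainder   # nearest multiple at or before x
    later = earlier + step    # nearest multiple after x

    if rounding == "FLOOR":
        return earlier
    if rounding == "CEIL":
        return later
    if remainder * 2 == step:
        # Equidistant: the tie rule decides.
        return earlier if rounding == "ROUND_TIE_DOWN" else later
    return earlier if remainder * 2 < step else later


if __name__ == "__main__":
    origin = datetime(2024, 1, 1)
    x = datetime(2024, 1, 1, 10, 7, 30)  # exactly halfway between 10:00 and 10:15
    for mode in ROUNDING_MODES:
        print(mode, round_temporal(x, mode, timedelta(minutes=15), origin))
```

Working with `timedelta` arithmetic keeps the computation exact (integer microseconds), which matters for the tie cases.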
    + +### round_calendar + + +Implementations: +round_calendar(`x`, `rounding`, `unit`, `origin`, `multiple`): -> `return_type` +0. round_calendar(`timestamp`, `rounding`, `unit`, `origin`, `i64`): -> `timestamp` +1. round_calendar(`precision_timestamp<P>`, `rounding`, `unit`, `origin`, `i64`): -> `precision_timestamp<P>` +2. round_calendar(`timestamp_tz`, `rounding`, `unit`, `origin`, `i64`, `string`): -> `timestamp_tz` +3. round_calendar(`precision_timestamp_tz<P>`, `rounding`, `unit`, `origin`, `i64`, `string`): -> `precision_timestamp_tz<P>` +4. round_calendar(`date`, `rounding`, `unit`, `origin`, `i64`, `date`): -> `date` +5. round_calendar(`time`, `rounding`, `unit`, `origin`, `i64`, `time`): -> `time` + +*Round a given timestamp/date/time to a multiple of a time unit. If the given timestamp is not already an exact multiple from the last origin unit in the given timezone, the resulting point is chosen as one of the two nearest multiples. Which of these is chosen is governed by rounding: FLOOR means to use the earlier one, CEIL means to use the later one, ROUND_TIE_DOWN means to choose the nearest and tie to the earlier one if equidistant, ROUND_TIE_UP means to choose the nearest and tie to the later one if equidistant. +Timezone strings must be as defined by IANA timezone database (https://www.iana.org/time-zones). Examples: "Pacific/Marquesas", "Etc/GMT+1". If timezone is invalid an error is thrown.* + +

    Options: + +
  • rounding ['FLOOR', 'CEIL', 'ROUND_TIE_DOWN', 'ROUND_TIE_UP']
  • +
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • +
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • +
  • unit ['YEAR', 'MONTH', 'WEEK', 'DAY']
  • +
  • origin ['YEAR', 'MONTH', 'MONDAY_WEEK', 'SUNDAY_WEEK', 'ISO_WEEK', 'US_WEEK', 'DAY']
  • +
  • unit ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND', 'MICROSECOND']
  • +
  • origin ['DAY', 'HOUR', 'MINUTE', 'SECOND', 'MILLISECOND']
  • + +
    + +## Aggregate Functions + +### min + + +Implementations: +min(`x`): -> `return_type` +0. min(`date`): -> `date?` +1. min(`time`): -> `time?` +2. min(`timestamp`): -> `timestamp?` +3. min(`precision_timestamp<P>`): -> `precision_timestamp?<P>` +4. min(`timestamp_tz`): -> `timestamp_tz?` +5. min(`precision_timestamp_tz<P>`): -> `precision_timestamp_tz?<P>` +6. min(`interval_day<P>`): -> `interval_day?<P>` +7. min(`interval_year`): -> `interval_year?` + +*Min a set of values.* +### max + + +Implementations: +max(`x`): -> `return_type` +0. max(`date`): -> `date?` +1. max(`time`): -> `time?` +2. max(`timestamp`): -> `timestamp?` +3. max(`timestamp_tz`): -> `timestamp_tz?` +4. max(`precision_timestamp_tz<P>`): -> `precision_timestamp_tz?<P>` +5. max(`interval_day<P>`): -> `interval_day?<P>` +6. max(`interval_year`): -> `interval_year?` + +*Max a set of values.* \ No newline at end of file diff --git a/site/docs/extensions/functions_geometry.md b/site/docs/extensions/functions_geometry.md new file mode 100644 index 000000000..4ef73e4f6 --- /dev/null +++ b/site/docs/extensions/functions_geometry.md @@ -0,0 +1,195 @@ + + + + +# functions_geometry.yaml + + +This document file is generated for [functions_geometry.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_geometry.yaml) +## Data Types + +name: geometry +structure: BINARY +## Scalar Functions + +### point + + +Implementations: +point(`x`, `y`): -> `return_type` +0. point(`fp64`, `fp64`): -> `u!geometry` + +*Returns a 2D point with the given `x` and `y` coordinate values. +* +### make_line + + +Implementations: +make_line(`geom1`, `geom2`): -> `return_type` +0. make_line(`u!geometry`, `u!geometry`): -> `u!geometry` + +*Returns a linestring connecting the endpoint of geometry `geom1` to the begin point of geometry `geom2`. Repeated points at the beginning of input geometries are collapsed to a single point. +A linestring can be closed or simple. A closed linestring starts and ends on the same point. A simple linestring does not cross or touch itself. +* +### x_coordinate + + +Implementations: +x_coordinate(`point`): -> `return_type` +0. x_coordinate(`u!geometry`): -> `fp64` + +*Return the x coordinate of the point. Return null if not available. +* +### y_coordinate + + +Implementations: +y_coordinate(`point`): -> `return_type` +0. y_coordinate(`u!geometry`): -> `fp64` + +*Return the y coordinate of the point. Return null if not available. +* +### num_points + + +Implementations: +num_points(`geom`): -> `return_type` +0. num_points(`u!geometry`): -> `i64` + +*Return the number of points in the geometry. The geometry should be a linestring or circularstring. +* +### is_empty + + +Implementations: +is_empty(`geom`): -> `return_type` +0. 
is_empty(`u!geometry`): -> `boolean` + +*Return true if the geometry is an empty geometry. +* +### is_closed + + +Implementations: +is_closed(`geom`): -> `return_type` +0. is_closed(`u!geometry`): -> `boolean` + +*Return true if the geometry's start and end points are the same. +* +### is_simple + + +Implementations: +is_simple(`geom`): -> `return_type` +0. is_simple(`u!geometry`): -> `boolean` + +*Return true if the geometry does not self-intersect. +* +### is_ring + + +Implementations: +is_ring(`geom`): -> `return_type` +0. is_ring(`u!geometry`): -> `boolean` + +*Return true if the geometry's start and end points are the same and it does not self-intersect. +* +### geometry_type + + +Implementations: +geometry_type(`geom`): -> `return_type` +0. geometry_type(`u!geometry`): -> `string` + +*Return the type of geometry as a string. +* +### envelope + + +Implementations: +envelope(`geom`): -> `return_type` +0. envelope(`u!geometry`): -> `u!geometry` + +*Return the minimum bounding box for the input geometry as a geometry. +The returned geometry is defined by the corner points of the bounding box. If the input geometry is a point or a line, the returned geometry can also be a point or line. +* +### dimension + + +Implementations: +dimension(`geom`): -> `return_type` +0. dimension(`u!geometry`): -> `i8` + +*Return the dimension of the input geometry. If the input is a collection of geometries, return the largest dimension from the collection. Dimensionality is determined by the complexity of the input and not the coordinate system being used. +Type dimensions: POINT - 0 LINE - 1 POLYGON - 2 +* +### is_valid + + +Implementations: +is_valid(`geom`): -> `return_type` +0. is_valid(`u!geometry`): -> `boolean` + +*Return true if the input geometry is a valid 2D geometry. +For 3-dimensional and 4-dimensional geometries, the validity is still only tested in 2 dimensions. 
+* +### collection_extract + + +Implementations: +collection_extract(`geom_collection`): -> `return_type` +0. collection_extract(`u!geometry`): -> `u!geometry` +1. collection_extract(`u!geometry`, `i8`): -> `u!geometry` + +*Given the input geometry collection, return a homogeneous multi-geometry. All geometries in the multi-geometry will have the same dimension. +If type is not specified, the multi-geometry will only contain geometries of the highest dimension. If type is specified, the multi-geometry will only contain geometries of that type. If there are no geometries of the specified type, an empty geometry is returned. Only points, linestrings, and polygons are supported. +Type numbers: POINT - 0 LINE - 1 POLYGON - 2 +* +### flip_coordinates + + +Implementations: +flip_coordinates(`geom_collection`): -> `return_type` +0. flip_coordinates(`u!geometry`): -> `u!geometry` + +*Return a version of the input geometry with the X and Y axes flipped. +This operation can be performed on geometries with more than 2 dimensions. However, only the X and Y axes will be flipped. +* +### remove_repeated_points + + +Implementations: +remove_repeated_points(`geom`): -> `return_type` +0. remove_repeated_points(`u!geometry`): -> `u!geometry` +1. remove_repeated_points(`u!geometry`, `fp64`): -> `u!geometry` + +*Return a version of the input geometry with duplicate consecutive points removed. +If the `tolerance` argument is provided, consecutive points within the tolerance distance of one another are considered to be duplicates. +* +### buffer + + +Implementations: +buffer(`geom`, `buffer_radius`): -> `return_type` +0. buffer(`u!geometry`, `fp64`): -> `u!geometry` + +*Compute and return an expanded version of the input geometry. All the points of the returned geometry are at a distance of `buffer_radius` away from the points of the input geometry. If a negative `buffer_radius` is provided, the geometry will shrink instead of expand. 
A negative `buffer_radius` may shrink the geometry completely, in which case an empty geometry is returned. For input geometries that are points or lines, a negative `buffer_radius` will always return an empty geometry. +* +### centroid + + +Implementations: +centroid(`geom`): -> `return_type` +0. centroid(`u!geometry`): -> `u!geometry` + +*Return a point which is the geometric center of mass of the input geometry. +* +### minimum_bounding_circle + + +Implementations: +minimum_bounding_circle(`geom`): -> `return_type` +0. minimum_bounding_circle(`u!geometry`): -> `u!geometry` + +*Return the smallest circle polygon that contains the input geometry. +* \ No newline at end of file diff --git a/site/docs/extensions/functions_logarithmic.md b/site/docs/extensions/functions_logarithmic.md new file mode 100644 index 000000000..5a64c717d --- /dev/null +++ b/site/docs/extensions/functions_logarithmic.md @@ -0,0 +1,114 @@ + + + + +# functions_logarithmic.yaml + + +This document file is generated for [functions_logarithmic.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_logarithmic.yaml) +## Scalar Functions + +### ln + + +Implementations: +ln(`x`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `return_type` +0. ln(`i64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +1. ln(`fp32`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp32` +2. ln(`fp64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +3. ln(`decimal`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` + +*Natural logarithm of the value* + +

    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • + +
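The `on_domain_error` and `on_log_zero` options can be read as selecting what `ln` returns for negative and zero inputs. The Python sketch below shows one plausible reading; the meaning attached to each option value is an assumption inferred from the option names (the generated page only lists the values), the `rounding` option is not modelled, and `None` stands in for a null result.

```python
import math


def ln(x: float, *, on_domain_error: str, on_log_zero: str):
    """Natural logarithm with option-driven edge cases (illustrative sketch)."""
    if on_domain_error not in ("NAN", "NULL", "ERROR"):
        raise ValueError(f"unknown on_domain_error: {on_domain_error}")
    if on_log_zero not in ("NAN", "ERROR", "MINUS_INFINITY"):
        raise ValueError(f"unknown on_log_zero: {on_log_zero}")

    if x == 0:
        # Assumed meaning: log of zero is a separate case from the domain error.
        if on_log_zero == "MINUS_INFINITY":
            return -math.inf
        if on_log_zero == "NAN":
            return math.nan
        raise ArithmeticError("logarithm of zero")

    if x < 0:
        # Assumed meaning: negative input is the "domain error".
        if on_domain_error == "NULL":
            return None
        if on_domain_error == "NAN":
            return math.nan
        raise ArithmeticError("logarithm of a negative value")

    return math.log(x)


if __name__ == "__main__":
    print(ln(0.0, on_domain_error="ERROR", on_log_zero="MINUS_INFINITY"))
    print(ln(-1.0, on_domain_error="NULL", on_log_zero="ERROR"))
```

The same pattern would apply to `log10`, `log2`, `logb` and `log1p`, which list the same three options.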
    + +### log10 + + +Implementations: +log10(`x`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `return_type` +0. log10(`i64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +1. log10(`fp32`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp32` +2. log10(`fp64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +3. log10(`decimal`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` + +*Logarithm to base 10 of the value* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • + +
    + +### log2 + + +Implementations: +log2(`x`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `return_type` +0. log2(`i64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +1. log2(`fp32`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp32` +2. log2(`fp64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +3. log2(`decimal`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` + +*Logarithm to base 2 of the value* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • + +
    + +### logb + + +Implementations: +logb(`x`, `base`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `return_type` +
  • x: The number `x` to compute the logarithm of
  • +
  • base: The logarithm base `b` to use
  • +0. logb(`i64`, `i64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +1. logb(`fp32`, `fp32`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp32` +2. logb(`fp64`, `fp64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +3. logb(`decimal`, `decimal`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` + +*Logarithm of the value with the given base +logb(x, b) => log_{b} (x) +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • + +
    + +### log1p + + +Implementations: +log1p(`x`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `return_type` +0. log1p(`fp32`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp32` +1. log1p(`fp64`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` +2. log1p(`decimal`, `option:rounding`, `option:on_domain_error`, `option:on_log_zero`): -> `fp64` + +*Natural logarithm (base e) of 1 + x +log1p(x) => log(1+x) +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR']
  • +
  • on_domain_error ['NAN', 'NULL', 'ERROR']
  • +
  • on_log_zero ['NAN', 'ERROR', 'MINUS_INFINITY']
  • + +
    diff --git a/site/docs/extensions/functions_rounding.md b/site/docs/extensions/functions_rounding.md new file mode 100644 index 000000000..c4fb56b9f --- /dev/null +++ b/site/docs/extensions/functions_rounding.md @@ -0,0 +1,56 @@ + + + + +# functions_rounding.yaml + + +This document file is generated for [functions_rounding.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_rounding.yaml) +## Scalar Functions + +### ceil + + +Implementations: +ceil(`x`): -> `return_type` +0. ceil(`fp32`): -> `fp32` +1. ceil(`fp64`): -> `fp64` + +*Rounding to the ceiling of the value `x`. +* +### floor + + +Implementations: +floor(`x`): -> `return_type` +0. floor(`fp32`): -> `fp32` +1. floor(`fp64`): -> `fp64` + +*Rounding to the floor of the value `x`. +* +### round + + +Implementations: +round(`x`, `s`, `option:rounding`): -> `return_type` +
  • x: Numerical expression to be rounded. +
  • +
  • s: Number of decimal places to be rounded to. +When `s` is a positive number, nothing will happen since `x` is an integer value. +When `s` is a negative number, the rounding is performed to the nearest multiple of `10^(-s)`. +
  • +0. round(`i8`, `i32`, `option:rounding`): -> `i8?` +1. round(`i16`, `i32`, `option:rounding`): -> `i16?` +2. round(`i32`, `i32`, `option:rounding`): -> `i32?` +3. round(`i64`, `i32`, `option:rounding`): -> `i64?` +4. round(`fp32`, `i32`, `option:rounding`): -> `fp32?` +5. round(`fp64`, `i32`, `option:rounding`): -> `fp64?` + +*Rounding the value `x` to `s` decimal places. +* + +
    Options: + +
  • rounding ['TIE_TO_EVEN', 'TIE_AWAY_FROM_ZERO', 'TRUNCATE', 'CEILING', 'FLOOR', 'AWAY_FROM_ZERO', 'TIE_DOWN', 'TIE_UP', 'TIE_TOWARDS_ZERO', 'TIE_TO_ODD']
  • + +
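For integer inputs, the description of `s` above says a non-negative `s` leaves `x` unchanged and a negative `s` rounds to the nearest multiple of `10^(-s)`. A minimal Python sketch of that rule follows. It is not the reference implementation: the helper name `round_int` is hypothetical, only five of the listed `rounding` values are handled, and overflow (which the nullable return types hint at) is ignored because Python integers are unbounded.

```python
SUPPORTED = ("TIE_TO_EVEN", "TIE_AWAY_FROM_ZERO", "TRUNCATE", "CEILING", "FLOOR")


def round_int(x: int, s: int, rounding: str) -> int:
    """Round integer x to `s` decimal places (illustrative sketch)."""
    if rounding not in SUPPORTED:
        raise ValueError(f"rounding mode not covered by this sketch: {rounding}")
    if s >= 0:
        return x  # an integer has no fractional digits to round

    multiple = 10 ** (-s)
    # Floor division: x == q * multiple + r with 0 <= r < multiple.
    q, r = divmod(x, multiple)
    if r == 0:
        return x
    lower, upper = q * multiple, (q + 1) * multiple

    if rounding == "FLOOR":
        return lower
    if rounding == "CEILING":
        return upper
    if rounding == "TRUNCATE":
        return lower if x >= 0 else upper  # towards zero

    # Nearest multiple, then the tie rule.
    if 2 * r < multiple:
        return lower
    if 2 * r > multiple:
        return upper
    if rounding == "TIE_TO_EVEN":
        return lower if q % 2 == 0 else upper
    return upper if x > 0 else lower  # TIE_AWAY_FROM_ZERO


if __name__ == "__main__":
    for mode in SUPPORTED:
        print(mode, round_int(1250, -2, mode), round_int(-1250, -2, mode))
```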
    diff --git a/site/docs/extensions/functions_set.md b/site/docs/extensions/functions_set.md new file mode 100644 index 000000000..9fa0665bf --- /dev/null +++ b/site/docs/extensions/functions_set.md @@ -0,0 +1,30 @@ + + + + +# functions_set.yaml + + +This document file is generated for [functions_set.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_set.yaml) +## Scalar Functions + +### index_in + + +Implementations: +index_in(`needle`, `haystack`, `option:nan_equality`): -> `return_type` +0. index_in(`any1`, `list`, `option:nan_equality`): -> `i64?` + +*Checks the membership of a value in a list of values +Returns the first 0-based index value of some input `needle` if `needle` is equal to any element in `haystack`. Returns `NULL` if not found. +If `needle` is `NULL`, returns `NULL`. +If `needle` is `NaN`: + - Returns 0-based index of `NaN` in `input` (default) + - Returns `NULL` (if `NAN_IS_NOT_NAN` is specified) +* + +
    Options: + +
  • nan_equality ['NAN_IS_NAN', 'NAN_IS_NOT_NAN']
  • + +
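The null and `NaN` rules for `index_in` listed above can be sketched in Python as follows. This is an illustration only: Python lists and `None` stand in for Substrait lists and `NULL`, and only `float` values are treated as possible `NaN`s.

```python
import math


def _is_nan(value) -> bool:
    return isinstance(value, float) and math.isnan(value)


def index_in(needle, haystack, nan_equality: str = "NAN_IS_NAN"):
    """Return the first 0-based index of needle in haystack, or None (illustrative sketch)."""
    if nan_equality not in ("NAN_IS_NAN", "NAN_IS_NOT_NAN"):
        raise ValueError(f"unknown nan_equality: {nan_equality}")
    if needle is None:
        return None  # a NULL needle yields NULL

    if _is_nan(needle):
        if nan_equality == "NAN_IS_NOT_NAN":
            return None  # NaN never equals anything, including another NaN
        for index, item in enumerate(haystack):
            if _is_nan(item):
                return index
        return None

    for index, item in enumerate(haystack):
        if item == needle:
            return index
    return None


if __name__ == "__main__":
    print(index_in(3, [1, 2, 3, 3]))
    print(index_in(math.nan, [1.0, math.nan]))
    print(index_in(math.nan, [1.0, math.nan], "NAN_IS_NOT_NAN"))
```

The explicit `NaN` branch is needed because `math.nan == math.nan` is false in Python, so a plain equality scan would behave like `NAN_IS_NOT_NAN`.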
    diff --git a/site/docs/extensions/functions_string.md b/site/docs/extensions/functions_string.md new file mode 100644 index 000000000..b4c7f8f3f --- /dev/null +++ b/site/docs/extensions/functions_string.md @@ -0,0 +1,686 @@ + + + + +# functions_string.yaml + + +This document file is generated for [functions_string.yaml](https://github.com/substrait-io/substrait/tree/main/extensions/functions_string.yaml) +## Scalar Functions + +### concat + + +Implementations: +concat(`input`, `option:null_handling`): -> `return_type` +0. concat(`varchar`, `option:null_handling`): -> `varchar` +1. concat(`string`, `option:null_handling`): -> `string` + +*Concatenate strings. +The `null_handling` option determines whether or not null values will be recognized by the function. If `null_handling` is set to `IGNORE_NULLS`, null value arguments will be ignored when strings are concatenated. If set to `ACCEPT_NULLS`, the result will be null if any argument passed to the concat function is null.* + +
    Options: + +
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
  • + +
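The two `null_handling` values described for `concat` can be illustrated with a short Python sketch, using `None` for null. The result for `IGNORE_NULLS` when every argument is null is not stated in the description; returning an empty string here is an assumption of the sketch.

```python
def concat(*args, null_handling: str):
    """Concatenate strings with option-driven null handling (illustrative sketch)."""
    if null_handling == "IGNORE_NULLS":
        # Nulls are skipped. Assumption: all-null input gives an empty string.
        return "".join(arg for arg in args if arg is not None)
    if null_handling == "ACCEPT_NULLS":
        # Any null argument makes the whole result null.
        if any(arg is None for arg in args):
            return None
        return "".join(args)
    raise ValueError(f"unknown null_handling: {null_handling}")


if __name__ == "__main__":
    print(concat("sub", None, "strait", null_handling="IGNORE_NULLS"))
    print(concat("sub", None, "strait", null_handling="ACCEPT_NULLS"))
```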
    + +### like + + +Implementations: +like(`input`, `match`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • match: The string to match against the input string.
  • +0. like(`varchar`, `varchar`, `option:case_sensitivity`): -> `boolean` +1. like(`string`, `string`, `option:case_sensitivity`): -> `boolean` + +*Are two strings like each other. +The `case_sensitivity` option applies to the `match` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
    + +### substring + + +Implementations: +substring(`input`, `start`, `length`, `option:negative_start`): -> `return_type` +0. substring(`varchar`, `i32`, `i32`, `option:negative_start`): -> `varchar` +1. substring(`string`, `i32`, `i32`, `option:negative_start`): -> `string` +2. substring(`fixedchar`, `i32`, `i32`, `option:negative_start`): -> `string` +3. substring(`varchar`, `i32`, `option:negative_start`): -> `varchar` +4. substring(`string`, `i32`, `option:negative_start`): -> `string` +5. substring(`fixedchar`, `i32`, `option:negative_start`): -> `string` + +*Extract a substring of a specified `length` starting from position `start`. A `start` value of 1 refers to the first character of the string. When `length` is not specified, the function will extract a substring starting from position `start` and ending at the end of the string. +The `negative_start` option applies to the `start` parameter. `WRAP_FROM_END` means the index will start from the end of the `input` and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. `LEFT_OF_BEGINNING` means the returned substring will start from the left of the first character. A `start` of -1 will begin 2 characters left of the `input`, while a `start` of 0 begins 1 character left of the `input`.* + +
    Options: + +
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • +
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
  • + +
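The `negative_start` behaviors described for `substring` are easiest to see side by side. The Python sketch below is one reading of that description, with several assumptions flagged in the comments: a `start` of 0 is only defined for `LEFT_OF_BEGINNING`, a `WRAP_FROM_END` index that reaches past the beginning is clamped to the first character, and Python code points stand in for characters.

```python
def substring(text: str, start: int, length=None, *, negative_start: str) -> str:
    """Extract a substring using a 1-based `start` (illustrative sketch)."""
    if negative_start not in ("WRAP_FROM_END", "LEFT_OF_BEGINNING", "ERROR"):
        raise ValueError(f"unknown negative_start: {negative_start}")

    size = len(text)
    if start >= 1:
        begin = start - 1  # convert to a 0-based index
    elif negative_start == "LEFT_OF_BEGINNING":
        # start=0 is one position left of the input, start=-1 is two, and so on.
        begin = start - 1
    elif negative_start == "WRAP_FROM_END" and start < 0:
        # -1 is the last character. Clamping at 0 is an assumption of this sketch.
        begin = max(size + start, 0)
    else:
        # ERROR with start < 1, or start == 0 with WRAP_FROM_END (not described).
        raise ValueError(f"start={start} is not allowed with {negative_start}")

    end = size if length is None else begin + length
    # Positions left of the input contribute no characters.
    return text[max(begin, 0):max(end, 0)]


if __name__ == "__main__":
    print(substring("hello", 2, 3, negative_start="ERROR"))
    print(substring("hello", -2, negative_start="WRAP_FROM_END"))
    print(substring("hello", -1, 4, negative_start="LEFT_OF_BEGINNING"))
```

With `LEFT_OF_BEGINNING`, the requested `length` still counts the virtual positions left of the input, so fewer than `length` characters come back.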
    + +### regexp_match_substring + + +Implementations: +regexp_match_substring(`input`, `pattern`, `position`, `occurrence`, `group`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_match_substring(`varchar`, `varchar`, `i64`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `varchar` +1. regexp_match_substring(`string`, `string`, `i64`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `string` + +*Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the `occurrence` argument. Specifying `1` means the first occurrence will be extracted, `2` means the second occurrence, and so on. The `occurrence` argument should be a positive non-zero integer. The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the `position` argument. Specifying `1` means to search for matches starting at the first character of the input string, `2` means the second character, and so on. The `position` argument should be a positive non-zero integer. The regular expression capture group can be specified using the `group` argument. Specifying `0` will return the substring matching the full regular expression. Specifying `1` will return the substring matching only the first capture group, and so on. The `group` argument should be a non-negative integer. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. 
Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
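The interplay of `position`, `occurrence` and `group` in `regexp_match_substring` can be sketched with Python's standard `re` module. Note the assumptions: `re` is not the ICU regular expression engine the description refers to, so pattern syntax and Unicode behavior can differ; the default option values and the mapping of `CASE_INSENSITIVE_ASCII` to `re.IGNORECASE | re.ASCII` are choices of this sketch; and returning `None` for an out-of-range `occurrence` is also a sketch choice, since the description leaves that case undefined.

```python
import re


def regexp_match_substring(text, pattern, position=1, occurrence=1, group=0, *,
                           case_sensitivity="CASE_SENSITIVE",
                           multiline="MULTILINE_DISABLED",
                           dotall="DOTALL_DISABLED"):
    """Return the requested capture group of the n-th match (illustrative sketch)."""
    flags = 0
    if case_sensitivity == "CASE_INSENSITIVE":
        flags |= re.IGNORECASE
    elif case_sensitivity == "CASE_INSENSITIVE_ASCII":
        flags |= re.IGNORECASE | re.ASCII  # approximation, also narrows \w and \d
    if multiline == "MULTILINE_ENABLED":
        flags |= re.MULTILINE
    if dotall == "DOTALL_ENABLED":
        flags |= re.DOTALL

    # `position` is 1-based, Pattern.finditer takes a 0-based start offset.
    matches = re.compile(pattern, flags).finditer(text, position - 1)
    for index, match in enumerate(matches, start=1):
        if index == occurrence:
            return match.group(group)
    return None


if __name__ == "__main__":
    text = "a1 b22 c333"
    print(regexp_match_substring(text, r"([a-z])(\d+)"))
    print(regexp_match_substring(text, r"([a-z])(\d+)", occurrence=2, group=2))
    print(regexp_match_substring(text, r"([a-z])(\d+)", position=5))
```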
    + +### regexp_match_substring + + +Implementations: +regexp_match_substring(`input`, `pattern`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_match_substring(`string`, `string`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `string` + +*Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The first occurrence of the pattern from the beginning of the string is extracted. It returns the substring matching the full regular expression. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
    + +### regexp_match_substring_all + + +Implementations: +regexp_match_substring_all(`input`, `pattern`, `position`, `group`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_match_substring_all(`varchar`, `varchar`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `List>` +1. regexp_match_substring_all(`string`, `string`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `List` + +*Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the `position` argument. Specifying `1` means to search for matches starting at the first character of the input string, `2` means the second character, and so on. The `position` argument should be a positive non-zero integer. The regular expression capture group can be specified using the `group` argument. Specifying `0` will return substrings matching the full regular expression. Specifying `1` will return substrings matching only the first capture group, and so on. The `group` argument should be a non-negative integer. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. 
+Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
    + +### starts_with + + +Implementations: +starts_with(`input`, `substring`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • substring: The substring to search for.
  • +0. starts_with(`varchar`, `varchar`, `option:case_sensitivity`): -> `boolean` +1. starts_with(`varchar`, `string`, `option:case_sensitivity`): -> `boolean` +2. starts_with(`varchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +3. starts_with(`string`, `string`, `option:case_sensitivity`): -> `boolean` +4. starts_with(`string`, `varchar`, `option:case_sensitivity`): -> `boolean` +5. starts_with(`string`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +6. starts_with(`fixedchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +7. starts_with(`fixedchar`, `string`, `option:case_sensitivity`): -> `boolean` +8. starts_with(`fixedchar`, `varchar`, `option:case_sensitivity`): -> `boolean` + +*Whether the `input` string starts with the `substring`. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
    + +### ends_with + + +Implementations: +ends_with(`input`, `substring`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • substring: The substring to search for.
  • +0. ends_with(`varchar`, `varchar`, `option:case_sensitivity`): -> `boolean` +1. ends_with(`varchar`, `string`, `option:case_sensitivity`): -> `boolean` +2. ends_with(`varchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +3. ends_with(`string`, `string`, `option:case_sensitivity`): -> `boolean` +4. ends_with(`string`, `varchar`, `option:case_sensitivity`): -> `boolean` +5. ends_with(`string`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +6. ends_with(`fixedchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +7. ends_with(`fixedchar`, `string`, `option:case_sensitivity`): -> `boolean` +8. ends_with(`fixedchar`, `varchar`, `option:case_sensitivity`): -> `boolean` + +*Whether the `input` string ends with the `substring`. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
    + +### contains + + +Implementations: +contains(`input`, `substring`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • substring: The substring to search for.
  • +0. contains(`varchar`, `varchar`, `option:case_sensitivity`): -> `boolean` +1. contains(`varchar`, `string`, `option:case_sensitivity`): -> `boolean` +2. contains(`varchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +3. contains(`string`, `string`, `option:case_sensitivity`): -> `boolean` +4. contains(`string`, `varchar`, `option:case_sensitivity`): -> `boolean` +5. contains(`string`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +6. contains(`fixedchar`, `fixedchar`, `option:case_sensitivity`): -> `boolean` +7. contains(`fixedchar`, `string`, `option:case_sensitivity`): -> `boolean` +8. contains(`fixedchar`, `varchar`, `option:case_sensitivity`): -> `boolean` + +*Whether the `input` string contains the `substring`. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
    + +### strpos + + +Implementations: +strpos(`input`, `substring`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • substring: The substring to search for.
  • +0. strpos(`string`, `string`, `option:case_sensitivity`): -> `i64` +1. strpos(`varchar`, `varchar`, `option:case_sensitivity`): -> `i64` +2. strpos(`fixedchar`, `fixedchar`, `option:case_sensitivity`): -> `i64` + +*Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
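The 1-based result of `strpos`, with 0 meaning "not found", maps neatly onto Python's 0-based `str.find`, which returns -1 when nothing is found. The sketch below covers only `CASE_SENSITIVE` and `CASE_INSENSITIVE_ASCII`; full Unicode case-insensitive matching is left out because case folding can change string length and shift positions. Lower-casing both arguments for the ASCII mode is an assumption about how that option is meant to behave.

```python
import string

# Maps only A-Z to a-z, so the string length (and every position) is preserved.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def strpos(text: str, substring: str, case_sensitivity: str = "CASE_SENSITIVE") -> int:
    """Return the 1-based position of the first occurrence, or 0 (illustrative sketch)."""
    if case_sensitivity == "CASE_INSENSITIVE_ASCII":
        text = text.translate(_ASCII_LOWER)
        substring = substring.translate(_ASCII_LOWER)
    elif case_sensitivity != "CASE_SENSITIVE":
        raise NotImplementedError(f"not covered by this sketch: {case_sensitivity}")
    # str.find is 0-based and returns -1 on failure, so adding 1 gives both rules.
    return text.find(substring) + 1


if __name__ == "__main__":
    print(strpos("substrait", "str"))
    print(strpos("substrait", "xyz"))
    print(strpos("SubStrait", "STR", "CASE_INSENSITIVE_ASCII"))
```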
    + +### regexp_strpos + + +Implementations: +regexp_strpos(`input`, `pattern`, `position`, `occurrence`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_strpos(`varchar`, `varchar`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` +1. regexp_strpos(`string`, `string`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` + +*Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The number of characters from the beginning of the string to begin starting to search for pattern matches can be specified using the `position` argument. Specifying `1` means to search for matches starting at the first character of the input string, `2` means the second character, and so on. The `position` argument should be a positive non-zero integer. Which occurrence to return the position of is specified using the `occurrence` argument. Specifying `1` means the position of the first occurrence will be returned, `2` means the position of the second occurrence, and so on. The `occurrence` argument should be a positive non-zero integer. If no occurrence is found, 0 is returned. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
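The `position` and `occurrence` semantics of `regexp_strpos` can be sketched in Python as below. This is only an illustration under stated assumptions: it uses Python's `re` module rather than the ICU engine the description refers to, it ignores the three options, and it raises `ValueError` for out-of-range arguments even though the description leaves that behavior undefined.

```python
import re


def regexp_strpos(text, pattern, position=1, occurrence=1):
    """Return the 1-based position of the requested match, or 0 if absent.

    Assumption: Python `re` syntax stands in for ICU regular expressions.
    """
    if position < 1 or occurrence < 1:
        # The description calls this case undefined; raising is one choice.
        raise ValueError("position and occurrence must be positive")
    regex = re.compile(pattern)
    # `position` is 1-based, while Pattern.finditer takes a 0-based offset.
    matches = regex.finditer(text, position - 1)
    for index, match in enumerate(matches, start=1):
        if index == occurrence:
            return match.start() + 1  # convert back to a 1-based position
    return 0


print(regexp_strpos("abc abc abc", "abc", 1, 2))
print(regexp_strpos("abc abc abc", "abc", 2, 1))
print(regexp_strpos("abc", "x"))
```

The search offset and the returned position are both converted between 1-based and 0-based indexing, which is the detail most easily gotten wrong when mapping this function onto a host language.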
    + +### count_substring + + +Implementations: +count_substring(`input`, `substring`, `option:case_sensitivity`): -> `return_type` +
  • input: The input string.
  • +
  • substring: The substring to count.
  • +0. count_substring(`string`, `string`, `option:case_sensitivity`): -> `i64` +1. count_substring(`varchar`, `varchar`, `option:case_sensitivity`): -> `i64` +2. count_substring(`fixedchar`, `fixedchar`, `option:case_sensitivity`): -> `i64` + +*Return the number of non-overlapping occurrences of a substring in an input string. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
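As a rough illustration of non-overlapping counting and of the three `case_sensitivity` values, here is a minimal Python sketch. The mapping of `CASE_INSENSITIVE` to `str.casefold` and of `CASE_INSENSITIVE_ASCII` to an ASCII-only translation table is an assumption made for this example, not something the description prescribes.

```python
import string

# Lowercases only the 26 ASCII capital letters; other characters are untouched.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def count_substring(text, substring, case_sensitivity="CASE_SENSITIVE"):
    """Count non-overlapping occurrences of `substring` in `text`."""
    if case_sensitivity == "CASE_INSENSITIVE":
        # Assumption: Unicode case folding approximates this option.
        text, substring = text.casefold(), substring.casefold()
    elif case_sensitivity == "CASE_INSENSITIVE_ASCII":
        text = text.translate(_ASCII_FOLD)
        substring = substring.translate(_ASCII_FOLD)
    # str.count scans left to right and never reuses matched characters.
    return text.count(substring)


print(count_substring("aaaa", "aa"))
print(count_substring("Banana", "AN", "CASE_INSENSITIVE"))
```

Note that `"aaaa"` contains three overlapping occurrences of `"aa"` but only two non-overlapping ones, which is what the function counts.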
    + +### regexp_count_substring + + +Implementations: +regexp_count_substring(`input`, `pattern`, `position`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_count_substring(`string`, `string`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` +1. regexp_count_substring(`varchar`, `varchar`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` +2. regexp_count_substring(`fixedchar`, `fixedchar`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` + +*Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The character position at which the search for pattern matches begins can be specified using the `position` argument. Specifying `1` starts the search at the first character of the input string, `2` at the second character, and so on. The `position` argument should be a positive non-zero integer. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile or the position value is out of range.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
    + +### regexp_count_substring + + +Implementations: +regexp_count_substring(`input`, `pattern`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +0. regexp_count_substring(`string`, `string`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `i64` + +*Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The match starts at the first character of the input string. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
    + +### replace + + +Implementations: +replace(`input`, `substring`, `replacement`, `option:case_sensitivity`): -> `return_type` +
  • input: Input string.
  • +
  • substring: The substring to replace.
  • +
  • replacement: The replacement string.
  • +0. replace(`string`, `string`, `string`, `option:case_sensitivity`): -> `string` +1. replace(`varchar`, `varchar`, `varchar`, `option:case_sensitivity`): -> `varchar` + +*Replace all occurrences of the substring with the replacement string. +The `case_sensitivity` option applies to the `substring` argument.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • + +
    + +### concat_ws + + +Implementations: +concat_ws(`separator`, `string_arguments`): -> `return_type` +
  • separator: Character to separate strings by.
  • +
  • string_arguments: Strings to be concatenated.
  • +0. concat_ws(`string`, `string`): -> `string` +1. concat_ws(`varchar`, `varchar`): -> `varchar` + +*Concatenate strings together separated by a separator.* +### repeat + + +Implementations: +repeat(`input`, `count`): -> `return_type` +0. repeat(`string`, `i64`): -> `string` +1. repeat(`varchar`, `i64`, `i64`): -> `varchar` + +*Repeat a string `count` number of times.* +### reverse + + +Implementations: +reverse(`input`): -> `return_type` +0. reverse(`string`): -> `string` +1. reverse(`varchar`): -> `varchar` +2. reverse(`fixedchar`): -> `fixedchar` + +*Returns the string in reverse order.* +### replace_slice + + +Implementations: +replace_slice(`input`, `start`, `length`, `replacement`): -> `return_type` +
  • input: Input string.
  • +
  • start: The position in the string to start deleting/inserting characters.
  • +
  • length: The number of characters to delete from the input string.
  • +
  • replacement: The new string to insert at the start position.
  • +0. replace_slice(`string`, `i64`, `i64`, `string`): -> `string` +1. replace_slice(`varchar`, `i64`, `i64`, `varchar`): -> `varchar` + +*Replace a slice of the input string. A specified 'length' of characters will be deleted from the input string beginning at the 'start' position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If start is negative or zero, or greater than the length of the input string, a null string is returned. If 'length' is negative, a null string is returned. If 'length' is zero, inserting of the new string occurs at the specified 'start' position and no characters are deleted. If 'length' is greater than the input string, deletion will occur up to the last character of the input string.* +### lower + + +Implementations: +lower(`input`, `option:char_set`): -> `return_type` +0. lower(`string`, `option:char_set`): -> `string` +1. lower(`varchar`, `option:char_set`): -> `varchar` +2. lower(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### upper + + +Implementations: +upper(`input`, `option:char_set`): -> `return_type` +0. upper(`string`, `option:char_set`): -> `string` +1. upper(`varchar`, `option:char_set`): -> `varchar` +2. upper(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### swapcase + + +Implementations: +swapcase(`input`, `option:char_set`): -> `return_type` +0. swapcase(`string`, `option:char_set`): -> `string` +1. swapcase(`varchar`, `option:char_set`): -> `varchar` +2. swapcase(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Transform the string's lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### capitalize + + +Implementations: +capitalize(`input`, `option:char_set`): -> `return_type` +0. capitalize(`string`, `option:char_set`): -> `string` +1. capitalize(`varchar`, `option:char_set`): -> `varchar` +2. capitalize(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### title + + +Implementations: +title(`input`, `option:char_set`): -> `return_type` +0. title(`string`, `option:char_set`): -> `string` +1. title(`varchar`, `option:char_set`): -> `varchar` +2. title(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Converts the input string into titlecase. Capitalize the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### initcap + + +Implementations: +initcap(`input`, `option:char_set`): -> `return_type` +0. initcap(`string`, `option:char_set`): -> `string` +1. initcap(`varchar`, `option:char_set`): -> `varchar` +2. initcap(`fixedchar`, `option:char_set`): -> `fixedchar` + +*Capitalizes the first character of each word in the input string, including articles, and lowercases the rest. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.* + +
    Options: + +
  • char_set ['UTF8', 'ASCII_ONLY']
  • + +
    + +### char_length + + +Implementations: +char_length(`input`): -> `return_type` +0. char_length(`string`): -> `i64` +1. char_length(`varchar`): -> `i64` +2. char_length(`fixedchar`): -> `i64` + +*Return the number of characters in the input string. The length includes trailing spaces.* +### bit_length + + +Implementations: +bit_length(`input`): -> `return_type` +0. bit_length(`string`): -> `i64` +1. bit_length(`varchar`): -> `i64` +2. bit_length(`fixedchar`): -> `i64` + +*Return the number of bits in the input string.* +### octet_length + + +Implementations: +octet_length(`input`): -> `return_type` +0. octet_length(`string`): -> `i64` +1. octet_length(`varchar`): -> `i64` +2. octet_length(`fixedchar`): -> `i64` + +*Return the number of bytes in the input string.* +### regexp_replace + + +Implementations: +regexp_replace(`input`, `pattern`, `replacement`, `position`, `occurrence`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +
  • input: The input string.
  • +
  • pattern: The regular expression to search for within the input string.
  • +
  • replacement: The replacement string.
  • +
  • position: The position to start the search.
  • +
  • occurrence: Which occurrence of the match to replace.
  • +0. regexp_replace(`string`, `string`, `string`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `string` +1. regexp_replace(`varchar`, `varchar`, `varchar`, `i64`, `i64`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `varchar` + +*Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the `occurrence` argument. Specifying `1` means only the first occurrence will be replaced, `2` means the second occurrence, and so on. Specifying `0` means all occurrences will be replaced. The character position at which the search for pattern matches begins can be specified using the `position` argument. Specifying `1` starts the search at the first character of the input string, `2` at the second character, and so on. The `position` argument should be a positive non-zero integer. The replacement string can refer to capture groups using numbered backreferences. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
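The interaction between `position`, `occurrence` (where `0` means every match), and numbered backreferences in `regexp_replace` might look like the following Python sketch. It is an approximation under stated assumptions: Python's `re` module replaces ICU, backreferences use Python's `\1` template syntax, the options are ignored, and the `ValueError` for invalid arguments is a choice made here because the description leaves that case undefined.

```python
import re


def regexp_replace(text, pattern, replacement, position=1, occurrence=0):
    """Replace the `occurrence`-th match found from `position`, or all if 0."""
    if position < 1 or occurrence < 0:
        # Undefined behavior per the description; this sketch rejects it.
        raise ValueError("position must be >= 1 and occurrence >= 0")
    regex = re.compile(pattern)
    pieces = []
    cursor = 0  # end of the last text copied into `pieces`
    # Matches are numbered from 1, counting only those at or after `position`.
    for index, match in enumerate(regex.finditer(text, position - 1), start=1):
        if occurrence in (0, index):
            pieces.append(text[cursor:match.start()])
            # Match.expand resolves backreferences such as \1 in the template.
            pieces.append(match.expand(replacement))
            cursor = match.end()
    pieces.append(text[cursor:])
    return "".join(pieces)


print(regexp_replace("a1b2c3", r"(\d)", r"<\1>"))
print(regexp_replace("a1b2c3", r"(\d)", r"<\1>", 1, 2))
```

Iterating over matches instead of slicing the input at `position` keeps anchors such as `^` tied to the real start of the string.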
    + +### regexp_replace + + +Implementations: +regexp_replace(`input`, `pattern`, `replacement`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +
  • input: The input string.
  • +
  • pattern: The regular expression to search for within the input string.
  • +
  • replacement: The replacement string.
  • +0. regexp_replace(`string`, `string`, `string`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `string` + +*Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The replacement string can refer to capture groups using numbered backreferences. All occurrences of the pattern will be replaced. The search for matches starts at the first character of the input. +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string. +Behavior is undefined if the regex fails to compile or the replacement contains an illegal back-reference.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
    + +### ltrim + + +Implementations: +ltrim(`input`, `characters`): -> `return_type` +
  • input: The string to remove characters from.
  • +
  • characters: The set of characters to remove.
  • +0. ltrim(`varchar`, `varchar`): -> `varchar` +1. ltrim(`string`, `string`): -> `string` + +*Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.* +### rtrim + + +Implementations: +rtrim(`input`, `characters`): -> `return_type` +
  • input: The string to remove characters from.
  • +
  • characters: The set of characters to remove.
  • +0. rtrim(`varchar`, `varchar`): -> `varchar` +1. rtrim(`string`, `string`): -> `string` + +*Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.* +### trim + + +Implementations: +trim(`input`, `characters`): -> `return_type` +
  • input: The string to remove characters from.
  • +
  • characters: The set of characters to remove.
  • +0. trim(`varchar`, `varchar`): -> `varchar` +1. trim(`string`, `string`): -> `string` + +*Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.* +### lpad + + +Implementations: +lpad(`input`, `length`, `characters`): -> `return_type` +
  • input: The string to pad.
  • +
  • length: The length of the output string.
  • +
  • characters: The string of characters to use for padding.
  • +0. lpad(`varchar`, `i32`, `varchar`): -> `varchar` +1. lpad(`string`, `i32`, `string`): -> `string` + +*Left-pad the input string with the string of 'characters' until the specified length of the string has been reached. If the input string is longer than 'length', remove characters from the right-side to shorten it to 'length' characters. If the string of 'characters' is longer than the remaining 'length' needed to be filled, only pad until 'length' has been reached. If 'characters' is not specified, the default value is a single space.* +### rpad + + +Implementations: +rpad(`input`, `length`, `characters`): -> `return_type` +
  • input: The string to pad.
  • +
  • length: The length of the output string.
  • +
  • characters: The string of characters to use for padding.
  • +0. rpad(`varchar`, `i32`, `varchar`): -> `varchar` +1. rpad(`string`, `i32`, `string`): -> `string` + +*Right-pad the input string with the string of 'characters' until the specified length of the string has been reached. If the input string is longer than 'length', remove characters from the left-side to shorten it to 'length' characters. If the string of 'characters' is longer than the remaining 'length' needed to be filled, only pad until 'length' has been reached. If 'characters' is not specified, the default value is a single space.* +### center + + +Implementations: +center(`input`, `length`, `character`, `option:padding`): -> `return_type` +
  • input: The string to pad.
  • +
  • length: The length of the output string.
  • +
  • character: The character to use for padding.
  • +0. center(`varchar`, `i32`, `varchar`, `option:padding`): -> `varchar` +1. center(`string`, `i32`, `string`, `option:padding`): -> `string` + +*Center the input string by padding the sides with a single `character` until the specified `length` of the string has been reached. By default, if the `length` will be reached with an uneven number of padding, the extra padding will be applied to the right side. The side with extra padding can be controlled with the `padding` option. +Behavior is undefined if the number of characters passed to the `character` argument is not 1.* + +
    Options: + +
  • padding ['RIGHT', 'LEFT']
  • + +
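The padding rule for `center`, including where the extra character goes when the padding is uneven, can be sketched in Python as follows. Returning the input unchanged when it is already at least `length` characters long is an assumption of this sketch, since the description does not cover that case, and the `ValueError` for a multi-character `character` is likewise a choice made here for behavior the description calls undefined.

```python
def center(text, length, character, padding="RIGHT"):
    """Pad `text` on both sides with `character` up to `length` characters."""
    if len(character) != 1:
        # Undefined behavior per the description; this sketch rejects it.
        raise ValueError("character must be exactly one character")
    total = length - len(text)
    if total <= 0:
        # Assumption: inputs that are already long enough pass through.
        return text
    half, extra = divmod(total, 2)
    # The odd leftover character goes to the side named by `padding`.
    left = half + (extra if padding == "LEFT" else 0)
    right = total - left
    return character * left + text + character * right


print(center("ab", 5, "*"))
print(center("ab", 5, "*", "LEFT"))
```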
    + +### left + + +Implementations: +left(`input`, `count`): -> `return_type` +0. left(`varchar`, `i32`): -> `varchar` +1. left(`string`, `i32`): -> `string` + +*Extract `count` characters starting from the left of the string.* +### right + + +Implementations: +right(`input`, `count`): -> `return_type` +0. right(`varchar`, `i32`): -> `varchar` +1. right(`string`, `i32`): -> `string` + +*Extract `count` characters starting from the right of the string.* +### string_split + + +Implementations: +string_split(`input`, `separator`): -> `return_type` +
  • input: The input string.
  • +
  • separator: A character used for splitting the string.
  • +0. string_split(`varchar`, `varchar`): -> `List<varchar>` +1. string_split(`string`, `string`): -> `List<string>` + +*Split a string into a list of strings, based on a specified `separator` character.* +### regexp_string_split + + +Implementations: +regexp_string_split(`input`, `pattern`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `return_type` +
  • input: The input string.
  • +
  • pattern: The regular expression to search for within the input string.
  • +0. regexp_string_split(`varchar`, `varchar`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `List<varchar>` +1. regexp_string_split(`string`, `string`, `option:case_sensitivity`, `option:multiline`, `option:dotall`): -> `List<string>` + +*Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). +The `case_sensitivity` option specifies case-sensitive or case-insensitive matching. Enabling the `multiline` option will treat the input string as multiple lines. This makes the `^` and `$` characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the `dotall` option makes the `.` character match line terminator characters in a string.* + +
    Options: + +
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • +
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
  • +
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • + +
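To make the effect of the three options on `regexp_string_split` concrete, the sketch below maps them onto Python `re` flags. Everything about the mapping is an assumption for illustration: Python's `re` is not the ICU engine, `re.ASCII` only approximates `CASE_INSENSITIVE_ASCII` (it also narrows classes such as `\w`), and the default option values are chosen here, not taken from the description.

```python
import re


def regexp_string_split(text, pattern,
                        case_sensitivity="CASE_SENSITIVE",
                        multiline="MULTILINE_DISABLED",
                        dotall="DOTALL_DISABLED"):
    """Split `text` on every match of `pattern`, dropping the separators."""
    flags = 0
    if case_sensitivity == "CASE_INSENSITIVE":
        flags |= re.IGNORECASE
    elif case_sensitivity == "CASE_INSENSITIVE_ASCII":
        flags |= re.IGNORECASE | re.ASCII  # approximation, see lead-in
    if multiline == "MULTILINE_ENABLED":
        flags |= re.MULTILINE  # ^ and $ match at every line boundary
    if dotall == "DOTALL_ENABLED":
        flags |= re.DOTALL  # . also matches line terminators
    regex = re.compile(pattern, flags)
    parts = []
    cursor = 0
    # finditer is used instead of re.split so that capture groups in the
    # pattern never leak into the result list.
    for match in regex.finditer(text):
        parts.append(text[cursor:match.start()])
        cursor = match.end()
    parts.append(text[cursor:])
    return parts


print(regexp_string_split("a1b22c", r"\d+"))
print(regexp_string_split("aXbxc", "x", case_sensitivity="CASE_INSENSITIVE"))
```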
    + +## Aggregate Functions + +### string_agg + + +Implementations: +string_agg(`input`, `separator`): -> `return_type` +
  • input: Column of string values.
  • +
  • separator: Separator for concatenated strings.
  • +0. string_agg(`string`, `string`): -> `string` + +*Concatenates a column of string values with a separator.* \ No newline at end of file