|
| 1 | +# Why is modifying the Trino SPI interface necessary for Transport to work? |
| 2 | +This [patch](transport-udf-trino.patch) must be applied to Trino before Transport can be used with it. |
| 3 | +The patch makes some internal UDF classes visible at the SPI layer. |
| 4 | +Below we explain why some Transport APIs cannot leverage the APIs offered by the [public SPI UDF model](https://trino.io/docs/current/develop/functions.html). |
| 5 | + |
| 6 | +## [init() method](https://github.com/linkedin/transport/blob/09a89508296a2491f43cc8866d47952c911313ab/transportable-udfs-api/src/main/java/com/linkedin/transport/api/udf/StdUDF.java#L45) is hard to implement on top of Trino-SPI |
| 7 | +The `init()` method allows users to perform necessary initializations for their Transport UDFs. |
| 8 | +Conceptually, it is called once at UDF initialization time, before any records are processed. It sets the [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36) to be used by the |
| 9 | +`StdUDF`, and can be used to create Java types that correspond to the type signatures provided by the user. |
| 10 | +Because the SPI UDF model has no similar API, the current approach calls `init()` inside the |
| 11 | +overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method in [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72) |
| 12 | +which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18). |
| 13 | +That way, we can implement the |
| 14 | +semantics of `init()`; the snippet at the end of this document shows the call. |
| 15 | + |
| 16 | +## [TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52) requires `FunctionBinding` and `FunctionDependencies` which are not provided by the Trino-SPI |
| 17 | +[TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52) |
| 18 | +is designed to convert Transport data types and their required operators (e.g., the equals function of map keys) |
| 19 | +to Trino native data types and operators. This is needed to implement |
| 20 | + [createStdType()](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L139) |
| 21 | +in [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36), which is a standard |
| 22 | +API across all engines. |
| 23 | +The TrinoFactory implementation of StdFactory requires the Trino classes [FunctionBinding](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/FunctionBinding.java#L26) |
| 24 | +and [FunctionDependencies](https://github.com/trinodb/trino/blob/0b1a1b9fa036bac132c80c990166096abc1b2552/core/trino-main/src/main/java/io/trino/metadata/FunctionDependencies.java#L47) |
| 25 | +to implement its basic functionality; however, those classes are not provided by the Trino SPI UDF model. |
| 26 | +In the current integration approach, TrinoFactory is initialized inside the overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method |
| 27 | +in [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72) |
| 28 | +which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18), |
| 29 | +and gets access to those two classes from there. |
| 30 | + |
| 31 | +The snippet below shows how the Transport Trino implementation uses the `SqlScalarFunction#specialize()` method |
| 32 | +to call `StdUDF#init()` and pass the `FunctionDependencies` and `FunctionBinding` objects to the TrinoFactory. |
| 33 | +```java |
| 34 | +@Override |
| 35 | +public ScalarFunctionImplementation specialize(FunctionBinding functionBinding, FunctionDependencies functionDependencies) { |
| 36 | +  StdFactory stdFactory = new TrinoFactory(functionBinding, functionDependencies); |
| 37 | +  StdUDF stdUDF = getStdUDF(); |
| 38 | +  stdUDF.init(stdFactory); |
| 39 | +  ... |
| 40 | +} |
| 41 | +``` |
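
The pattern in the snippet above can also be tried without a Trino build. The following is a minimal, self-contained sketch of that pattern, not Transport's or Trino's actual code: every `Sketch*` type is a hypothetical stand-in invented for illustration (`SketchBinding` and `SketchDependencies` loosely play the roles of `FunctionBinding` and `FunctionDependencies`, and `SketchFactory` the role of `StdFactory`). It only illustrates why the factory has to be built inside `specialize()` and that `init()` is called there once, before any record is evaluated.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for engine-internal objects that are only available at
// specialization time. They are NOT Trino's FunctionBinding/FunctionDependencies.
final class SketchBinding {
    final String boundType;
    SketchBinding(String boundType) { this.boundType = boundType; }
}

final class SketchDependencies {
    final List<String> operators;
    SketchDependencies(List<String> operators) { this.operators = operators; }
}

// Hypothetical engine-neutral factory, playing the role StdFactory plays in Transport.
interface SketchFactory {
    String resolveType(String typeVariable);
    boolean hasOperator(String name);
}

// Engine-specific factory: it cannot be constructed without the two internal objects.
final class SketchEngineFactory implements SketchFactory {
    private final SketchBinding binding;
    private final SketchDependencies dependencies;

    SketchEngineFactory(SketchBinding binding, SketchDependencies dependencies) {
        this.binding = binding;
        this.dependencies = dependencies;
    }

    @Override
    public String resolveType(String typeVariable) {
        return typeVariable + "=" + binding.boundType;
    }

    @Override
    public boolean hasOperator(String name) {
        return dependencies.operators.contains(name);
    }
}

// Hypothetical UDF with an init() hook, mirroring the shape of StdUDF#init().
final class SketchUdf {
    int initCalls = 0;
    private String resolvedType;

    void init(SketchFactory factory) {
        initCalls++;
        resolvedType = factory.resolveType("K");
    }

    String eval(String input) {
        return input + ":" + resolvedType;
    }
}

// Wrapper whose specialize() is the single place where init() is invoked.
final class SketchWrapper {
    private final SketchUdf udf;

    SketchWrapper(SketchUdf udf) { this.udf = udf; }

    Function<String, String> specialize(SketchBinding binding, SketchDependencies dependencies) {
        SketchFactory factory = new SketchEngineFactory(binding, dependencies);
        udf.init(factory);   // called once, before any record is processed
        return udf::eval;    // per-record entry point
    }
}

final class SketchDemo {
    public static void main(String[] args) {
        SketchUdf udf = new SketchUdf();
        Function<String, String> impl = new SketchWrapper(udf)
            .specialize(new SketchBinding("varchar"), new SketchDependencies(List.of("equal")));
        for (String record : List.of("a", "b", "c")) {
            System.out.println(impl.apply(record));
        }
        System.out.println("init calls: " + udf.initCalls);
    }
}
```

Running `SketchDemo` prints one line per record (`a:K=varchar`, `b:K=varchar`, `c:K=varchar`) followed by `init calls: 1`. The point of the sketch is the dependency direction: the factory needs the binding and dependency objects, so it can only be created where the engine hands those objects over.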
| 42 | + |