Commit 2da6624

kxu1026, wmoustafa, Walaa Eldin Moustafa authored
Merge Update Trino patch link + Add explanation for required Trino SPI changes (#105)
* Update Trino patch link
* Add explanation for required Trino SPI changes

Co-authored-by: Walaa Eldin Moustafa <[email protected]>
Co-authored-by: Walaa Eldin Moustafa <[email protected]>
1 parent f6b0818 commit 2da6624

2 files changed, +43 -1 lines changed

docs/required-trino-apis.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Why is modifying the Trino SPI interface necessary for Transport to work?
Transport requires applying this [patch](transport-udf-trino.patch) to Trino before Transport UDFs can be used with Trino.
The patch makes some of Trino's internal UDF classes visible at the SPI layer.
Below we explain why some Transport APIs cannot be built on the APIs offered by the [public SPI UDF model](https://trino.io/docs/current/develop/functions.html).

## [init() method](https://github.com/linkedin/transport/blob/09a89508296a2491f43cc8866d47952c911313ab/transportable-udfs-api/src/main/java/com/linkedin/transport/api/udf/StdUDF.java#L45) is hard to implement on top of the Trino SPI
The `init()` method allows users to perform any initialization their Transport UDFs need.
Conceptually, it is called once, at UDF initialization time, before any records are processed. It sets the [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36) to be used by the
`StdUDF`; the factory can then be used to create Java types that correspond to the type signatures provided by the user.
Because the SPI UDF model has no similar API, the current approach calls `init()` inside the
overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method of [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72),
which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18).
Since `specialize()` runs once per function specialization, before any rows are evaluated, this lets us implement the semantics of `init()`.
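To make the lifecycle concrete, here is a minimal, self-contained sketch of the idea that `init()` runs once per specialization, before any rows are processed, and that the factory it receives can only be built once the engine has supplied type information. All names here (`Factory`, `UpperUdf`, `specialize`, the `"T"` type variable) are hypothetical stand-ins, not the real Transport or Trino classes; the `boundType` string plays the role of the information Trino hands to `specialize()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class InitLifecycleSketch {

    // Hypothetical stand-in for StdFactory: maps a declared type signature to a concrete type.
    interface Factory {
        String createType(String signature);
    }

    // Hypothetical stand-in for a StdUDF: init() does one-time setup, eval() runs per record.
    static class UpperUdf {
        String outputType; // resolved once in init()
        int initCalls = 0;

        void init(Factory factory) {
            outputType = factory.createType("T");
            initCalls++;
        }

        String eval(String value) {
            // eval() depends on state prepared by init(), so fail fast if init() was skipped.
            if (outputType == null) {
                throw new IllegalStateException("init() was not called");
            }
            return value.toUpperCase();
        }
    }

    // Hypothetical stand-in for StdUdfWrapper.specialize(): the bound type is only known here,
    // so this is the first point where a factory can be built and init() can be called.
    static Function<String, String> specialize(UpperUdf udf, String boundType) {
        Factory factory = signature -> signature.equals("T") ? boundType : signature;
        udf.init(factory);
        return udf::eval;
    }

    public static void main(String[] args) {
        UpperUdf udf = new UpperUdf();
        Function<String, String> impl = specialize(udf, "varchar");
        List<String> out = new ArrayList<>();
        for (String row : List.of("a", "b", "c")) {
            out.add(impl.apply(row)); // init() is not called again per row
        }
        System.out.println(out + " type=" + udf.outputType + " initCalls=" + udf.initCalls);
    }
}
```

Running `main` prints `[A, B, C] type=varchar initCalls=1`: the factory is created and `init()` is called exactly once, and every subsequent row goes straight to `eval()`.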
## [TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52) requires `FunctionBinding` and `FunctionDependencies`, which are not provided by the Trino SPI
[TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52)
is designed to convert Transport data types and their required operators (e.g., the equals function of map keys)
to Trino native data types and operators. This is needed to implement
[createStdType()](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L139)
in [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36), which is a standard
API across all engines.
The TrinoFactory implementation of StdFactory requires the Trino classes [FunctionBinding](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/FunctionBinding.java#L26)
and [FunctionDependencies](https://github.com/trinodb/trino/blob/0b1a1b9fa036bac132c80c990166096abc1b2552/core/trino-main/src/main/java/io/trino/metadata/FunctionDependencies.java#L47)
for its basic functionality; however, those classes are not provided by the Trino SPI UDF model.
In the current integration approach, TrinoFactory is initialized inside the overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method
of [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72),
which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18)
and receives both objects as arguments, so TrinoFactory gets access to them from there.

The snippet below shows how the Transport Trino implementation uses the `SqlScalarFunction#specialize()` method
to call `StdUDF#init()` and to pass the `FunctionBinding` and `FunctionDependencies` objects to the TrinoFactory.
```java
@Override
public ScalarFunctionImplementation specialize(FunctionBinding functionBinding, FunctionDependencies functionDependencies) {
  // Both objects are only available here, so the factory is built at specialization time.
  StdFactory stdFactory = new TrinoFactory(functionBinding, functionDependencies);
  StdUDF stdUDF = getStdUDF();
  // Run the user's one-time initialization with the engine-specific factory.
  stdUDF.init(stdFactory);
  // ... (remainder of the method omitted)
}
```
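The following is a minimal sketch of why the factory cannot do its job without both objects: one supplies the concrete types bound to a generic signature such as `map(K,V)`, and the other supplies operators (here, key equality) for those concrete types. `Binding`, `Dependencies`, `Factory`, `createMapType` and `keyEquals` are hypothetical simplifications invented for illustration; they only loosely mirror the roles of Trino's `FunctionBinding` and `FunctionDependencies` and do not reproduce their real APIs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.BiPredicate;

public class BindingResolutionSketch {

    // Hypothetical stand-in for FunctionBinding: maps type variables to concrete types.
    static final class Binding {
        private final Map<String, String> typeVariables;

        Binding(Map<String, String> typeVariables) {
            this.typeVariables = typeVariables;
        }

        String resolve(String variable) {
            String concrete = typeVariables.get(variable);
            if (concrete == null) {
                throw new IllegalArgumentException("unbound type variable: " + variable);
            }
            return concrete;
        }
    }

    // Hypothetical stand-in for FunctionDependencies: operators looked up per concrete type.
    static final class Dependencies {
        private final Map<String, BiPredicate<Object, Object>> equalsByType;

        Dependencies(Map<String, BiPredicate<Object, Object>> equalsByType) {
            this.equalsByType = equalsByType;
        }

        BiPredicate<Object, Object> equalsOperator(String concreteType) {
            return Objects.requireNonNull(equalsByType.get(concreteType), "no equals operator for " + concreteType);
        }
    }

    // Hypothetical stand-in for TrinoFactory: needs both objects to resolve types and operators.
    static final class Factory {
        private final Binding binding;
        private final Dependencies dependencies;

        Factory(Binding binding, Dependencies dependencies) {
            this.binding = binding;
            this.dependencies = dependencies;
        }

        // Turn the generic signature map(K,V) into a concrete type.
        String createMapType(String keyVariable, String valueVariable) {
            return "map(" + binding.resolve(keyVariable) + "," + binding.resolve(valueVariable) + ")";
        }

        // Fetch the key-equality operator that a map implementation would need.
        BiPredicate<Object, Object> keyEquals(String keyVariable) {
            return dependencies.equalsOperator(binding.resolve(keyVariable));
        }
    }

    public static void main(String[] args) {
        // In Trino the equivalent objects only exist at specialization time; here they are built by hand.
        Binding binding = new Binding(Map.of("K", "varchar", "V", "bigint"));
        BiPredicate<Object, Object> varcharEquals = Object::equals;
        Dependencies dependencies = new Dependencies(Map.of("varchar", varcharEquals));
        Factory factory = new Factory(binding, dependencies);

        System.out.println(factory.createMapType("K", "V"));
        System.out.println(factory.keyEquals("K").test("a", "a"));
    }
}
```

Running `main` prints `map(varchar,bigint)` followed by `true`. Without a `Binding`, the factory could not tell what `K` and `V` stand for in a given query, and without `Dependencies` it could not obtain the operators those types require.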

docs/using-transport-udfs.md

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ If the UDF class is `com.linkedin.transport.example.ExampleUDF` then the platfor
 Unlike Hive and Spark, Trino currently does not allow dynamically loading jar files once the Trino server has started.
 In Trino, the jar is deployed to the `plugin` directory.
 However, a small patch is required for the Trino engine to recognize the jar as a plugin, since the generated Trino UDFs implement the `SqlScalarFunction` API, which is currently not part of Trino's SPI architecture.
-You can find the patch [here](transport-udfs-trino.patch) and apply it before deploying your UDFs jar to the Trino engine.
+You can find the patch [here](transport-udfs-trino.patch) and apply it before deploying your UDFs jar to the Trino engine ([Why is this patch needed?](required-trino-apis.md)).

 2. Call the UDF in a query
 To call the UDF, you will need to use the function name defined in the Transport UDF definition.
