27 commits
da82934
Upgrade to Gradle 6.7 (#67)
akshayrai Apr 15, 2021
dd6e2a7
Support builds with platform specific JDK (#69)
akshayrai Apr 26, 2021
1773505
Bump Avro dependency to 1.10.2 (from 1.7.7). (#71)
srramach Apr 29, 2021
92dfbbf
Migrate from PrestoSQL to Trino (#68)
akshayrai May 6, 2021
6974248
Automate artifact publication to Maven Central (#72)
rzhang10 May 12, 2021
7ed5d9c
Update ci.yml java version to 8 (#77)
rzhang10 May 13, 2021
2163ec2
Fix org.pentaho:pentaho-aggdesigner-algorithm sunset problem (#78)
kxu1026 Jun 17, 2021
7eff396
Remove travis build in favor of github actions (#87)
HotSushi Aug 5, 2021
8be2838
Add scala_2.11 and scala_2.12 support (#85)
HotSushi Aug 6, 2021
0211810
WIP: Rebase on master branch
wmoustafa Nov 30, 2020
cbda011
Eliminate further StdXXX primitive references and fix some tests
wmoustafa Jan 4, 2020
49db345
Remove further StdData references
wmoustafa Jan 4, 2020
934074d
Address review comments
wmoustafa May 10, 2020
a53d7bf
WIP: Rebase on mater - continue
wmoustafa Nov 30, 2020
48c6c29
Introduce type-parameterized array and map APIs; replace wrapper prim…
wmoustafa Dec 17, 2019
cf95f35
Address review comments
wmoustafa May 10, 2020
2bafcce
Fix build errors
wmoustafa Feb 8, 2021
73feee0
Missed conflicts and fix test issues
maluchari Aug 12, 2021
a8dbf65
Update ci.yml to also build the udf-examples folder (#90)
maluchari Aug 20, 2021
658a893
Fix running multiple builds in run step in workflow action (#92)
maluchari Aug 20, 2021
a906f22
Address review comments
maluchari Aug 23, 2021
591b991
Address review comments
maluchari Aug 23, 2021
d919f96
A solution to fix running multiple UDFs in Spark issue (#93)
kxu1026 Aug 25, 2021
a5633a3
Merge branch 'linkedin:master' into malini_rebase_api
maluchari Oct 6, 2021
c58a2d8
Add explanation for required Trino SPI changes
Nov 2, 2021
880b116
Update Trino patch link
wmoustafa Nov 2, 2021
639330d
Merge branch 'linkedin:master' into malini_rebase_api
maluchari Nov 11, 2021
54 changes: 54 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,54 @@
#
# CI build that assembles artifacts and runs tests.
# If validation is successful this workflow releases from the main dev branch.
#
# - skipping CI: add [skip ci] to the commit message
# - skipping release: add [skip release] to the commit message
#
name: CI

on:
  push:
    branches: ['master']
    tags-ignore: [v*] # release tags are autogenerated after a successful CI, no need to run CI against them
  pull_request:
    branches: ['**']

jobs:

  build:
    runs-on: ubuntu-latest
    if: "! contains(toJSON(github.event.commits.*.message), '[skip ci]')"

    steps:

      - name: 1. Check out code
        uses: actions/checkout@v2 # https://github.com/actions/checkout
        with:
          fetch-depth: '0' # https://github.com/shipkit/shipkit-changelog#fetch-depth-on-ci

      - name: 2. Setup Java JDK
        uses: actions/setup-java@v2
        with:
          distribution: 'adopt'
          java-version: '8'

      - name: 3. Perform build
        run: |
          ./gradlew build
          ./gradlew -p transportable-udfs-examples clean build -s

      - name: 4. Perform release
        # Release job, only for pushes to the main development branch
        if: github.event_name == 'push'
          && github.ref == 'refs/heads/master'
          && github.repository == 'linkedin/transport'
          && !contains(toJSON(github.event.commits.*.message), '[skip release]')

        run: ./gradlew githubRelease publishToSonatype closeAndReleaseStagingRepository
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
          SONATYPE_USER: ${{secrets.SONATYPE_USER}}
          SONATYPE_PWD: ${{secrets.SONATYPE_PWD}}
          PGP_KEY: ${{secrets.PGP_KEY}}
          PGP_PWD: ${{secrets.PGP_PWD}}
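The `[skip ci]` guard in the workflow above is just a substring check over the push's commit messages. A minimal Java sketch of the same predicate (the class and method names are ours, not part of the workflow, and the real `toJSON(...)` expression matches against the JSON-serialized commit list rather than the raw messages):

```java
import java.util.List;

public class SkipCiCheck {

    // Mirrors the workflow expression:
    //   ! contains(toJSON(github.event.commits.*.message), '[skip ci]')
    // CI runs only when no commit message in the push contains the marker.
    public static boolean shouldRunCi(List<String> commitMessages) {
        return commitMessages.stream().noneMatch(m -> m.contains("[skip ci]"));
    }

    public static void main(String[] args) {
        System.out.println(shouldRunCi(List.of("Upgrade to Gradle 6.7", "Fix tests"))); // true
        System.out.println(shouldRunCi(List.of("Fix tests [skip ci]")));                // false
    }
}
```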
26 changes: 0 additions & 26 deletions .travis.yml

This file was deleted.

18 changes: 9 additions & 9 deletions README.md
@@ -4,13 +4,13 @@
**Transport** is a framework for writing performant user-defined
functions (UDFs) that are portable across a variety of engines
including [Apache Spark](https://spark.apache.org/), [Apache Hive](https://hive.apache.org/), and
-[Presto](https://prestodb.io/). Transport UDFs are also
+[Trino](https://trino.io/). Transport UDFs are also
capable of directly processing data stored in serialization formats such as
Apache Avro. With Transport, developers only need to implement their UDF
logic once using the Transport API. Transport then takes care of
translating the UDF to native UDF version targeted at various engines
or formats. Currently, Transport is capable of generating
-engine-artifacts for Spark, Hive, and Presto, and format-artifacts for
+engine-artifacts for Spark, Hive, and Trino, and format-artifacts for
Avro. Further details on Transport can be found in this [LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2018/11/using-translatable-portable-UDFs).

## Documentation
@@ -127,7 +127,7 @@ to familiarize yourself with the API, and how to write new UDFs.
to find out how to write UDF tests in a unified testing API, but have the framework test them on multiple platforms.

* Root [`build.gradle`](transportable-udfs-examples/build.gradle) file
-to find out how to apply the `transport` plugin, which enables generating Hive, Spark, and Presto UDFs out of
+to find out how to apply the `transport` plugin, which enables generating Hive, Spark, and Trino UDFs out of
the transportable UDFs you define once you build your project. To see that in action:

Change directory to `transportable-udfs-examples`:
@@ -153,7 +153,7 @@ The results should be like:

```
transportable-udfs-example-udfs-hive.jar
-transportable-udfs-example-udfs-presto.jar
+transportable-udfs-example-udfs-trino.jar
transportable-udfs-example-udfs-spark.jar
transportable-udfs-example-udfs.jar
```
@@ -162,13 +162,13 @@ That is it! While only one version of the UDFs is implemented, multiple jars are
Each of those jars uses native platform APIs and data models to implement the UDFs. So from an execution engine's perspective,
there is no data transformation needed for interoperability or portability. Only suitable classes are used for each engine.

-To call those jars from your SQL engine (i.e., Hive, Spark, or Presto), the standard process for deploying UDF jars is followed
+To call those jars from your SQL engine (i.e., Hive, Spark, or Trino), the standard process for deploying UDF jars is followed
for each engine. For example, in Hive, you add the jar to the classpath using the `ADD JAR` statement,
and register the UDF using `CREATE FUNCTION` statement.
-In Presto, the jar is deployed to the `plugin` directory. However, a small patch is required for the Presto
-engine to recognize the jar as a plugin, since the generated Presto UDFs implement the `SqlScalarFunction` API,
-which is currently not part of Presto's SPI architecture. You can find the patch [here](transportable-udfs-documentation/transport-udfs-presto.patch) and apply it
-before deploying your UDFs jar to the Presto engine.
+In Trino, the jar is deployed to the `plugin` directory. However, a small patch is required for the Trino
+engine to recognize the jar as a plugin, since the generated Trino UDFs implement the `SqlScalarFunction` API,
+which is currently not part of Trino's SPI architecture. You can find the patch [here](docs/transport-udfs-trino.patch) and apply it
+before deploying your UDFs jar to the Trino engine.

## Contributing
The project is under active development and we welcome contributions of different forms:
13 changes: 10 additions & 3 deletions build.gradle
@@ -14,14 +14,18 @@ buildscript {
classpath 'com.github.jengelman.gradle.plugins:shadow:2.0.4'
classpath 'org.github.ngbinh.scalastyle:gradle-scalastyle-plugin_2.11:1.0.1'
classpath 'gradle.plugin.nl.javadude.gradle.plugins:license-gradle-plugin:0.14.0'
+classpath "io.github.gradle-nexus:publish-plugin:1.0.0"
+classpath "org.shipkit:shipkit-auto-version:1.1.1"
+classpath "org.shipkit:shipkit-changelog:1.1.10"
}
}

plugins {
-id "org.shipkit.java" version "2.3.4"
id "checkstyle"
}

+apply from: "gradle/shipkit.gradle"

allprojects {
group = 'com.linkedin.transport'
apply plugin: 'idea'
@@ -74,8 +78,11 @@ subprojects {
}

checkstyle {
-  configFile = file("${rootDir}/gradle/checkstyle/checkstyle.xml")
-  configProperties = ['config_loc' : "${rootDir}/gradle/checkstyle/"]
+  configFile = rootProject.file('gradle/checkstyle/checkstyle.xml')
+  configProperties = [
+      'configDir': rootProject.file('gradle/checkstyle'),
+      'baseDir': rootDir
+  ]
  toolVersion '8.23'
}
}
7 changes: 4 additions & 3 deletions defaultEnvironment.gradle
@@ -10,8 +10,9 @@ subprojects {
url "https://conjars.org/repo"
}
}
-project.ext.setProperty('presto-version', '333')
-project.ext.setProperty('airlift-slice-version', '0.38')
+project.ext.setProperty('trino-version', '352')
+project.ext.setProperty('airlift-slice-version', '0.39')
project.ext.setProperty('spark-group', 'org.apache.spark')
-project.ext.setProperty('spark-version', '2.3.0')
+project.ext.setProperty('spark2-version', '2.3.0')
+project.ext.setProperty('spark3-version', '3.1.1')
}
42 changes: 42 additions & 0 deletions docs/required-trino-apis.md
@@ -0,0 +1,42 @@
# Why is modifying the Trino SPI interface necessary for Transport to work?
Using Transport with Trino requires first applying this [patch](transport-udfs-trino.patch) to Trino.
The patch makes some of Trino's internal UDF classes visible at the SPI layer.
Below we explain why some Transport APIs cannot leverage the APIs offered by the [public SPI UDF model](https://trino.io/docs/current/develop/functions.html).

## [init() method](https://github.com/linkedin/transport/blob/09a89508296a2491f43cc8866d47952c911313ab/transportable-udfs-api/src/main/java/com/linkedin/transport/api/udf/StdUDF.java#L45) is hard to implement on top of the Trino SPI
The `init()` method allows users to perform necessary initializations for their Transport UDFs.
Conceptually, it is called once at the UDF initialization time before processing any records. It sets the [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36) to be used by the
`StdUDF`, and can be used to create Java types that correspond to the type signatures provided by the user.
Due to the lack of a similar API in the SPI UDF model, the current approach calls `init()` inside the
overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method in [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72),
which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18).
That way, we can implement the semantics of `init()`.

## [TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52) requires `FunctionBinding` and `FunctionDependencies`, which are not provided by the Trino SPI
[TrinoFactory](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L52)
is designed to convert Transport data types and their required operators (e.g., the equals function of map keys)
to Trino-native data types and operators. This supports implementing the
[createStdType()](https://github.com/linkedin/transport/blob/92dfbbfd989367418bdd14f9ac4cc2bcf1e7c777/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/TrinoFactory.java#L139)
in [StdFactory](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-api/src/main/java/com/linkedin/transport/api/StdFactory.java#L36), which is a standard
API across all engines.
The TrinoFactory implementation of the StdFactory requires the Trino classes [FunctionBinding](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/FunctionBinding.java#L26)
and [FunctionDependencies](https://github.com/trinodb/trino/blob/0b1a1b9fa036bac132c80c990166096abc1b2552/core/trino-main/src/main/java/io/trino/metadata/FunctionDependencies.java#L47)
to implement its basic functionality; however, those classes are not provided by the Trino SPI UDF model.
In the current integration approach, TrinoFactory is initialized inside the overridden [specialize()](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L136) method
in [StdUdfWrapper](https://github.com/linkedin/transport/blob/d919f96dc1485ccb8b58e4faed3a5589a5966236/transportable-udfs-trino/src/main/java/com/linkedin/transport/trino/StdUdfWrapper.java#L72)
which extends [SqlScalarFunction](https://github.com/trinodb/trino/blob/54d8154037dfe5f6f65709dbafeb92f5506af2ac/core/trino-main/src/main/java/io/trino/metadata/SqlScalarFunction.java#L18), and gains access to those two classes from there.

The snippet below shows how the Transport Trino implementation uses the `SqlScalarFunction#specialize()` method
to call `StdUDF#init()` and pass the `FunctionDependencies` and `FunctionBinding` objects to the `TrinoFactory`.
```java
@Override
public ScalarFunctionImplementation specialize(FunctionBinding functionBinding, FunctionDependencies functionDependencies) {
  StdFactory stdFactory = new TrinoFactory(functionBinding, functionDependencies);
  StdUDF stdUDF = getStdUDF();
  stdUDF.init(stdFactory);
  ...
}
```

@@ -1,24 +1,24 @@
diff --git a/presto-main/src/main/java/io/prestosql/server/PluginManager.java b/presto-main/src/main/java/io/prestosql/server/PluginManager.java
index abcd001031..053c17aeed 100644
--- a/presto-main/src/main/java/io/prestosql/server/PluginManager.java
+++ b/presto-main/src/main/java/io/prestosql/server/PluginManager.java
@@ -23,6 +23,7 @@ import io.prestosql.connector.ConnectorManager;
import io.prestosql.eventlistener.EventListenerManager;
import io.prestosql.execution.resourcegroups.ResourceGroupManager;
import io.prestosql.metadata.MetadataManager;
+import io.prestosql.metadata.SqlScalarFunction;
import io.prestosql.security.AccessControlManager;
import io.prestosql.security.GroupProviderManager;
import io.prestosql.server.security.PasswordAuthenticatorManager;
@@ -54,6 +55,7 @@ import java.util.ServiceLoader;
diff --git a/core/trino-main/src/main/java/io/trino/server/PluginManager.java b/core/trino-main/src/main/java/io/trino/server/PluginManager.java
index 76cc04ca9d..483e609c86 100644
--- a/core/trino-main/src/main/java/io/trino/server/PluginManager.java
+++ b/core/trino-main/src/main/java/io/trino/server/PluginManager.java
@@ -23,6 +23,7 @@ import io.trino.connector.ConnectorManager;
import io.trino.eventlistener.EventListenerManager;
import io.trino.execution.resourcegroups.ResourceGroupManager;
import io.trino.metadata.MetadataManager;
+import io.trino.metadata.SqlScalarFunction;
import io.trino.security.AccessControlManager;
import io.trino.security.GroupProviderManager;
import io.trino.server.security.CertificateAuthenticatorManager;
@@ -55,6 +56,7 @@ import java.util.ServiceLoader;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;
+import java.util.stream.Collectors;

import static com.google.common.base.Preconditions.checkState;
import static io.prestosql.metadata.FunctionExtractor.extractFunctions;
@@ -64,8 +66,22 @@ import static java.util.Objects.requireNonNull;
import static io.trino.metadata.FunctionExtractor.extractFunctions;
@@ -65,8 +67,27 @@ import static java.util.Objects.requireNonNull;
@ThreadSafe
public class PluginManager
{
@@ -29,19 +29,24 @@ index abcd001031..053c17aeed 100644
+ // as it is the case with vanilla plugins.
+ // JIRA: https://jira01.corp.linkedin.com:8443/browse/LIHADOOP-34269
private static final ImmutableList<String> SPI_PACKAGES = ImmutableList.<String>builder()
+ // io.prestosql.metadata is required for SqlScalarFunction and FunctionRegistry classes
+ .add("io.prestosql.metadata.")
+ // io.prestosql.operator. is required for ScalarFunctionImplementation and TypeSignatureParser
+ .add("io.prestosql.operator.")
.add("io.prestosql.spi.")
+ // io.prestosql.type is required for TypeManager, and all supported types
+ .add("io.prestosql.type.")
+ // io.prestosql.util is required for Reflection
+ .add("io.prestosql.util.")
+ // io.trino.metadata is required for SqlScalarFunction, Metadata, MetadataManager, FunctionBinding,
+ // FunctionDependencies, TypeVariableConstraint, FunctionArgumentDefinition, FunctionKind, FunctionMetadata,
+ // Signature and SignatureBinder classes
+ .add("io.trino.metadata.")
+ // io.trino.operator. is required for AbstractTestFunctions, ScalarFunctionImplementation
+ // & ChoicesScalarFunctionImplementation
+ .add("io.trino.operator.")
+ // io.trino.sql.analyzer.TypeSignatureTranslator. is required for parseTypeSignature
+ .add("io.trino.sql.analyzer.TypeSignatureTranslator.")
.add("io.trino.spi.")
+ // io.trino.type is required for UnknownType
+ .add("io.trino.type.")
+ // io.trino.util is required for Reflection
+ .add("io.trino.util.")
.add("com.fasterxml.jackson.annotation.")
.add("io.airlift.slice.")
.add("org.openjdk.jol.")
@@ -159,11 +175,22 @@ public class PluginManager
@@ -163,11 +184,26 @@ public class PluginManager
{
ServiceLoader<Plugin> serviceLoader = ServiceLoader.load(Plugin.class, pluginClassLoader);
List<Plugin> plugins = ImmutableList.copyOf(serviceLoader);
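The `SPI_PACKAGES` additions in the patch above act as a prefix filter: classes whose fully qualified names start with one of these prefixes are resolved from the Trino server's class loader rather than the plugin's, which is what makes `SqlScalarFunction` and the other internal classes visible to the Transport UDF jar. A rough Java sketch of that matching rule (abbreviated prefix list; the real plugin class-loader logic in Trino is more involved):

```java
import java.util.List;

public class SpiPackageFilter {

    // Abbreviated version of the patched SPI_PACKAGES prefix list.
    private static final List<String> SPI_PACKAGES = List.of(
            "io.trino.metadata.",
            "io.trino.operator.",
            "io.trino.spi.",
            "io.trino.type.",
            "io.trino.util.");

    // A class is loaded parent-first (from the server) when its fully
    // qualified name starts with one of the SPI prefixes.
    public static boolean isSpiClass(String className) {
        return SPI_PACKAGES.stream().anyMatch(className::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isSpiClass("io.trino.metadata.SqlScalarFunction"));       // true
        System.out.println(isSpiClass("com.linkedin.transport.trino.TrinoFactory")); // false
    }
}
```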