
[WIP] Build jar from source #50790


Draft: wants to merge 2 commits into master

Conversation

vrozov (Member) commented May 5, 2025

What changes were proposed in this pull request?

Remove hive-test-udfs.jar from the source control, add java sources for Hive UDFs and build hive-test-udfs.jar as part of HiveUDFDynamicLoadSuite using Maven.

Why are the changes needed?

  • Building the UDFs during testing with the same Hive version that Spark uses guarantees the correctness of the UDFs.
  • It is against ASF policy to have jar files in the source release.

Does this PR introduce any user-facing change?

No, it impacts tests only.

How was this patch tested?

It was tested with the Maven build by running HiveUDFDynamicLoadSuite.

Was this patch authored or co-authored using generative AI tooling?

No

vrozov (Member, Author) commented May 5, 2025

@HeartSaVioR, @dongjoon-hyun, @HyukjinKwon, @cloud-fan Please review. A similar approach can be applied to other jars.

import org.apache.hadoop.io.Text;

@Description(name = "udaf_max2", value = "_FUNC_(expr) - Returns the maximum value of expr")
public class UDAFExampleMax2 extends UDAF {
Contributor commented:

where do we get the source code?

vrozov (Member, Author) replied:

Please see

. The code is modified to compile with Hive 2.3.10.
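
For reference, a minimal sketch of the legacy Hive UDAF pattern that UDAFExampleMax2 follows, reduced to a single IntWritable evaluator; the sources added by this PR are adapted from Hive's examples and differ in detail, so the names and body below are illustrative only.

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

@Description(name = "udaf_max2", value = "_FUNC_(expr) - Returns the maximum value of expr")
public class UDAFExampleMax2 extends UDAF {

  // Evaluator for int columns; the legacy UDAF API resolves it by reflection.
  public static class MaxIntEvaluator implements UDAFEvaluator {
    private int max;
    private boolean empty;

    public MaxIntEvaluator() {
      init();
    }

    // Reset the aggregation state.
    public void init() {
      max = 0;
      empty = true;
    }

    // Consume one input row.
    public boolean iterate(IntWritable o) {
      if (o != null) {
        if (empty) {
          max = o.get();
          empty = false;
        } else {
          max = Math.max(max, o.get());
        }
      }
      return true;
    }

    // Partial aggregate shipped from map side to reduce side.
    public IntWritable terminatePartial() {
      return empty ? null : new IntWritable(max);
    }

    // Merge a partial aggregate into the current state.
    public boolean merge(IntWritable o) {
      return iterate(o);
    }

    // Final result.
    public IntWritable terminate() {
      return empty ? null : new IntWritable(max);
    }
  }
}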

@@ -199,6 +199,10 @@
<artifactId>scalacheck_${scala.binary.version}</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.maven.shared</groupId>
<artifactId>maven-invoker</artifactId>
pan3793 (Member) commented May 6, 2025:

Seems overkill to use Maven here; how about calling javac directly? https://github.com/apache/hive/blob/branch-4.0/common/src/java/org/apache/hive/common/util/HiveTestUtils.java#L94

Some additional background: there have been complaints that Spark uses two build tools, Maven and SBT, which adds complexity in keeping their behaviors aligned, and there is a chance that one of the build tools might be dropped someday.
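
For context, a minimal sketch of the javac-plus-jar route using only JDK APIs (javax.tools and java.util.jar); the paths and the classpath handling are assumptions for illustration, not the HiveTestUtils code linked above.

import java.io.File;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileUdfJar {
  public static void main(String[] args) throws Exception {
    Path src = Path.of("src/test/resources/udfs/UDAFExampleMax2.java"); // hypothetical path
    Path classesDir = Path.of("target/udf-classes");
    Files.createDirectories(classesDir);

    // Reuse the current JVM's classpath; this only works if the test JVM already
    // has the Hive/Hadoop jars on it, which is the dependency problem raised above.
    JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
    int rc = javac.run(null, null, null,
        "-classpath", System.getProperty("java.class.path"),
        "-d", classesDir.toString(),
        src.toString());
    if (rc != 0) {
      throw new IllegalStateException("javac exited with code " + rc);
    }

    // Package the compiled classes, preserving package directories as jar entry paths.
    List<Path> classFiles;
    try (Stream<Path> walk = Files.walk(classesDir)) {
      classFiles = walk.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    try (JarOutputStream jar =
             new JarOutputStream(new FileOutputStream("target/hive-test-udfs.jar"))) {
      for (Path p : classFiles) {
        String entry = classesDir.relativize(p).toString().replace(File.separatorChar, '/');
        jar.putNextEntry(new JarEntry(entry));
        Files.copy(p, jar);
        jar.closeEntry();
      }
    }
  }
}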

vrozov (Member, Author) replied:

It is necessary to build a jar, not just compile a Java class. Calling javac directly would require all the dependencies on the classpath, and without Maven it would be necessary to download and manage them separately.

Using the Maven Invoker here only affects how the test jar is built. It works with both the Spark Maven and SBT builds and does not require using one build tool over the other to build Spark.
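
For illustration, a minimal sketch of how a test could invoke Maven programmatically through the maven-invoker API; the pom location and goal are assumptions, not the actual code in HiveUDFDynamicLoadSuite.

import java.io.File;
import java.util.Collections;
import org.apache.maven.shared.invoker.DefaultInvocationRequest;
import org.apache.maven.shared.invoker.DefaultInvoker;
import org.apache.maven.shared.invoker.InvocationRequest;
import org.apache.maven.shared.invoker.InvocationResult;
import org.apache.maven.shared.invoker.Invoker;

public class BuildTestUdfJar {
  public static void main(String[] args) throws Exception {
    // Run "mvn package" against a small standalone pom that declares the Hive
    // dependencies; Maven resolves them, so no classpath is assembled by hand.
    InvocationRequest request = new DefaultInvocationRequest();
    request.setPomFile(new File("src/test/resources/hive-test-udfs/pom.xml")); // hypothetical path
    request.setGoals(Collections.singletonList("package"));

    Invoker invoker = new DefaultInvoker();
    // invoker.setMavenHome(new File(System.getenv("MAVEN_HOME"))); // if maven.home is not set
    InvocationResult result = invoker.execute(request);
    if (result.getExitCode() != 0) {
      throw new IllegalStateException("Building the test UDF jar failed",
          result.getExecutionException());
    }
  }
}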
