
Conversation


@jen6 jen6 commented Oct 2, 2025

Before starting native Flink support, I tested Beam’s FlinkRunner and it worked well.
It cannot be run with exec:java due to Flink dependency conflicts. However, it runs correctly on a Flink cluster.

Test Result

[screenshot: test result of the example running on a Flink cluster]

Collaborator

@RamSaw RamSaw left a comment


Thank you very much for such an awesome contribution!
It's already a lot of cool information and improvements, happy to merge it.

Do you mind, if you have the capacity, also updating our CI to run these examples with Bazel and Maven? That way we will know that it always works and we don't break it.

<scope>runtime</scope>
</dependency>
</dependencies>
<build>
Collaborator


I wonder whether we really need this <build> section for the shade plugin. Other runners don't have it.

Author


I keep the Shade plugin here because Flink runs user code on TaskManagers and needs the full dependency closure packaged.

For Flink we have three practical options:

  1. Uber-jar: the most repeatable path for submitting to an existing cluster (recommended by the official Flink docs).
  2. Local exec (no shade): for quick local checks we can run Beam’s FlinkRunner with exec:exec (as below). This works but doesn’t solve cluster submission.
  3. Attach external JARs at submit time: flink run -C file:///path/dep.jar … can inject deps without shading, but it is brittle and operationally heavier than a single uber-jar.
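
A rough sketch of the exec:exec invocation in option 2 (the pipeline flags after the main class are placeholders, not this example’s exact arguments):

  mvn compile exec:exec \
    -Dexec.executable="java" \
    -Dexec.args="-cp %classpath com.google.privacy.differentialprivacy.pipelinedp4j.examples.BeamExample --runner=FlinkRunner ..."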

Given that we also want production-like testing against an existing Flink cluster, I suggest we keep the shade step in this module.
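
For concreteness, the shade step is essentially what the Flink docs describe; a minimal sketch (the plugin version is illustrative, and the filters/transformers shown are the usual ones rather than a verbatim copy of this module’s pom.xml):

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.5.1</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <!-- drop signature files so the uber-jar is not rejected at runtime -->
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
              <transformers>
                <!-- merge META-INF/services entries so components registered via ServiceLoader keep working -->
                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <mainClass>com.google.privacy.differentialprivacy.pipelinedp4j.examples.BeamExample</mainClass>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

The resulting jar is then submitted with the standard CLI, e.g. flink run -c com.google.privacy.differentialprivacy.pipelinedp4j.examples.BeamExample target/<shaded-jar-name>.jar (the exact jar name depends on the module’s build config).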

Collaborator


I see. Yes, it makes sense. Thank you for the explanation!

We had the same issue for Spark on Dataproc (GCP) and we solved it in a slightly different way: command, pom.xml.

Will it work for Flink? That way we will have one way of building Uber jars for any backend.

Author


I tried using the Assembly plugin, but it didn't work 🤔
The Shade plugin doesn't just package the jar files; it also excludes the conflicting META-INF entries.
It's also the approach recommended by the official Flink docs, so I would like to keep it.

@jen6 jen6 requested a review from RamSaw October 9, 2025 14:35
Collaborator

@RamSaw RamSaw left a comment


I'm giving approval in case your answers to my comments are negative; if you agree with making the proposed changes instead, please send it for another round of review.

],
main_class = "com.google.privacy.differentialprivacy.pipelinedp4j.examples.BeamExample",
runtime_deps = [
"@maven//:org_apache_beam_beam_runners_flink_1_18",
Collaborator


I wonder whether we can just merge it with the previous java_binary target and add just another runtime_dep. Less code, and since it is just an example, it should be fine to have unnecessary deps.
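
For illustration, the merged target might look roughly like this (the library target and the non-Flink runner dep below are assumptions, not the exact names in BUILD):

  java_binary(
      name = "BeamExample",
      main_class = "com.google.privacy.differentialprivacy.pipelinedp4j.examples.BeamExample",
      runtime_deps = [
          ":beam_example_lib",  # assumed: existing library target with the pipeline code
          "@maven//:org_apache_beam_beam_runners_direct_java",  # assumed: existing local runner dep
          "@maven//:org_apache_beam_beam_runners_flink_1_18",   # Flink runner added in this PR
      ],
  )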

Author


fixed in ddf9636

@jen6 jen6 requested a review from RamSaw October 25, 2025 06:54