Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism#13260
Conversation
12447ac to
e403b98
Compare
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java
Outdated
Show resolved
Hide resolved
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java
Outdated
Show resolved
Hide resolved
ba49b0f to
08e3fab
Compare
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergSink.java
Outdated
Show resolved
Hide resolved
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java
Outdated
Show resolved
Hide resolved
|
|
||
| // since the sink write parallelism was null, it asserts that the default parallelism used was | ||
| // the input source parallelism | ||
| assertThat(sink.getTransformation().getParallelism()).isEqualTo(dataStream.getParallelism()); |
There was a problem hiding this comment.
sink has multi-stage DAG. does sink.getTransformation get the writer operator? or writer parallelism is always the same as the transformation parallelism?
There was a problem hiding this comment.
Yes, I did a debugging and could confirm it.
There was a problem hiding this comment.
please add this as a code comment to help future readers.
| .tableLoader(tableLoader) | ||
| .tableSchema(SimpleDataUtil.FLINK_SCHEMA) | ||
| .distributionMode(DistributionMode.NONE) | ||
| .writeParallelism(parallelism) |
There was a problem hiding this comment.
this parallelism could be the same as the input stream parallelism. we need to set the parallelism to be differnt as the input stream parallelism
There was a problem hiding this comment.
the inputStream parallelism is always 1 and cannot easily be changed (it is a non-parallel source). So when the parallelism test template parameter is 2, the test is asserting that the writeParallelism is actually 2 (and not 1 as the parallelism of the inputSource)
There was a problem hiding this comment.
please add this as a code comment to help future readers.
stevenzwu
left a comment
There was a problem hiding this comment.
LGTM. requested code comments
c05c58b to
01b2f5f
Compare
01b2f5f to
dbe789f
Compare
|
thanks @rodmeneses for the contribution and @gyfora and @mxm for the review |
|
thanks @stevenzwu for merging. The backport one is coming soon |
…the input source parallelism (apache#13260)
* Flink: Backports apache#13260 to Flink 1.19 * Flink: Backports apache#13260 to Flink 1.20
…the input source parallelism (apache#13260)
* Flink: Backports apache#13260 to Flink 1.19 * Flink: Backports apache#13260 to Flink 1.20
Currently, if the writeParallelism is not specified, the IcebergSink will default to use the job parallelism.
Instead, we should default to the inputSource parallelism, to promote chaining.
This PR tracks that change, consequently bringing parity with the FlinkSink.
re: #12071 (comment)
cc: @stevenzwu @mxm @pvary @gyfora