
[#1824] feat(spark): Support map side combine of shuffle writer #1825


Merged: 6 commits into apache:master on Jun 26, 2024

Conversation

@wForget (Member) commented Jun 24, 2024

What changes were proposed in this pull request?

Support map side combine of shuffle writer

Why are the changes needed?

Fix: #1824

Does this PR introduce any user-facing change?

Yes, support new shuffle writer behavior.

How was this patch tested?

Added integration test

github-actions bot commented Jun 24, 2024

Test Results

2,657 files (+16) · 2,657 suites (+16) · 5h 30m 44s ⏱️ (+2m 34s)
946 tests (+2): 945 ✅ (+3), 1 💤 (±0), 0 ❌ (−1)
11,789 runs (+16): 11,774 ✅ (+17), 15 💤 (±0), 0 ❌ (−1)

Results for commit d48ba34. ± Comparison against base commit 1482804.


if (isCombine) {
  createCombiner = shuffleDependency.aggregator().get().createCombiner();
  // When enabled, combine records on the map side before they are written.
  if (RssSparkConfig.toRssConf(sparkConf).get(RSS_CLIENT_MAP_SIDE_COMBINE_ENABLED)) {
    iterator = shuffleDependency.aggregator().get().combineValuesByKey(records, taskContext);
Member

Do we need to check the existence of shuffleDependency.aggregator()?

Member

BTW, will this map-side combine add disk/memory burden, especially given the tight disk space on k8s?

Member Author

> Do we need to check the existence of shuffleDependency.aggregator()?

It seems not; there is a relevant check in ShuffleDependency:

https://github.com/apache/spark/blob/2ac2710b46be70064cd7286a9c86deb1ddc979cb/core/src/main/scala/org/apache/spark/Dependency.scala#L89-L91
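For reference, the linked lines in Spark's ShuffleDependency read roughly as follows (quoted from the Spark source; exact wording may vary by version):

if (mapSideCombine) {
  require(aggregator.isDefined, "Map-side combine without Aggregator specified!")
}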

Member

Good to know this.

Member Author

> BTW, will this map-side combine add disk/memory burden, especially given the tight disk space on k8s?

I think it is possible when there are spills. So I added a configuration to control whether it is enabled.

Member

Yeah, I get your point; it just deserves extra attention.

Since this implication is easy to miss, let's state it explicitly in the config description.

@zuston zuston changed the title [#1824] Support map side combine of shuffle writer [#1824] feat(spark): Support map side combine of shuffle writer Jun 25, 2024
WriteAndReadMetricsSparkListener listener = new WriteAndReadMetricsSparkListener();
spark.sparkContext().addSparkListener(listener);

Thread.sleep(4000);
Contributor

Hmm, why such a long sleep time?
It's not a problem introduced by this PR, but I'm not a big fan of using sleep directly in tests. Could we replace it with something like a semaphore or similar?

@wForget (Member Author) commented Jun 25, 2024

Most of the test cases keep it; I guess it's there to wait for the shuffle server to be ready. Do we still need to remove it?

Contributor

You can leave it as it is for now. However, it introduces a lot of unnecessary sleep time in the integration tests; we should address it in a follow-up PR.
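As a rough sketch of such a follow-up (illustration only; isShuffleServerReady() is a hypothetical helper, not an existing Uniffle API):

// Sketch of a follow-up: poll a readiness condition with a deadline instead
// of sleeping a fixed 4 seconds. isShuffleServerReady() is a hypothetical
// stand-in for a real health check against the shuffle servers.
long deadlineMs = System.currentTimeMillis() + 10_000;
while (!isShuffleServerReady() && System.currentTimeMillis() < deadlineMs) {
  Thread.sleep(100); // short poll; exits as soon as the servers are ready
}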

ConfigOptions.key("rss.client.mapSideCombine.enabled")
.booleanType()
.defaultValue(false)
.withDescription("Whether to enable map side combine of shuffle writer.");
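For context, a minimal sketch of enabling this from an application, assuming the conventional "spark." prefix that Uniffle applies to rss.client.* options:

import org.apache.spark.SparkConf;

// Sketch: turning on map-side combine from the client side. The "spark."
// prefix is an assumption based on how other rss.client.* options are set.
SparkConf conf = new SparkConf()
    .set("spark.rss.client.mapSideCombine.enabled", "true");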
Contributor

Could you also update the user documentation about this config?

I believe the reason we didn't perform map-side combine when writing is that keeping the map side lightweight can improve write throughput. The documentation should state that trade-off as well.
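A possible shape for that doc entry (a sketch; the exact wording is an assumption based on this thread):

rss.client.mapSideCombine.enabled (default: false): Whether to perform map-side combine in the shuffle writer. Enabling it can reduce the amount of shuffle data written for combinable operators, but it may add memory/disk pressure on the map side when the combine spills.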

Comment on lines 75 to 77
sc.parallelize(data, 10).mapToPair(x -> new Tuple2<>(x % 10, 1)).reduceByKey((x, y) -> x + y)
    .collect().stream()
    .forEach(x -> result.put(x._1 + "-result-value", x._2));
@advancedxy (Contributor) commented Jun 25, 2024

One more thing: could we add more test cases to demonstrate the map-side combine?

Member Author

This test uses map-side combine by default in the vanilla Spark implementation. We checked that RSS behaves the same as vanilla Spark.

Contributor

I mean more RDD operations that require map-side combine, other than reduceByKey.

Member Author

> I mean more RDD operations that require map-side combine, other than reduceByKey.

Got it, I will add tests for other operators.
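For illustration, a sketch of such coverage using aggregateByKey and foldByKey, which also combine on the map side in vanilla Spark (the fixtures sc/data/result and the result-key names below are assumptions mirroring the existing test):

// Sketch: additional operators that perform map-side combine by default.
sc.parallelize(data, 10)
    .mapToPair(x -> new Tuple2<>(x % 10, 1))
    .aggregateByKey(0, (acc, v) -> acc + v, (a, b) -> a + b)
    .collect().stream()
    .forEach(x -> result.put(x._1 + "-aggregate-result-value", x._2));

sc.parallelize(data, 10)
    .mapToPair(x -> new Tuple2<>(x % 10, 1))
    .foldByKey(0, (x, y) -> x + y)
    .collect().stream()
    .forEach(x -> result.put(x._1 + "-fold-result-value", x._2));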

}

// check map side combine
assertEquals(100L, result.get("0-write-records"));
Member Author

When map-side combine is in effect, the shuffle write record count of stage 0 is 100; otherwise it is 1000. (The 1000 input records are mapped to 10 distinct keys across 10 partitions, so each partition combines down to at most 10 records: 10 × 10 = 100.)

@advancedxy (Contributor) left a comment

Almost LGTM now; it would be great if we could update the documentation as well.

@rickyma (Contributor) commented Jun 25, 2024

Would you mind adding the new config into the markdown documentation?

@wForget (Member Author) commented Jun 26, 2024

@advancedxy @rickyma Thank you for your review, I will add the doc later.

@advancedxy (Contributor) left a comment

LGTM, thanks for your contribution.

Merging this as there's no objection for now. Others are welcome to leave comments; they will be addressed in follow-up PRs.

@advancedxy merged commit e0996f2 into apache:master on Jun 26, 2024
43 checks passed
zuston pushed a commit to zuston/incubator-uniffle that referenced this pull request on Feb 11, 2025: [#1824] feat(spark): Support map side combine of shuffle writer (apache#1825)