-
Notifications
You must be signed in to change notification settings - Fork 161
[#1818] fix(spark3): Avoid calling RssShuffleDataIterator.cleanup multiple times #1819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Have you tested it? When will we encounter this issue? Could you add some UTs for this? |
It happened by chance when I was running integration test for spark3 locally. |
new Function0<BoxedUnit>() { | ||
private boolean cleanupCalled = false; | ||
|
||
@Override | ||
public BoxedUnit apply() { | ||
if (!cleanupCalled) { | ||
cleanupCalled = true; | ||
context.taskMetrics().mergeShuffleReadMetrics(); | ||
iterator.cleanup(); | ||
} | ||
return BoxedUnit.UNIT; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. It's possible that the completion would be called twice:
- called after being consumed all
- invoked in the
TaskCompletionListener
again in L284 - L290.
So this code looks correct to me.
However, it's quite ugly and unreadable to add an anonymous class here directly. Would you mind to extract this as a utility method, something like:
import scala.Function0;
public class Once {
private Once() {
}
public static <R> Function0<R> once(Function0<R> f) {
return new Function0<R>() {
private R value = null;
private volatile boolean computed = false;
@Override
public R apply() {
if (!computed) {
computed = true;
value = f.apply();
}
return value;
}
};
}
}
And call Once.once
here?
Maybe we can place Also, there are tests failures. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1819 +/- ##
============================================
- Coverage 53.53% 53.52% -0.02%
- Complexity 2356 2979 +623
============================================
Files 368 439 +71
Lines 16852 24158 +7306
Branches 1540 2258 +718
============================================
+ Hits 9022 12930 +3908
- Misses 7303 10432 +3129
- Partials 527 796 +269 ☔ View full report in Codecov by Sentry. |
It looks like there is scala compatibility issue and we can't put it in common, so I've moved it to spark3 client. |
Maybe we can just use pure Java codes, something like:
Then we can avoid the Scala compatibility issue. WDYT? |
|
Then I have no more comments, let @advancedxy take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's wait some more time before @zuston can take a look since the corresponding code is added by him.
We can merge it by late tonight if no objections.
public class FunctionUtilsTests { | ||
|
||
@Test | ||
public void testOnceFunction0() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Thanks for your contribution. Curious to ask, what kind of compatibility issue arise when putting in common? |
You can see more details from this failed GA: https://github.com/apache/incubator-uniffle/actions/runs/9640163249/job/26583457745
|
hmmm, I see, thanks for the info. I think the problem is caused by mixed usage of Scala 2.11(from spark 2.x) and Scala 2.12. In my opinion, we should drop Spark 2.x support at some point, it's introducing more and more maintenance cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Merged. Thanks @wForget for your contribution and thanks @rickyma @advancedxy for your review |
What changes were proposed in this pull request?
Avoid calling
RssShuffleDataIterator.cleanup
multiple times.Why are the changes needed?
Fix: #1818
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs.