-
Notifications
You must be signed in to change notification settings - Fork 754
Enable bigCallee size adjustment during ECS for JDK8 and 11 #21978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The IBM JDK8 regression from WAS perf was significant in some cases, in the 5-7% throughput range. If a formal issue has not been opened by them, it could be (if that makes the decision to double deliver any easier). I would say we should consider double delivering it, since it is a fix for a performance regression, but will also wait for @vij-singh and @pshipton opinions. |
It's past DCUT for 0.53, only bug fixes are allowed now. |
"This is a performance bug" was what my comment was saying |
Given this is to fix a significant performance bug (regression) that has been hit by WAS Perf, and that the change has already been in place for Java >11, I vote for this to be allowed into 0.53 and 8.0.8.50. |
I have observed a fairly reproducible intermittent failure using a JDK8 Windows build with this change. The failure output suggests that it may be related to OpenJ9 MethodHandles. I am investigating the cause of that failure. We may need to restrict ECS along OpenJ9 MethodHandle thunks due to the various functional issues that keep showing up with changes to inliner behaviour. Failure output:
|
9ca3233
to
0200798
Compare
@vijaysun-omr I have a workaround in place to avoid running into functional issues with OpenJ9 MethodHandle inlining behavioural changes we get from the big callees adjustment. Requesting review. |
@@ -1817,7 +1817,11 @@ TR_J9EstimateCodeSize::realEstimateCodeSize(TR_CallTarget *calltarget, TR_CallSt | |||
int32_t origRealSize = _realSize; | |||
int32_t origBigCalleesSize = _bigCalleesSize; | |||
bool prevNonColdCalls = _hasNonColdCalls; | |||
bool estimateSuccess = estimateCodeSize(targetCallee, &callStack, /* recurseDown */ true, analyzedSizeThreshold); | |||
bool estimateSuccess = false; | |||
// Terminate recursive ECS along archetypes due to functional issues resulting from continuing ECS in Java 8 and 11. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the face of it, this seems like we won't inline past an archetype call, which could risk regressions on MH heavy workloads. I know this change has worked well in a non-MH heavy workload (Liberty) but have you tested this on MH heavy benchmarks ? If it has not regressed there, is there an intuitive reason for why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested this on MethodHandle-heavy workloads, and did not observe any regression when comparing just the impact of this bit of change, on top of having big callee size adjustment enabled in both builds. Even with ECS getting terminated upon hitting a MethodHandle thunk archetype, we would still inline them during targeted inlining that happens later as part of methodHandleInvokeInliningGroup
. As part of inlineCallTarget2
, we would perform checks for the existence of the MH thunks and enable targeted inlining that would handle MH thunks. This is only enabled for OpenJ9 MethodHandle builds.
openj9/runtime/compiler/optimizer/InlinerTempForJ9.cpp
Lines 507 to 517 in c350fa9
TR_J9InlinerUtil::requestAdditionalOptimizations(TR_CallTarget *calltarget) | |
{ | |
#if !defined(J9VM_OPT_OPENJDK_METHODHANDLE) | |
if (calltarget->_myCallSite->getDepth() == -1 // only do this for top level callee to prevent exponential walk of inlined trees | |
&& checkForRemainingInlineableJSR292(comp(), calltarget->_calleeSymbol)) | |
{ | |
_inliner->getOptimizer()->setRequestOptimization(OMR::methodHandleInvokeInliningGroup); | |
if (comp()->trace(OMR::inlining)) | |
heuristicTrace(tracer(),"Requesting one more pass of targeted inlining due to method handle invoke in %s\n", tracer()->traceSignature(calltarget->_calleeSymbol)); | |
} | |
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, maybe that is why we do not see any performance regression on those workloads using the old MH implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Furthermore, inlining along MH thunks would also continue through late inlining triggered by Global ValuePropagation. That has been my observation in some small MH operations I tested. VP handles inlining along all MH thunks for baseline build (current main branch), and the 2 builds I mentioned in the last comment.
Jenkins test sanity all jdk8,jdk11 |
JDK8 32-bit Windows failure is due to a warning being treated as error. Not related to the changes in this PR.
|
Another error seen in the 32 bit windows build:
|
JDK11 sanity.functional failure on ppc64le linux:
It is a known failure: #18599 x86_64 linux JDK8 sanity.functional test failure:
there are many such instances across several tests. It is a known problem: #14594 |
X86_64 JDK8 Windows sanity.openjdk test failure looks similar to the test failure I have seen in my personal builds, which seemed to have gone away after restricting ECS along MH thunks. With this failure showing up here, I am not sure that second change was an effective workaround.
This failure is also being reported in #22001, with a very similar reported backtrace, in a JDK version that does not even use MH thunks. In light of this, I think we need to understand what is going wrong during the compilation of |
Assigning to @hzongaro for committing and his awareness since Nazim may run out of time in debugging this. |
During EstimateCodeSize, if we came across a callee method that was deemed too big, and therefore, set to be discarded as an inlining candidate, a size adjustment for other methods above such methods is necessary to ensure the entire call graph does not get marked as too big. For JDK8 and 11, this adjustment was disabled due to a functional issue that was exposed due to the different inlining outcome from this adjustment. This change removes the restriction that was in place as a workaround, and fixes performance issues resulting from early ECS termination in JDK8 and 11. Signed-off-by: Nazim Bhuiyan <[email protected]>
0200798
to
0f4d58a
Compare
With big callees adjustment in ECS enabled for JDK8/11, the frequency of the failure is about 3/5 iterations running |
During EstimateCodeSize, if we came across a callee method that was deemed too big, and therefore, set to be discarded as an inlining candidate, a size adjustment for other methods above such methods is necessary to ensure the entire call graph does not get marked as too big. For JDK8 and 11, this adjustment was disabled due to a functional issue that was exposed due to the different inlining outcome from this adjustment. This change removes the restriction that was in place as a workaround, and fixes performance issues resulting from early ECS termination in JDK8 and 11.
The restriction placed to avoid any functional issues until JDK8 and 11 builds switch to using OpenJDK MethodHandle implementation is to terminate recursive ECS once we reach a MethodHandle thunk archetype.
With the initial attempt to have this be in effect across all Java versions, we ran into test failures across several platforms with JDK8 and 11, detailed in #21397. Test failures reported in #21397 are no longer reproducible with the current set of inliner changes that have been contributed over the past few months. Attempting to reproduce those test failures with the change in this PR has been unsuccessful in 60X Grinder on Aarch64 Linux (JDK11). Furthermore, no failures were seen in a 100X Grinder job on X86_64 Linux (JDK11), in which over half of the test iterations have completed as of now.
Recently, the WAS perf team observed a regression using IBM Java 8 caused by the ECS issue that this PR is going to address. In addition to fixing the perf regression, an additional 2.8% performance gain was observed compared to the pre-regression baseline build. Given the significant perf impact this change has, I would like to propose that this change is double-delivered for the 0.53 release branch. @vijaysun-omr @vij-singh @pshipton Please let me know your thoughts on whether we have enough justification to double deliver this.
@vijaysun-omr Also, requesting your review of this change.