-
Notifications
You must be signed in to change notification settings - Fork 242
Fix global lock contention in DistClusterControllerStateModel caused by Optional.empty() singleton
#3052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix global lock contention in DistClusterControllerStateModel caused by Optional.empty() singleton
#3052
Conversation
…by Optional.empty() singleton
DistClusterControllerStateModel caused by Optional.empty() singleton
|
@junkaixue @GrantPSpencer Can you take a look at this? Thanks! |
helix-core/src/main/java/org/apache/helix/participant/DistClusterControllerStateModel.java
Outdated
Show resolved
Hide resolved
GrantPSpencer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Great catch on this, will greatly reduce topstate handoff when controller is processing multiple STs simultaneously
Just have 1 nit comment on test assertion
helix-core/src/test/java/org/apache/helix/participant/TestDistControllerStateModel.java
Outdated
Show resolved
Hide resolved
GrantPSpencer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
will still need committer approval, cc @junkaixue @xyuanlu
| logger.info(controllerName + " becoming leader from standby for " + clusterName); | ||
|
|
||
| synchronized (_controllerOpt) { | ||
| synchronized (_controllerLock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Optional.empty() is the root cause of locking problem, why not just make the function synchronized without creating new lock object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appreciate the suggestion @junkaixue, that’s a fair point. I agree that synchronized at method level on this would technically provide mutual exclusion in this case.
I opted for a dedicated lock object (_controllerLock) to keep the synchronization scope narrowly focused on controller lifecycle transitions, and also I didnt want to change the already implemented dedicated lock logic. This makes the locking intent more explicit and avoids potential contention if other unrelated synchronized methods are ever added to this class (either now or later, or lets say by mistake).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think, if we are sure this class will be small in size in future and not much complex, then yeah synchronized methods make more sense then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@junkaixue : a new lock based implementation is close to the existing implementation while fixing the problem on the lock contention issue found as mentioned in the issue (due to JDK implementation of Optional.empty as singleton).
imho, making the methods synchronized may be of larger scope as we need to relook at the thread safety of other instance variables in current class and its base classes. So, if you suggest to fix this, should this be addressed in a different issue and PR.
|
Hey @junkaixue can you take a look at this again if you get time, Thanks a lot! |
|
Hey @junkaixue , @xyuanlu reminder on this one, Thanks for your time! |
Issues
This PR fixes _controllerOpt getting shared across resources #3050 : _controllerOpt getting shared across resources
Description
Refer to _controllerOpt getting shared across resources #3050 for Problem description.
Basically, all DistClusterControllerStateModel instances were synchronizing on the same global lock object due to Optional.empty() returning a singleton instance.
Replaced synchronization on _controllerOpt (the shared singleton) with a dedicated per-instance lock object _controllerLock = new Object(). This eliminates cross-instance contention while preserving within-instance synchronization guarantees.
(Write a concise description including what, why, how)
Tests
The following tests are written for this issue:
Added a new test testNoSharedLockAcrossInstances
(List the names of added unit/integration tests)
The following is the result of the "mvn test" command on the appropriate module:
(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)
Changes that Break Backward Compatibility (Optional)
(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)
Documentation (Optional)
(Link the GitHub wiki you added)
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)