Skip to content

Conversation

@LZD-PratyushBhatt
Copy link
Collaborator

@LZD-PratyushBhatt LZD-PratyushBhatt commented Jul 14, 2025

Issues

Description

Refer to apache#3050 for Problem description.
Basically, all DistClusterControllerStateModel instances were synchronizing on the same global lock object due to Optional.empty() returning a singleton instance.
Replaced synchronization on _controllerOpt (the shared singleton) with a dedicated per-instance lock object _controllerLock = new Object(). This eliminates cross-instance contention while preserving within-instance synchronization guarantees.
(Write a concise description including what, why, how)

Tests

  • The following tests are written for this issue:
    Added a new test testNoSharedLockAcrossInstances
    (List the names of added unit/integration tests)

  • The following is the result of the "mvn test" command on the appropriate module:

(If CI test fails due to known issue, please specify the issue and test PR locally. Then copy & paste the result of "mvn test" to here.)

Changes that Break Backward Compatibility (Optional)

  • My PR contains changes that break backward compatibility or previous assumptions for certain methods or API. They include:

(Consider including all behavior changes for public methods or API. Also include these changes in merge description so that other developers are aware of these changes. This allows them to make relevant code changes in feature branches accounting for the new method/API behavior.)

Documentation (Optional)

  • In case of new functionality, my PR adds documentation in the following wiki page:

(Link the GitHub wiki you added)

Commits

  • My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Code Quality

  • My diff has been formatted using helix-style.xml
    (helix-style-intellij.xml if IntelliJ IDE is used)

@proud-parselmouth
Copy link
Collaborator

Nit: Add the reference to the apache helix PR. Would be easier to reference any comments there.

Copy link
Collaborator

@proud-parselmouth proud-parselmouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks for this.

@LZD-PratyushBhatt
Copy link
Collaborator Author

Test failure is unrelated.

[info] ./helix-core/target/surefire-reports/TestSuite.txt: Tests run: 1466, Failures: 1, Errors: 0, Skipped: 3, Time elapsed: 6,694.226 s <<< FAILURE! - in TestSuite
Error:  Test failed: testDistributedController(org.apache.helix.integration.manager.TestConsecutiveZkSessionExpiry)  Time elapsed: 34.064 s  <<< FAILURE!

@LZD-PratyushBhatt LZD-PratyushBhatt merged commit 558f1aa into master Jul 14, 2025
2 of 3 checks passed
@proud-parselmouth
Copy link
Collaborator

proud-parselmouth commented Jul 15, 2025

You can link the apache helix flaky test issue with your comment.

apache#2882

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_controllerOpt getting shared across resources

3 participants