[SPARK-49479][CORE] Cancel the Timer non-daemon thread on stopping the BarrierCoordinator #49934

Open: 1 commit to be merged into base branch-3.5

@bcheena bcheena commented Feb 13, 2025

What changes were proposed in this pull request?

Cancel the non-daemon Timer thread when the BarrierCoordinator is stopped.

Why are the changes needed?

In barrier execution mode, the Spark driver JVM could keep running after spark.stop() was called. Although the SparkContext had been shut down, the JVM did not exit. The cause was a non-daemon timer thread named "BarrierCoordinator barrier epoch increment timer", which prevented the driver JVM from stopping.
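The mechanism can be reproduced outside Spark. The following is a minimal sketch, not the actual BarrierCoordinator code: the `TimerShutdownSketch` and `Coordinator` classes and their methods are hypothetical stand-ins, and only `java.util.Timer` and the thread name are taken from the description above. It shows that a `Timer` created without the daemon flag owns a non-daemon thread that stays alive until `cancel()` is called.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerShutdownSketch {
    // Thread name quoted in the PR description; everything else here is illustrative.
    public static final String TIMER_NAME = "BarrierCoordinator barrier epoch increment timer";

    /** Hypothetical stand-in for a component that owns a timer. */
    public static final class Coordinator {
        // new Timer(name) starts a NON-daemon thread, which keeps the JVM alive.
        private final Timer timer = new Timer(TIMER_NAME);

        public void scheduleTimeout(long delayMs) {
            timer.schedule(new TimerTask() {
                @Override
                public void run() {
                    // Timeout handling would go here.
                }
            }, delayMs);
        }

        /** Cancelling the timer discards pending tasks and lets its thread exit. */
        public void stop() {
            timer.cancel();
        }
    }

    /** Returns true if a live thread with the given name exists. */
    public static boolean timerThreadAlive(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name) && t.isAlive()) {
                return true;
            }
        }
        return false;
    }

    /** Polls until the named thread is gone or the timeout elapses. */
    public static boolean waitForTimerExit(String name, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (timerThreadAlive(name)) {
            if (System.currentTimeMillis() > deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        coordinator.scheduleTimeout(60_000);
        System.out.println("timer thread alive before stop: " + timerThreadAlive(TIMER_NAME));

        // Without this call the JVM would keep running after main returns.
        coordinator.stop();
        System.out.println("timer thread exited after stop: " + waitForTimerExit(TIMER_NAME, 5_000));
    }
}
```

Removing the `coordinator.stop()` call from `main` should leave the process running after `main` returns, which mirrors the hang described above.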

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran the following scripts locally using ./bin/spark-submit. Without this change, the JVM hung and did not exit; with this change, it exits successfully.

  1. barrier_example.py
  2. xgboost-test.py

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Feb 13, 2025