Skip to content

[ZEPPELIN-6258] Improve Process resource management in SparkInterpreterLauncher #4998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

renechoi
Copy link
Contributor

What is this PR for?

This PR improves process resource management in the detectSparkScalaVersion method of SparkInterpreterLauncher.java to prevent process hangs and resource leaks.

Current Issues Fixed:

  1. Buffer Overflow Risk: Process stdout was not consumed, potentially causing hangs if spark-submit produces output
  2. Indefinite Blocking: No timeout on process.waitFor() could cause indefinite blocking
  3. Missing Process Cleanup: Process wasn't explicitly destroyed after use
  4. No Exit Value Validation: Process exit status wasn't checked

What type of PR is it?

Improvement

Todos

  • - Code review
  • - CI build verification

What is the Jira issue?

How should this be tested?

  • Unit test added: testDetectSparkScalaVersionProcessManagement - Verifies the method works correctly with new process management

  • Manual testing:

    • Start Zeppelin with Spark interpreter
    • Verify spark-submit processes are properly terminated
    • No zombie processes should remain after interpreter initialization
  • CI: All existing tests pass

Screenshots (if appropriate)

N/A

Questions:

  • Does the license files need to update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

Implementation Details

  • Added stdout consumption using IOUtils.copy() to NullOutputStream to prevent buffer overflow
  • Implemented 30-second timeout with process.waitFor(30, TimeUnit.SECONDS)
  • Added forceful process termination on timeout
  • Added exit value logging for better diagnostics
  • Ensured process cleanup in finally block with destroyForcibly()
  • Moved file reading into try-finally block for better resource management

Benefits

  1. Prevents Hangs: Stdout consumption prevents buffer-related process hangs
  2. Guaranteed Termination: 30-second timeout ensures process won't block indefinitely
  3. Better Diagnostics: Exit value logging helps identify spark-submit failures
  4. Resource Safety: Explicit process cleanup prevents zombie processes

…erLauncher

- Added stdout consumption to prevent buffer overflow
- Implemented process timeout (30 seconds) with forceful termination
- Added exit value validation and logging
- Ensured process cleanup in finally block
- Maintained backward compatibility

This prevents process hangs and ensures proper resource cleanup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant