
[ZEPPELIN-6256] Fix resource leaks in SparkInterpreterLauncher.detectSparkScalaVersion #5000


Open · wants to merge 1 commit into master

Conversation

renechoi (Contributor) commented:

What is this PR for?

This PR fixes resource leaks in the detectSparkScalaVersion method of SparkInterpreterLauncher.java:

  1. Unclosed FileInputStream: the FileInputStream opened to read the process output was never closed, leaking a file descriptor on every call
  2. Undeleted temporary file: the temporary file created to capture the spark-submit output was never deleted, so temp files accumulated on disk

These leaks could destabilize Zeppelin over time, especially with frequent interpreter restarts.
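
For concreteness, a minimal sketch of the pre-fix pattern (a reconstruction: the variable names follow the diff hunks quoted in the review below, but the exact command line and output parsing in Zeppelin differ, and readAllBytes assumes Java 9+):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class LeakSketch {
      static String detectSparkScalaVersion(String sparkHome) throws IOException, InterruptedException {
        // Capture spark-submit output in a temp file, as the real method does.
        ProcessBuilder builder = new ProcessBuilder(sparkHome + "/bin/spark-submit", "--version");
        File processOutputFile = File.createTempFile("zeppelin-spark", ".out");
        builder.redirectError(processOutputFile);
        Process process = builder.start();
        process.waitFor();
        FileInputStream in = new FileInputStream(processOutputFile);  // leak 1: stream never closed
        return new String(in.readAllBytes(), StandardCharsets.UTF_8); // leak 2: temp file never deleted
      }
    }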

What type of PR is it?

Bug Fix

Todos

  • [ ] Code review
  • [ ] CI build verification

What is the Jira issue?

ZEPPELIN-6256 (https://issues.apache.org/jira/browse/ZEPPELIN-6256)

How should this be tested?

  • Unit tests added:

    • testDetectSparkScalaVersionResourceCleanup: Verifies that FileInputStream is closed and temporary file is deleted after successful execution
    • testDetectSparkScalaVersionMultipleCalls: Ensures no resource accumulation after multiple method invocations
  • Manual testing:

    • Start Zeppelin with the Spark interpreter
    • Monitor the temp directory before/after an interpreter restart
    • Verify that no zeppelin-spark*.out files remain (a small listing helper is sketched below)
  • CI: all existing tests pass
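
For the manual temp-directory check, a small hypothetical helper along these lines can list leftovers (it assumes File.createTempFile used the default java.io.tmpdir; this helper is not part of the PR):

    import java.io.IOException;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class TempFileCheck {
      public static void main(String[] args) throws IOException {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        // Glob for output files left behind by detectSparkScalaVersion.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(tmpDir, "zeppelin-spark*.out")) {
          for (Path p : stream) {
            System.out.println("leftover: " + p);
          }
        }
      }
    }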

Screenshots (if appropriate)

N/A

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

Implementation Details

  • Used try-with-resources to ensure the FileInputStream is properly closed
  • Added a finally block to guarantee deletion of the temporary file
  • Added a warning log when file deletion fails, to aid debugging
  • Maintained backward compatibility: no API changes (the combined flow is sketched below)
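
Put together, the resource handling reads roughly like this (a sketch assembled from the diff hunks quoted in the review below; the version parsing between read and return is elided, and readAllBytes assumes Java 9+):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public class CleanupSketch {
      static String readThenCleanUp(File processOutputFile) throws IOException {
        try {
          String processOutput;
          // try-with-resources closes the stream even if reading throws.
          try (FileInputStream in = new FileInputStream(processOutputFile)) {
            processOutput = new String(in.readAllBytes(), StandardCharsets.UTF_8);
          }
          return processOutput;
        } finally {
          // Guaranteed cleanup; the real change logs a warning rather than printing.
          if (!processOutputFile.delete() && processOutputFile.exists()) {
            System.err.println("Failed to delete temp file: " + processOutputFile);
          }
        }
      }
    }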

Code Changes

  • Modified detectSparkScalaVersion method to properly manage resources
  • Added comprehensive unit tests for resource cleanup verification

Reamer (Contributor) left a review:

When working on the code here, we should use more modern language features.

It may also be possible to do without a temporary file if you capture the stream from the process:

    String processOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
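
Expanded, that suggestion would look roughly like this (a sketch, assuming Apache Commons IO, which the quoted line already uses, is on the classpath, and that spark-submit writes its version banner to stderr):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import org.apache.commons.io.IOUtils;

    public class NoTempFileSketch {
      static String captureVersionOutput(String sparkHome) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(sparkHome + "/bin/spark-submit", "--version").start();
        String processOutput;
        // Drain stderr before waitFor() so a full pipe buffer cannot block the child.
        try (InputStream err = process.getErrorStream()) {
          processOutput = IOUtils.toString(err, StandardCharsets.UTF_8);
        }
        process.waitFor();
        return processOutput; // no temp file to create, close, or delete
      }
    }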

Diff context (the new stream handling):

    process.waitFor();

    String processOutput;
    try (FileInputStream in = new FileInputStream(processOutputFile)) {

Diff context (method signature and temp-file creation):

    @@ -272,22 +272,34 @@ private String detectSparkScalaVersion(String sparkHome, Map<String, String> env
        builder.environment().putAll(env);
        File processOutputFile = File.createTempFile("zeppelin-spark", ".out");

Diff context (the cleanup block):

          return detectSparkScalaVersionByReplClass(sparkHome);
        }
      } finally {
        if (!processOutputFile.delete() && processOutputFile.exists()) {

Deletion should be done with this function:

    Files.deleteIfExists(...)

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#deleteIfExists-java.nio.file.Path-
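
Applied to the finally block quoted above, that would read something like the following sketch (File#toPath bridges to the NIO API; this is not the merged code):

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;

    public class DeleteSketch {
      // deleteIfExists returns false when the file is already gone and throws
      // IOException on a real failure, so the exists() double-check is unnecessary.
      static void cleanUp(File processOutputFile) {
        try {
          Files.deleteIfExists(processOutputFile.toPath());
        } catch (IOException e) {
          System.err.println("Failed to delete " + processOutputFile + ": " + e);
        }
      }
    }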
