[ZEPPELIN-6256] Fix resource leaks in SparkInterpreterLauncher.detectSparkScalaVersion #4996

Conversation

renechoi (Contributor)

What is this PR for?

This PR fixes resource leaks in the detectSparkScalaVersion method of SparkInterpreterLauncher.java:

  1. Unclosed FileInputStream: The FileInputStream created to read the process output was never closed, leading to file descriptor leaks
  2. Undeleted temporary file: The temporary file created to capture spark-submit output was never deleted, causing disk space accumulation

These leaks could destabilize Zeppelin over time, especially with frequent interpreter restarts; a simplified sketch of the leaky pattern follows.
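For illustration only, the kind of pattern being fixed looks roughly like the following. This is a hedged sketch, not the actual Zeppelin source: the method and variable names are invented, and Commons IO's IOUtils is assumed to be available.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

class LeakySketch {
  // Hypothetical sketch of the leak: the stream is never closed and the temp
  // file is never deleted, so each call leaks a file descriptor and leaves a
  // zeppelin-spark*.out file behind in the temp directory.
  static String readSparkSubmitVersionOutput(String sparkSubmitCmd)
      throws IOException, InterruptedException {
    File processOutputFile = File.createTempFile("zeppelin-spark", ".out");
    new ProcessBuilder(sparkSubmitCmd, "--version")
        .redirectErrorStream(true)
        .redirectOutput(processOutputFile)
        .start()
        .waitFor();
    FileInputStream in = new FileInputStream(processOutputFile); // never closed
    return IOUtils.toString(in, StandardCharsets.UTF_8);         // temp file never deleted
  }
}
```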

What type of PR is it?

Bug Fix

Todos

  • Code review
  • CI build verification

What is the Jira issue?

  • ZEPPELIN-6256

How should this be tested?

  • Unit tests added:

    • testDetectSparkScalaVersionResourceCleanup: Verifies that the FileInputStream is closed and the temporary file is deleted after successful execution
    • testDetectSparkScalaVersionMultipleCalls: Ensures no resource accumulation after multiple method invocations
  • Manual testing:

    • Start Zeppelin with the Spark interpreter
    • Monitor the temp directory before/after an interpreter restart
    • Verify that no zeppelin-spark*.out files remain (a sketch of this check follows the list)
  • CI: All existing tests pass
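For reference, the no-leftover-files check can be expressed roughly as below. This is a hedged sketch using JUnit 4 assertions; the actual tests in this PR exercise SparkInterpreterLauncher directly and are not reproduced here.

```java
import java.io.File;
import java.io.FilenameFilter;
import static org.junit.Assert.assertEquals;

// Counts zeppelin-spark*.out files in java.io.tmpdir before and after an
// action, asserting that the action leaves no new temp files behind.
public class TempFileCleanupCheck {
  private static final FilenameFilter SPARK_OUT =
      (dir, name) -> name.startsWith("zeppelin-spark") && name.endsWith(".out");

  static int countSparkOutFiles() {
    File[] matches = new File(System.getProperty("java.io.tmpdir")).listFiles(SPARK_OUT);
    return matches == null ? 0 : matches.length;
  }

  static void assertNoLeftoverTempFiles(Runnable action) {
    int before = countSparkOutFiles();
    action.run();
    assertEquals(before, countSparkOutFiles());
  }
}
```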

Screenshots (if appropriate)

N/A

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

Implementation Details

  • Used try-with-resources to ensure the FileInputStream is properly closed (see the sketch below)
  • Added a finally block to guarantee temporary file deletion
  • Added a warning log when file deletion fails, to aid debugging
  • Maintained backward compatibility - no API changes
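A minimal sketch of this approach, assuming Commons IO's IOUtils and an SLF4J logger; the method name, variable names, and surrounding spark-submit handling are illustrative, not the exact code in SparkInterpreterLauncher:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class SparkScalaVersionProbe {
  private static final Logger LOGGER = LoggerFactory.getLogger(SparkScalaVersionProbe.class);

  static String readSparkSubmitVersionOutput(String sparkSubmitCmd)
      throws IOException, InterruptedException {
    File processOutputFile = File.createTempFile("zeppelin-spark", ".out");
    try {
      Process process = new ProcessBuilder(sparkSubmitCmd, "--version")
          .redirectErrorStream(true)
          .redirectOutput(processOutputFile)
          .start();
      process.waitFor();
      // try-with-resources closes the stream even if reading fails
      try (FileInputStream in = new FileInputStream(processOutputFile)) {
        return IOUtils.toString(in, StandardCharsets.UTF_8);
      }
    } finally {
      // the finally block guarantees the temp file is removed; log a warning if not
      if (!processOutputFile.delete()) {
        LOGGER.warn("Failed to delete temporary file: {}", processOutputFile.getAbsolutePath());
      }
    }
  }
}
```

Because the delete happens in a finally block, the temporary file is removed whether parsing succeeds or an exception is thrown, and the warning makes silent deletion failures visible in the logs.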

Code Changes

  • Modified detectSparkScalaVersion method to properly manage resources
  • Added comprehensive unit tests for resource cleanup verification

renechoi and others added 5 commits July 16, 2025 00:03
- Add comprehensive input validation for Repository ID, URL, credentials, and proxy settings
- Introduce RepositoryException for better error handling and clear error messages
- Enhance JSON deserialization with proper exception handling and validation
- Add extensive test coverage with 21 test cases covering all validation scenarios
- Improve JavaDoc documentation with usage examples and exception conditions
- Maintain backward compatibility while strengthening internal validation
…itory class

- Made all Repository class variables final for immutability
- Implemented Builder pattern as requested by reviewer
- Deprecated existing fluent API methods in favor of Builder
- Updated factory methods to use Builder pattern
- Maintained backward compatibility with @deprecated annotations
- Updated javadoc examples to demonstrate Builder usage

The Builder pattern provides better control over object construction
and ensures immutability of Repository instances once created.
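An illustrative sketch of that Builder shape, assuming a field set of id, url, username, and password; the actual Repository class in Zeppelin carries additional settings (snapshot flag, proxy configuration, etc.) and is not reproduced here.

```java
// Immutable Repository with a nested Builder; field set is an assumption.
public final class Repository {
  private final String id;
  private final String url;
  private final String username;
  private final String password;

  private Repository(Builder builder) {
    this.id = builder.id;
    this.url = builder.url;
    this.username = builder.username;
    this.password = builder.password;
  }

  public static final class Builder {
    private final String id;   // required
    private String url;
    private String username;
    private String password;

    public Builder(String id) { this.id = id; }
    public Builder url(String url) { this.url = url; return this; }
    public Builder credentials(String username, String password) {
      this.username = username;
      this.password = password;
      return this;
    }
    public Repository build() { return new Repository(this); }
  }
}
```

With this shape a call site reads as new Repository.Builder("central").url("https://repo1.maven.org/maven2/").build(), and the resulting instance cannot be mutated afterwards.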
- Updated all Repository instantiation in tests to use the new builder pattern
- Fixed compilation errors in RepositoryTest.java and InterpreterInfoSavingTest.java
- Replaced direct constructor calls with Repository.Builder pattern
- All tests now use .build() method to create Repository instances
- Maintained test logic and coverage while adopting the new pattern
renechoi closed this Jul 24, 2025