
[ZEPPELIN-6259] Add null safety checks in SparkInterpreterLauncher.detectSparkScalaVersionByReplClass #4999


Open
wants to merge 1 commit into master

Conversation

renechoi (Contributor)

What is this PR for?

This PR adds null safety checks to the detectSparkScalaVersionByReplClass method in SparkInterpreterLauncher.java to prevent NullPointerException and provide clear error messages when the Spark jars directory is inaccessible.

Current Issues Fixed:

  1. NullPointerException Risk: listFiles() returns null when the directory doesn't exist or is inaccessible (see the snippet below)
  2. Poor Error Messages: Users get a cryptic NPE instead of a meaningful error message
  3. Missing Validation: No checks for directory existence or type
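
For context, here is the failure mode in miniature (illustrative snippet, not code from the PR): File.listFiles() signals a missing or unreadable directory by returning null rather than throwing, and streaming over that null reference produces exactly the cryptic NPE shown under "Error Message Examples" below.

```java
import java.io.File;
import java.util.stream.Stream;

// Illustrative only: reproduces the NPE this PR guards against.
public class ListFilesNpeDemo {
  public static void main(String[] args) {
    File sparkJarsFolder = new File("/invalid/path/jars");
    // listFiles() returns null because the directory does not exist.
    File[] jars = sparkJarsFolder.listFiles();
    // Stream.of((File[]) null) throws a NullPointerException inside Stream.of,
    // matching the stack trace in the "Before" example below.
    Stream.of(jars).map(File::getName).forEach(System.out::println);
  }
}
```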

What type of PR is it?

Bug Fix / Improvement

Todos

  • Code review
  • CI build verification

What is the Jira issue?

ZEPPELIN-6259

How should this be tested?

  • Unit tests added (a sketch of one appears after this list):

    • testDetectSparkScalaVersionByReplClassWithNonExistentDirectory - Verifies error when directory doesn't exist
    • testDetectSparkScalaVersionByReplClassWithFileInsteadOfDirectory - Verifies error when path is a file
    • testDetectSparkScalaVersionByReplClassWithValidDirectory - Verifies normal operation works
    • testDetectSparkScalaVersionByReplClassWithEmptyDirectory - Verifies error when no spark-repl jars are found
  • Manual testing:

    • Set invalid SPARK_HOME and verify clear error messages
    • Remove read permissions on SPARK_HOME/jars and verify permission error
  • CI: All existing tests pass
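
For illustration, a sketch of how the non-existent-directory test could be shaped. Assumptions (not from the PR): JUnit 5, a hypothetical newLauncherForTest() setup helper, and that detectSparkScalaVersionByReplClass is reachable from the test; the actual test code in this PR may differ.

```java
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.io.IOException;
import org.junit.jupiter.api.Test;

// Sketch only; assumes JUnit 5 and test access to the method under test.
class SparkInterpreterLauncherNullSafetySketch {

  @Test
  void testDetectSparkScalaVersionByReplClassWithNonExistentDirectory() {
    SparkInterpreterLauncher launcher = newLauncherForTest();  // hypothetical helper
    IOException e = assertThrows(IOException.class,
        () -> launcher.detectSparkScalaVersionByReplClass("/invalid/path"));
    // The message should name the bad path and point the user at SPARK_HOME.
    assertTrue(e.getMessage().contains("/invalid/path/jars"));
  }
}
```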

Screenshots (if appropriate)

N/A

Questions:

  • Do the license files need to be updated? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

Implementation Details

  • Added directory existence check with clear error message
  • Added directory type validation (ensures it's not a file)
  • Added null check for listFiles() result with permission hint
  • All error messages include the problematic path for easier debugging
  • Used IOException for file system related errors (consistent with Java conventions)
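
Taken together, the checks described above plausibly look like the following. This is a reconstruction from the description and the error messages below, not the literal patch; the trailing spark-repl scan in particular is only a guess at the existing detection logic.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// Reconstruction of the guarded method from this PR's description; details may differ.
private String detectSparkScalaVersionByReplClass(String sparkHome) throws Exception {
  File sparkJarsFolder = new File(sparkHome + "/jars");
  if (!sparkJarsFolder.exists()) {
    throw new IOException("Spark jars directory does not exist: "
        + sparkJarsFolder.getAbsolutePath() + ". Please check your SPARK_HOME setting.");
  }
  if (!sparkJarsFolder.isDirectory()) {
    throw new IOException("Spark jars path is not a directory: "
        + sparkJarsFolder.getAbsolutePath());
  }
  File[] sparkJars = sparkJarsFolder.listFiles();
  if (sparkJars == null) {
    // listFiles() returns null on I/O or permission failure.
    throw new IOException("Cannot access Spark jars directory: "
        + sparkJarsFolder.getAbsolutePath() + ". Please check permissions.");
  }
  // Existing detection, roughly: infer the Scala version from the spark-repl jar name.
  // (Illustrative guess; the version strings here are assumptions.)
  boolean hasRepl211 = Arrays.stream(sparkJars)
      .anyMatch(f -> f.getName().contains("spark-repl_2.11"));
  return hasRepl211 ? "2.11" : "2.12";
}
```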

Error Message Examples

Before (NPE):
java.lang.NullPointerException
at java.util.stream.Stream.of(Stream.java:1012)

After (Clear messages):
java.io.IOException: Spark jars directory does not exist: /invalid/path/jars. Please check your SPARK_HOME setting.
java.io.IOException: Spark jars path is not a directory: /some/file/jars
java.io.IOException: Cannot access Spark jars directory: /restricted/jars. Please check permissions.

Benefits

  1. Better User Experience: Clear error messages help users quickly identify and fix configuration issues
  2. Defensive Programming: Prevents crashes from null pointer exceptions
  3. Easier Debugging: Specific error messages with paths make troubleshooting straightforward
  4. Production Ready: Handles edge cases that can occur in various deployment environments

@@ -293,7 +293,25 @@ private String detectSparkScalaVersion(String sparkHome, Map<String, String> env

private String detectSparkScalaVersionByReplClass(String sparkHome) throws Exception {
    File sparkJarsFolder = new File(sparkHome + "/jars");
A contributor commented on this line:

Perhaps it is time to use the Files and Paths classes here?
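
For reference, a rough sketch of what that suggestion could look like. This is not code from the PR: the java.nio.file API reports missing and unreadable directories through checked exceptions, which removes the need for a null check. The glob pattern and the fallback version below are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the reviewer's java.nio.file suggestion; not code from this PR.
private String detectSparkScalaVersionByReplClass(String sparkHome) throws IOException {
  Path sparkJarsFolder = Paths.get(sparkHome, "jars");
  if (!Files.isDirectory(sparkJarsFolder)) {
    throw new IOException("Spark jars path is missing or not a directory: "
        + sparkJarsFolder + ". Please check your SPARK_HOME setting.");
  }
  // newDirectoryStream throws IOException subclasses such as AccessDeniedException,
  // so there is no null return to guard against.
  try (DirectoryStream<Path> replJars =
           Files.newDirectoryStream(sparkJarsFolder, "spark-repl_*.jar")) {
    for (Path jar : replJars) {
      if (jar.getFileName().toString().contains("2.11")) {  // illustrative check
        return "2.11";
      }
    }
  }
  return "2.12";  // assumed default; the real detection may differ
}
```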
