Skip to content

[ZEPPELIN-6245] Improve Repository class stability and validation #4977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 1, 2025

Conversation

renechoi
Copy link
Contributor

@renechoi renechoi commented Jul 15, 2025

Improve Repository Class Stability and Validation

Summary

This PR enhances the Repository class with comprehensive input validation, error handling, and stability improvements to ensure robust Maven repository configuration management.

Changes Made

1. Enhanced Input Validation

  • Repository ID validation: Added regex pattern validation to ensure IDs contain only alphanumeric characters, dots, underscores, and hyphens
  • URL validation: Implemented URL format validation supporting HTTP, HTTPS, and FILE protocols
  • Credentials validation: Added validation to ensure both username and password are provided together
  • Proxy validation: Added comprehensive proxy configuration validation including protocol, host, and port checks

2. Improved Error Handling

  • Custom Exception: Created RepositoryException class for Repository-specific errors
  • JSON parsing: Enhanced error handling for malformed JSON with descriptive error messages
  • Validation errors: Comprehensive error messages for all validation failures

3. Builder Pattern Enhancements

  • Method chaining: Improved builder pattern with proper validation at each step
  • Fluent API: Enhanced usability with intuitive method names and return types

4. JSON Serialization Improvements

  • Robust parsing: Enhanced fromJson() method with comprehensive validation
  • Backward compatibility: Maintained compatibility with existing JSON formats
  • Error resilience: Graceful handling of invalid JSON inputs

5. Documentation and Code Quality

  • Comprehensive JavaDoc: Added detailed documentation with examples and parameter descriptions
  • Usage examples: Included code examples in class-level documentation
  • Validation documentation: Clear documentation of all validation rules

Benefits

  1. Improved Reliability: Input validation prevents runtime errors from malformed configurations
  2. Better Error Messages: Clear, actionable error messages help users identify and fix configuration issues
  3. Enhanced Security: URL and credential validation prevents potential security issues
  4. Maintainability: Clean, well-documented code with comprehensive test coverage
  5. User Experience: Better error handling and validation feedback

Testing

  • All existing tests pass
  • New comprehensive test suite covers edge cases and validation scenarios
  • Integration tests verify backward compatibility
  • Manual testing confirms REST API functionality

Example Usage

// Basic repository configuration
Repository repo = new Repository("central")
    .url("https://repo.maven.apache.org/maven2/");

// Repository with authentication
Repository privateRepo = new Repository("private")
    .url("https://private.repo/maven2/")
    .credentials("username", "password");

// Repository with proxy configuration
Repository proxyRepo = new Repository("proxy-repo")
    .url("https://repo.example.com/maven2/")
    .proxy("HTTP", "proxy.host", 8080, "proxyUser", "proxyPass");

- Add comprehensive input validation for Repository ID, URL, credentials, and proxy settings
- Introduce RepositoryException for better error handling and clear error messages
- Enhance JSON deserialization with proper exception handling and validation
- Add extensive test coverage with 21 test cases covering all validation scenarios
- Improve JavaDoc documentation with usage examples and exception conditions
- Maintain backward compatibility while strengthening internal validation
@Reamer
Copy link
Contributor

Reamer commented Jul 16, 2025

Looked at the changes. Looks pretty good. I have not yet lighted all the corners. Can you make it so that the repository class variables are final?
An extension of the builder pattern could be helpful here, where a build() is called at the end.

btw. the changes are definitely not ‘minor’. Please create a JIRA ticket.

…itory class

- Made all Repository class variables final for immutability
- Implemented Builder pattern as requested by reviewer
- Deprecated existing fluent API methods in favor of Builder
- Updated factory methods to use Builder pattern
- Maintained backward compatibility with @deprecated annotations
- Updated javadoc examples to demonstrate Builder usage

The Builder pattern provides better control over object construction
and ensures immutability of Repository instances once created.
@renechoi renechoi changed the title [MINOR] Improve Repository class stability and validation [ZEPPELIN-6245] Improve Repository class stability and validation Jul 16, 2025
@renechoi
Copy link
Contributor Author

@Reamer Thank you for the review!

I've created a JIRA ticket as suggested: https://issues.apache.org/jira/browse/ZEPPELIN-6245

Regarding your suggestions:

  1. Final variables: I've updated the Repository class to make the instance variables final. This will indeed improve immutability and thread safety.
  2. Builder pattern with build(): Great suggestion! I did refactor to implement a proper builder pattern with a build() method that performs final validation and returns an immutable Repository instance.

@Reamer
Copy link
Contributor

Reamer commented Jul 17, 2025

It looks much better with the builder pattern. 👍
Please remove the methods that you have marked with @deprecated. We will only merge the feature into the master and do not need backward compatibility.

I'm not entirely satisfied with the changes yet, but it's a good first attempt.

In my opinion, we should not use classes from the org.eclipse.aether package in the other Zeppelin modules (zeppelin-zengine, zeppelin-server).

I don't know if this is even feasible, which is why I would merge these changes first and then look into this issue.
Based on initial findings, everything hinges on the method public List<RemoteRepository> getRepositories() in the class InterpreterSettingManager.
However, the UI, which needs to display the JSON object correctly, could also be involved here.

@ankursaini2006
Copy link

Hi @renechoi

First of all thank you for the changes you made.

Recently I raised a ticket ZEPPELIN-6231 for highlighting a bug that I found while testing Zeppelin on Kubernetes.

After reviewing the code changes you made it appears that the bug will be fixed.

Thanks
Ankur

@renechoi
Copy link
Contributor Author

@Reamer Thank you for the feedback! I've removed all the deprecated methods as requested.

Regarding the org.eclipse.aether package usage, I investigated the codebase and found that:

  1. The Repository class serves as a bridge between Zeppelin's serializable repository configuration and Maven Aether's RemoteRepository
  2. Key usage points are:
    • InterpreterSettingManager: Uses getRepositories() method to convert Repository to RemoteRepository for dependency resolution
    • InterpreterRestApi: Exposes repository information through REST endpoints
    • The conversion methods (toRemoteRepository() and fromRemoteRepository()) are the main integration points

The current design allows Zeppelin to maintain a clean separation where:

  • Internal repository configuration uses our Repository class (JSON serializable)
  • Actual Maven dependency resolution uses Aether's RemoteRepository

I agree that reducing the exposure of org.eclipse.aether classes would improve the architecture. As you suggested, this could be a good follow-up improvement after this PR is merged.

@ankursaini2006 Thank you for confirming that these changes should fix the issue you reported in ZEPPELIN-6231! It's great to hear that this PR will help resolve the Kubernetes deployment issue.

@ankursaini2006
Copy link

@renechoi since you are refactoring the dependencies management, I have one question. if we request an artifact from maven repository, currently whole dependency tree is downloaded and eventually the requested dependency is downloaded. It takes a lot of time. During my test for downloading trino jdbc jar, it took 20mins. Why don’t we just download the requested artifact itself ? Instead of doing collect dependencies why don’t we perform resolve artifact ?

source: resolveDependencies

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your current code does not compile. Please use the builder in your tests as well.

@Reamer Reamer self-assigned this Jul 21, 2025
- Updated all Repository instantiation in tests to use the new builder pattern
- Fixed compilation errors in RepositoryTest.java and InterpreterInfoSavingTest.java
- Replaced direct constructor calls with Repository.Builder pattern
- All tests now use .build() method to create Repository instances
- Maintained test logic and coverage while adopting the new pattern
@renechoi
Copy link
Contributor Author

@ankursaini2006 Thank you for your insights!

Regarding ZEPPELIN-6231: Great to hear that these Repository improvements should fix the Kubernetes deployment issue you reported.

About the dependency resolution performance issue you raised - this is indeed a critical performance problem. I investigated the code and confirmed that DependencyResolver.getArtifactsWithDep() always downloads the entire transitive dependency tree using system.resolveDependencies(). Your experience with 20 minutes for downloading trino jdbc jar is a serious concern.

Your suggestion to use resolveArtifact() instead of resolveDependencies() makes sense for cases where only the main artifact is needed. However, this would require careful implementation to avoid runtime ClassNotFoundException issues.

@Reamer What do you think about creating a follow-up JIRA ticket for this dependency resolution optimization? We could:

  1. Add a configuration option to control transitive dependency resolution
  2. Implement a resolveWithoutTransitive flag in DependencyResolver
  3. Allow users to choose between full dependency tree vs. single artifact download

This would be a significant performance improvement while maintaining backward compatibility. Should we track this as a separate enhancement after this PR is merged?

Ready for your review!

@Reamer
Copy link
Contributor

Reamer commented Jul 24, 2025

I don't think that's a good idea. If you don't need the transitive dependencies, you can exclude them just like you would in Maven.

@Reamer
Copy link
Contributor

Reamer commented Jul 24, 2025

Currently CI fails because of your changes.

Error:  Failures: 
Error:    InterpreterSettingManagerTest.testInitInterpreterSettingManager:85 expected: <2> but was: <3>

@renechoi
Copy link
Contributor Author

I've fixed the CI failure.

The issue was in InterpreterSettingManager.loadFromFile() where repositories were being added without checking for duplicates, causing the test to fail (expected 2 repositories but got 3).

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will merge this into the master on Monday if no further comments are received.

Copy link
Contributor

@Reamer Reamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Reamer
Copy link
Contributor

Reamer commented Jul 28, 2025

I will merge the pull request on Wednesday if no further comments are received.

@renechoi
Copy link
Contributor Author

@Reamer Hi, just wanted to gently check in on this PR. Please let me know if there is anything else needed from my side for the merge. Thanks again for your time and reviews!

@Reamer Reamer merged commit d022d62 into apache:master Aug 1, 2025
16 of 18 checks passed
@Reamer
Copy link
Contributor

Reamer commented Aug 1, 2025

@renechoi
I've just been a little busy with other things over the last few days.
Thank you very much for your contribution. Merged into master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants