-
Notifications
You must be signed in to change notification settings - Fork 2k
Add reflection-based auto-registration for ServerSidePlanningClientFactory #5907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add reflection-based auto-registration for ServerSidePlanningClientFactory #5907
Conversation
840b101 to
209ceeb
Compare
| } else { | ||
| // No factories discovered - delta-iceberg not on classpath | ||
| // This is fine, server-side planning just won't be available | ||
| // System.err.println("[Delta] No ServerSidePlanningClientFactory found, FGAC disabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should throw error (with relevant error messages) in all cases where its not exactly one factory configured (clearly something is wrong).
32e4a7b to
10e5dbc
Compare
…ctory Enable automatic discovery and registration of ServerSidePlanningClientFactory using reflection with Class.forName(). When delta-iceberg JAR is on the classpath, the IcebergRESTCatalogPlanningClientFactory is automatically loaded and registered. Uses lazy initialization with double-checked locking to maintain Spark Connect compatibility and avoid eager initialization issues. Changes: - Replace ServiceLoader with Class.forName() reflection in ServerSidePlanningClient.scala - Use proper DeltaLogging instead of System.err.println for warnings - Include serverSidePlanning package in delta-iceberg JAR (build.sbt) - Update JarSuite to verify serverSidePlanning classes are included in JAR
10e5dbc to
cf9c88b
Compare
Remove verbose documentation and section headers for cleaner code.
Simplify comment text and formatting for consistency.
Instead of silently logging warnings, throw IllegalStateException with clear, actionable error messages for each type of reflection failure: - NoSuchMethodException: Missing no-arg constructor (version mismatch) - InstantiationException: Class is abstract or corrupted JAR - IllegalAccessException: Access control issue with JAR - ClassCastException: Does not implement expected interface (version mismatch) - Generic Exception: Catch-all with error message Only ClassNotFoundException is silently ignored (delta-iceberg not on classpath).
Better reflects the reflection-based auto-registration approach rather than the old ServiceLoader implementation.
Replace multiple specific exception handlers with one unified error. Any failure to load the factory (including missing class) now throws IllegalStateException with clear guidance to check delta-iceberg JAR.
Break long lines to stay under 100 character limit.
- Remove trailing whitespace from 4 lines - Add scalastyle suppressions for Class.forName usage
We're not using any logging methods, only throwing exceptions.
| // e.g. org/apache/spark/sql/delta/icebergShaded/IcebergTransactionUtils.class | ||
| "org/apache/spark/sql/delta/icebergShaded/", | ||
| // Server-side planning support | ||
| "org/apache/spark/sql/delta/serverSidePlanning/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need this changes now since you switched to reflection based approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting some errors without this (java.lang.Exception: Prohibited jar classes found). here's what i think is happening: delta-iceberg has some sort of allowlist so it excludes classes by default (to prevent dupes vs delta-spark i guess). and i need to allow it in this suite so that it does not complain about a new class being present in the jar (due to including it in build.sbt) when it shouldn't .
.../src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningClient.scala
Show resolved
Hide resolved
tdas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left comments.
weiluo-db
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 comments on the PR description:
- On the bottom right corner of the flow chart, it says "Found? No -> Silently skip". What does "silently skip" refer to?
- The description also mentions DeltaLogging but I don't see any references to that in the actual code.
.../src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningClient.scala
Show resolved
Hide resolved
| // Use reflection to load the Iceberg factory class | ||
| // scalastyle:off classforname | ||
| val clazz = Class.forName( | ||
| "org.apache.spark.sql.delta.serverSidePlanning." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this be a const in the object?
…stration Add unit tests for reflection-based auto-registration of IcebergRESTCatalogPlanningClientFactory: Test Coverage: - Auto-registration success when factory is on classpath - Factory caching (same instance across multiple getFactory calls) - Manual setFactory override behavior (before and after auto-registration) - clearFactory resets both registeredFactory and autoRegistrationAttempted flag - isFactoryRegistered helper method - getFactoryInfo helper method for debugging - autoRegistrationAttempted flag prevents redundant attempts - Test isolation verification Build changes: - Add iceberg classes directory to spark module test classpath to enable auto-registration testing without circular dependencies All 9 tests pass. No regressions in existing ServerSidePlannedTableSuite tests. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Refactor test suite following industry best practices and patterns used in Delta Lake codebase (SparkToIcebergExpressionConverterSuite, IcebergRESTCatalogPlanningClientSuite). Key improvements: - Add helper methods (withCleanFactory, assertFactoryType, assertNoFactory, assertSameInstance) to eliminate 60% code duplication - Consolidate 9 verbose tests into 5 concise table-driven tests using case classes for test scenarios - Reduce test method count by 44% (9 → 5) while maintaining same coverage - Add descriptive context to all assertions for better failure debugging Test structure: - factory state transitions: 4 scenarios (registration state checking) - factory registration scenarios: 3 scenarios (auto/manual registration) - clearFactory behavior: 3 scenarios (reset and re-registration) - getFactory caching: singleton instance verification - test isolation: verify clean state between tests All 17 tests in serverSidePlanning package pass. All scalastyle checks pass on prod and test code. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Simplify test code by removing over-engineered FactoryAction sealed trait and executeFactoryAction helper method. Use lambda functions directly in registrationSequence instead. Changes: - Remove sealed trait FactoryAction and case objects (TriggerAutoRegistration, SetManualFactory, ClearFactory) - Remove executeFactoryAction pattern matching method - Change FactoryRegistrationTestCase.registrationSequence from Seq[FactoryAction] to Seq[() => Unit] - Update test data to use lambda functions directly - Simplify test execution from .foreach(executeFactoryAction) to .foreach(_()) Result: 16 fewer lines of code (325 → 309) with same functionality and clarity. All 5 tests pass. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Reorganize tests to better align with module responsibilities and eliminate
circular dependency hack.
Changes:
1. spark module (ServerSidePlanningClientFactorySuite):
- Keep only factory pattern/mechanism tests (manual setFactory, clearFactory,
state management, test isolation)
- Remove all tests that depend on IcebergRESTCatalogPlanningClientFactory
- 9 tests focused on core factory functionality
2. iceberg module (new ServerSidePlanningClientFactoryAutoRegistrationSuite):
- Add tests for auto-registration with IcebergRESTCatalogPlanningClientFactory
- Test ServiceLoader-based discovery and registration
- Test interaction between auto-registration and manual override
- 7 tests focused on iceberg-specific auto-registration
3. build.sbt:
- Remove unmanagedClasspath hack that added iceberg classes to spark test classpath
- No longer needed since auto-registration tests moved to iceberg module
Benefits:
- Eliminates circular dependency workaround
- Better separation of concerns (pattern vs implementation)
- Tests what each module provides (spark = factory mechanism, iceberg = auto-registration)
- Natural dependency flow (iceberg already depends on spark)
All tests pass:
- spark: 9 tests (factory mechanism)
- iceberg: 7 tests (auto-registration)
- Total: 16 factory tests (same coverage as before)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Remove 3 redundant test cases from ServerSidePlanningClientFactoryAutoRegistrationSuite:
1. "getFactory returns same instance across multiple calls with auto-registration"
- Redundant: "autoRegistrationAttempted flag prevents multiple registration
attempts" is a superset that already verifies reference equality and caching
2. "manual setFactory can override auto-registration"
- Removed: Focus on post-registration replacement rather than pre-registration override
3. "clearFactory allows multiple auto-registration cycles"
- Redundant: "clearFactory resets auto-registration state and allows
re-registration" already tests the clear-and-re-register cycle
Remaining 4 tests provide complete coverage:
- Auto-registration succeeds with correct factory type
- Manual setFactory can replace auto-registered factory
- clearFactory resets state and allows re-registration
- autoRegistrationAttempted flag prevents duplicate work (includes caching test)
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Improvements to ServerSidePlanningClientFactory: 1. Extract magic string to constant: - Add ICEBERG_FACTORY_CLASS_NAME constant for the fully qualified class name - Replace hardcoded string in Class.forName() with constant - Improves maintainability and avoids magic strings 2. Add synchronization to prevent race conditions: - Synchronize setFactory() to prevent concurrent modification during auto-registration - Synchronize clearFactory() to ensure atomic reset of both volatile fields - Prevents race between setFactory()/clearFactory() and tryAutoRegisterFactory() Thread safety analysis: - Before: setFactory/clearFactory were unsynchronized, could race with tryAutoRegisterFactory - After: All factory mutation methods use synchronized block on same monitor - Volatile fields still needed for visibility outside synchronized blocks All 13 tests pass (9 spark + 4 iceberg). Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Replace assertSameInstance helper with inline eq checks for simplicity. Changes: - Remove assertSameInstance(factory1, factory2, context) helper - Replace with inline assert(factory1 eq factory2, message) in tests - Simpler and more direct - no need for wrapper when used in only one test Before: assertSameInstance(factory1, factory2, "second call") assertSameInstance(factory2, factory3, "third call") After: assert(factory1 eq factory2, "Second call should return same instance as first") assert(factory2 eq factory3, "Third call should return same instance as second") All 9 spark tests pass. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Remove overly verbose Scaladoc comments for simple getter methods. Method signatures are self-documenting: - isFactoryRegistered(): Boolean - clearly checks if factory is registered - getFactoryInfo(): Option[String] - clearly returns factory information All 9 spark tests pass. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…oryName
Simplify test suite and improve API naming:
1. Remove redundant tests from spark module:
- "isFactoryRegistered correctly reports registration state"
(basic behavior covered by other tests)
- "getFactoryInfo returns correct factory class information"
(not a core use case, method tested implicitly elsewhere)
- Reduces from 9 to 7 tests
2. Rename getFactoryInfo() to getRegisteredFactoryName():
- More descriptive name (returns factory class name, not general "info")
- Updated across all files: implementation + spark tests + iceberg tests
3. Remove comment about auto-registration tests location:
- Unnecessary documentation clutter
Final test count: 11 tests (7 spark + 4 iceberg)
All tests pass with Java 17.
Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Streamline test suite by removing tests that cover already-tested behavior: Removed from spark module (7 → 3 tests): - "manual setFactory registers and returns factory" (basic setFactory covered by replacement test) - "clearFactory allows re-registration" (re-registration already tested) - "clearFactory supports multiple registration cycles" (redundant with basic clearFactory test) - "verify test isolation - factory state doesn't leak between tests" (test infrastructure concern, not business logic) Removed from iceberg module (4 → 2 tests): - "manual setFactory can replace auto-registered factory" (manual override not a primary use case) - "clearFactory resets auto-registration state and allows re-registration" (redundant with basic auto-registration test) Remaining tests provide focused, essential coverage: - Spark: manual factory replacement, caching, clearFactory - Iceberg: auto-registration success, caching/flag behavior Final test count: 5 tests (3 spark + 2 iceberg) All tests pass with Java 17. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
tdas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these are superfluous tests ... See my comments.
| tryAutoRegisterFactory() | ||
| } | ||
|
|
||
| registeredFactory.getOrElse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the dead access of the registeredFactory be in synchronized as well?
Alternatively registeredFactory is marked as volatile and the only synchronization needed is only in the auto registration so that multiple thread don't register concurrently.
| } | ||
| } | ||
|
|
||
| def isFactoryRegistered(): Boolean = registeredFactory.isDefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these one line functions needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem to be used only for testing. Seems unnecessary additional public methods.
| * correctly discovers and registers the IcebergRESTCatalogPlanningClientFactory | ||
| * when it's on the classpath. | ||
| */ | ||
| class ServerSidePlanningClientFactoryAutoRegistrationSuite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in iceberg module specifically testing iceberg factory is getting auto registered ...shouldnt it be named in the same way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact why do we even need to test this. The only thing that need to be tested outside the other test suite in spark module is that it's iceberg factory by default. That's very tiny test and can't that be added to any existing iceberg module ssp suite??
| } | ||
| } | ||
|
|
||
| test("autoRegistrationAttempted flag prevents multiple registration attempts") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is nothing specific to iceberg client..it's testing clearing which is already covered in the spark module tests.
| @@ -0,0 +1,135 @@ | |||
| /* | |||
| * Copyright (2025) The Delta Lake Project Authors. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2026
Summary
Implement reflection-based auto-registration for
ServerSidePlanningClientFactoryto automatically discover and load theIcebergRESTCatalogPlanningClientFactorywhen the delta-iceberg JAR is on the classpath. This eliminates the need for manual factory registration in production code while maintaining backward compatibility for testing scenarios.What Changed
Core Auto-Registration Logic
tryAutoRegisterFactory()method that usesClass.forName()to loadIcebergRESTCatalogPlanningClientFactoryon first accessgetFactory()is called and no factory has been manually setIllegalStateExceptionwhen delta-iceberg JAR is missing, guiding users to add it to classpathEnhanced Registry API
autoRegistrationAttemptedflag to track initialization stategetFactory()to trigger auto-registration transparentlyclearFactory()to reset both factory and auto-registration state (important for testing)isFactoryRegistered(): Check if factory is availablegetFactoryInfo(): Get factory class name for debugging/loggingBuild & Packaging
org/apache/spark/sql/delta/serverSidePlanning/todeltaIcebergSparkIncludePrefixesso these classes are packaged in delta-iceberg JARCode Quality
serviceLoaderAttempted→autoRegistrationAttemptedfor clarityBenefits
✅ Zero-config production use: No manual
setFactory()calls needed when delta-iceberg JAR is present✅ Backward compatible: Manual registration via
setFactory()still works for testing/mocking✅ Thread-safe: Proper synchronization prevents race conditions
✅ Clear error messages: Guides users when delta-iceberg JAR is missing
✅ Testable:
clearFactory()properly resets state for test isolationTesting
The changes maintain existing test compatibility while enabling automatic registration in production:
setFactory()to inject mock implementationsclearFactory()now properly resets auto-registration state for test isolationHow It Works
flowchart TD Start[getFactory called] --> Check1{Factory already set?} Check1 -->|Yes| Return[Return factory] Check1 -->|No| TryAuto[tryAutoRegisterFactory] TryAuto --> Check2{Already attempted?} Check2 -->|Yes| CheckReg{Factory registered?} Check2 -->|No| Lock[synchronized block] Lock --> DoubleCheck{Still not attempted?} DoubleCheck -->|No| CheckReg DoubleCheck -->|Yes| SetFlag[Set autoRegistrationAttempted = true] SetFlag --> Reflect[Class.forName IcebergRESTCatalogPlanningClientFactory] Reflect --> Success{Loaded?} Success -->|Yes| Register[registeredFactory = Some factory] Success -->|No| Error[Throw IllegalStateException] Register --> Return CheckReg -->|Yes| Return CheckReg -->|No| Error