@cbb330 cbb330 commented Nov 26, 2025

Title

Add SPI interfaces for external catalog compatibility testing

Description

This PR introduces Service Provider Interface (SPI) support that allows external catalog implementations (like OpenHouse, Polaris, Unity Catalog, etc.) to plug into Iceberg's test suite without modifying Iceberg source code.

Motivation

External catalog implementations need to validate Iceberg spec compliance by running Iceberg's own tests. Currently, this requires either:

  1. Forking Iceberg and modifying tests (maintenance nightmare)
  2. Copying tests into the external project (code duplication, version drift)
  3. No testing (unacceptable for production catalogs)

This PR enables a fourth option: inject external catalogs via SPI and run Iceberg's tests unmodified.

New SPI Interfaces

| Interface | Purpose | Location |
|---|---|---|
| TestTableProvider | Creates tables for iceberg-core tests | core/src/test/java/ |
| TestCatalogProvider | Provides catalog configurations for Spark tests | spark/v3.5/spark/src/test/java/ |
| TestSparkSessionProvider | Provides custom SparkSession instances | spark/v3.5/spark/src/test/java/ |
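To illustrate the general shape of such an SPI, here is a minimal sketch of ServiceLoader-based discovery with a fallback. `DemoTableProvider`, `InMemoryDemoTableProvider`, and `DemoProviderDiscovery` are hypothetical names invented for this example; the actual method signatures of TestTableProvider are not shown in this description and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for an SPI such as TestTableProvider.
interface DemoTableProvider {
  String createTable(String name);
}

// Hypothetical default used when no external provider is registered.
class InMemoryDemoTableProvider implements DemoTableProvider {
  @Override
  public String createTable(String name) {
    return "in-memory:" + name;
  }
}

class DemoProviderDiscovery {
  // Collects every implementation listed in a classpath resource named
  // META-INF/services/<fully qualified interface name>.
  static List<DemoTableProvider> discover() {
    List<DemoTableProvider> found = new ArrayList<>();
    for (DemoTableProvider provider : ServiceLoader.load(DemoTableProvider.class)) {
      found.add(provider);
    }
    return found;
  }

  // Uses the first external provider if one is registered, else the default.
  static DemoTableProvider providerOrDefault() {
    List<DemoTableProvider> found = discover();
    return found.isEmpty() ? new InMemoryDemoTableProvider() : found.get(0);
  }
}
```

An external project would ship its implementation together with the matching META-INF/services/ entry in its own jar, so nothing in the host test suite needs to change.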

Modified Test Infrastructure

  • TableTestBase: Discovers and uses external TestTableProvider via SPI or system property
  • TestBase: Discovers and uses external TestSparkSessionProvider via SPI
  • TestBaseWithCatalog: Discovers and uses external TestCatalogProvider via SPI
  • CatalogTestBase: Respects iceberg.test.catalog.skip.defaults to exclude default catalogs
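As a rough sketch of how a skip-defaults switch could work, the example below merges default and external catalog names and drops the defaults when the property is set. It assumes `iceberg.test.catalog.skip.defaults` is read as a boolean JVM system property; `DemoCatalogParameters` and the catalog names are hypothetical and not taken from CatalogTestBase.

```java
import java.util.ArrayList;
import java.util.List;

class DemoCatalogParameters {
  // Property name from this PR; treating it as a boolean flag is an assumption.
  static final String SKIP_DEFAULTS = "iceberg.test.catalog.skip.defaults";

  // Returns the catalogs to test: defaults first (unless skipped), then external ones.
  static List<String> resolve(List<String> defaults, List<String> external) {
    List<String> result = new ArrayList<>();
    if (!Boolean.getBoolean(SKIP_DEFAULTS)) {
      result.addAll(defaults);
    }
    result.addAll(external);
    return result;
  }
}
```

With the property set to `true`, only the externally provided catalogs remain in the parameter list.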

New Gradle Configuration

  • spark/openhouse.gradle: Shared compatibility test task configuration (reusable template)
  • build.gradle: openhouseCompatibility aggregate task demonstrating usage

Usage Example

External catalogs implement the SPI interfaces, register them via META-INF/services/, and pass the Maven coordinate of the resulting fixtures jar to the compatibility tasks:

# Run Iceberg tests against an external catalog
./gradlew :iceberg-spark:iceberg-spark-3.5_2.12:openhouseCompatibilityTest \
    :iceberg-core:openhouseCompatibilityTest \
    -PopenhouseCompatibilityCoordinate=com.example:my-catalog-fixtures:1.0:uber

Key Design Decisions

| Decision | Rationale |
|---|---|
| Dual discovery (system property + ServiceLoader) | Flexibility for CI vs. local development |
| Singleton server per JVM | Performance (avoid repeated startup) |
| Exclusion list loaded from uber jar | External catalogs control their own exclusions |
| Zero source changes required for external catalogs | Future-proof, no merge conflicts |
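The first two decisions can be sketched together as follows. This is an illustration under stated assumptions, not the code in this PR: the property name `demo.test.server.provider`, the precedence of the system property over ServiceLoader, and all class names are hypothetical.

```java
import java.util.Iterator;
import java.util.ServiceLoader;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical SPI for something expensive to start, such as a catalog server.
interface DemoServerProvider {
  String endpoint();
}

// Hypothetical implementation that counts how often it is constructed.
class LocalDemoServerProvider implements DemoServerProvider {
  static final AtomicInteger STARTS = new AtomicInteger();

  public LocalDemoServerProvider() {
    STARTS.incrementAndGet();
  }

  @Override
  public String endpoint() {
    return "http://localhost:0";
  }
}

class DemoProviderResolver {
  // Hypothetical property name; the real one is not given in this description.
  static final String PROVIDER_PROPERTY = "demo.test.server.provider";

  private static DemoServerProvider instance;

  // One instance per JVM: the first caller resolves it, later callers reuse it.
  static synchronized DemoServerProvider get() {
    if (instance == null) {
      instance = resolve();
      if (instance == null) {
        throw new IllegalStateException("No provider configured");
      }
    }
    return instance;
  }

  // Dual discovery: an explicit class name wins, ServiceLoader is the fallback.
  static DemoServerProvider resolve() {
    String className = System.getProperty(PROVIDER_PROPERTY);
    if (className != null && !className.isEmpty()) {
      try {
        return (DemoServerProvider)
            Class.forName(className).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot instantiate " + className, e);
      }
    }
    Iterator<DemoServerProvider> it =
        ServiceLoader.load(DemoServerProvider.class).iterator();
    return it.hasNext() ? it.next() : null;
  }
}
```

A system property is easy to set per CI job, while a META-INF/services/ entry works without any extra flags once the jar is on the classpath, which is one plausible reading of the "CI vs. local development" rationale.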

Test Coverage

This infrastructure has been validated with OpenHouse, running 860+ Iceberg tests including:

  • TestWapWorkflow (WAP commit workflows)
  • TestFastAppend, TestMergeAppend (append operations)
  • TestOverwrite, TestReplacePartitions (overwrite operations)
  • TestTransaction (multi-operation transactions)
  • TestDeleteFrom, TestPartitionedWrites (Spark SQL tests)

Breaking Changes

None. All changes are additive and backward-compatible.

This commit introduces Service Provider Interface (SPI) support that allows
external catalog implementations to plug into Iceberg's test suite without
modifying Iceberg source code.

New SPI Interfaces:
- TestTableProvider: Creates tables for iceberg-core tests
- TestCatalogProvider: Provides catalog configurations for Spark tests
- TestSparkSessionProvider: Provides custom SparkSession instances

Modified Test Infrastructure:
- TableTestBase: Discovers and uses external TestTableProvider via SPI
- TestBase: Discovers and uses external TestSparkSessionProvider via SPI
- TestBaseWithCatalog: Discovers and uses external TestCatalogProvider via SPI
- CatalogTestBase: Respects external catalog configurations
- ParameterizedTestExtension: Prefers most specific @Parameters method
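One way to read "most specific" is: walk from the concrete test class up through its superclasses and take the first annotated static method found. The sketch below shows that lookup with a hypothetical `@DemoParameters` annotation and hypothetical test classes; it is not the ParameterizedTestExtension implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical stand-in for a @Parameters-style marker annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoParameters {}

class DemoBaseTest {
  @DemoParameters
  static Object[][] baseParameters() {
    return new Object[][] {{"default-catalog"}};
  }
}

class DemoExternalTest extends DemoBaseTest {
  @DemoParameters
  static Object[][] externalParameters() {
    return new Object[][] {{"external-catalog"}};
  }
}

class DemoParameterResolver {
  // Searches the class itself before its superclasses, so a method declared
  // on the subclass shadows one declared further up the hierarchy.
  static Optional<Method> mostSpecific(Class<?> testClass) {
    for (Class<?> c = testClass; c != null; c = c.getSuperclass()) {
      for (Method m : c.getDeclaredMethods()) {
        if (m.isAnnotationPresent(DemoParameters.class)
            && Modifier.isStatic(m.getModifiers())) {
          return Optional.of(m);
        }
      }
    }
    return Optional.empty();
  }
}
```

Here `mostSpecific(DemoExternalTest.class)` resolves to `externalParameters`, while `mostSpecific(DemoBaseTest.class)` still resolves to `baseParameters`.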

New Gradle Configuration:
- spark/openhouse.gradle: Shared compatibility test task configuration
- build.gradle: openhouseCompatibility aggregate task

Usage:
  ./gradlew openhouseCompatibility -PopenhouseCompatibilityCoordinate=<maven-coordinate>

This enables running 860+ Iceberg tests against external catalogs like
OpenHouse without forking or patching Iceberg's test classes.
- Remove legacy openHouseFixturesCoordinate approach with exclusions
- Use openhouseCompatibilityRuntime with transitive=false (cleaner)
- Remove hardcoded SNAPSHOT coordinate from gradle.properties
- Simplify TestBase.loadSparkSessionProvider using ServiceLoader.findFirst()
- Remove redundant system property handling from spark/v3.5/build.gradle
- TestBaseWithCatalog: centralize SPI loading with loadExternalCatalogProviders()
- CatalogTestBase: just override defaultCatalogParameters(), inherit SPI logic
- spark/v3.5/build.gradle: remove duplicate openhouseCompatibilityRuntime config
- spark/openhouse.gradle: single source of truth for OpenHouse test config
- Use consistent single quotes and add descriptive comments
@github-actions github-actions bot removed the HIVE label Nov 26, 2025
- Converted iceberg submodule to real git repo to fix gradle-git-properties
- Simplified TableTestBase to use reflection (no internal SPI dependency)
- Fixed ServiceLoader usage in TestBase for Java 8 compatibility
- Centralized OpenHouse config in spark/openhouse.gradle
- Fixed Hadoop Configuration in OpenHouseTestTableProvider (in tables-test-fixtures) to prevent NPE
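On the Java 8 point: `ServiceLoader.findFirst()` was added in Java 9, so code that must compile against Java 8 needs an iterator-based equivalent. A minimal sketch, with the hypothetical names `Java8ServiceLookup` and `DemoSessionProvider` (the actual fix in TestBase may look different):

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI used only to exercise the lookup below.
interface DemoSessionProvider {
  String sessionName();
}

class Java8ServiceLookup {
  // Java 8 compatible replacement for ServiceLoader.load(service).findFirst().
  static <T> Optional<T> findFirst(Class<T> service) {
    Iterator<T> it = ServiceLoader.load(service).iterator();
    return it.hasNext() ? Optional.of(it.next()) : Optional.empty();
  }
}
```

When no implementation is registered, the result is an empty `Optional`, which lets the caller fall back to its built-in default.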