[Feature] Apache Iceberg Schema Integration #7001

@carlesarnal

Parent Epic

Part of #6993 - Data Platform Integration

Description

Implement integration between Apicurio Registry and Apache Iceberg, enabling schema bridging between streaming formats (Avro, Protobuf) and Iceberg table schemas.

Background

Apache Iceberg is becoming a de facto standard table format for data lakehouses. Organizations running Kafka-to-Iceberg pipelines need:

  • Schema consistency between streaming and lakehouse layers
  • Compatibility validation before pipeline deployment
  • Lineage tracking from source topics to target tables

Requirements

Schema Import/Export

  1. Iceberg Schema Import

    • Parse Iceberg table metadata JSON
    • Convert Iceberg schema to Apicurio artifact
    • Preserve field IDs and metadata
  2. Iceberg Schema Export

    • Convert Avro/Protobuf schemas to Iceberg format
    • Generate valid Iceberg schema JSON
    • Handle type mappings (logical types, nested structures)
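
To make the export path and its type mappings concrete, here is a minimal sketch. It works on parsed JSON in plain Python rather than the Apache Iceberg Java libraries an implementation would likely use. The names `avro_to_iceberg`, `_convert` and `_unwrap_optional` are hypothetical, field IDs are assigned depth-first as a simplifying assumption, and only a subset of Avro types is covered (anything else raises).

```python
import json
from itertools import count

# Avro primitive -> Iceberg primitive (subset; anything else raises below).
PRIMITIVES = {
    "boolean": "boolean", "int": "int", "long": "long", "float": "float",
    "double": "double", "string": "string", "bytes": "binary",
}
# Avro logical types mapped directly to an Iceberg type in this sketch.
LOGICAL = {"date": "date", "uuid": "uuid"}


def _unwrap_optional(avro_type):
    """Return (type, required). Only two-branch ["null", T] unions are handled."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(avro_type) != 2 or len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        return non_null[0], False
    return avro_type, True


def _convert(avro_type, ids):
    """Convert one non-union Avro type to its Iceberg JSON representation."""
    if isinstance(avro_type, str):
        if avro_type not in PRIMITIVES:
            raise ValueError(f"unsupported Avro type: {avro_type}")
        return PRIMITIVES[avro_type]
    logical, kind = avro_type.get("logicalType"), avro_type["type"]
    if logical == "decimal":
        return f"decimal({avro_type['precision']}, {avro_type.get('scale', 0)})"
    if logical in LOGICAL:
        return LOGICAL[logical]
    if logical is not None:
        # Do not silently fall back to the base type (e.g. timestamp-millis).
        raise ValueError(f"unsupported logical type: {logical}")
    if kind == "fixed":
        return f"fixed[{avro_type['size']}]"
    if kind == "record":
        fields = []
        for f in avro_type["fields"]:
            inner, required = _unwrap_optional(f["type"])
            field_id = next(ids)  # assign the parent ID before descending
            fields.append({"id": field_id, "name": f["name"],
                           "required": required, "type": _convert(inner, ids)})
        return {"type": "struct", "fields": fields}
    if kind == "array":
        element, required = _unwrap_optional(avro_type["items"])
        element_id = next(ids)
        return {"type": "list", "element-id": element_id,
                "element-required": required, "element": _convert(element, ids)}
    if kind == "map":
        value, required = _unwrap_optional(avro_type["values"])
        key_id, value_id = next(ids), next(ids)
        # Avro map keys are always strings.
        return {"type": "map", "key-id": key_id, "key": "string",
                "value-id": value_id, "value-required": required,
                "value": _convert(value, ids)}
    return _convert(kind, ids)  # wrapped primitive such as {"type": "string"}


def avro_to_iceberg(avro_schema, schema_id=0):
    """Convert a parsed Avro record schema into an Iceberg schema dict."""
    if not isinstance(avro_schema, dict) or avro_schema.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    struct = _convert(avro_schema, count(1))
    return {"type": "struct", "schema-id": schema_id, "fields": struct["fields"]}


if __name__ == "__main__":
    order = {"type": "record", "name": "Order", "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "note", "type": ["null", "string"]},
    ]}
    print(json.dumps(avro_to_iceberg(order), indent=2))
```

Raising on unknown types instead of guessing a mapping keeps the "report incompatible type mappings" requirement visible: every unsupported construct surfaces as an explicit error.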

Compatibility Checking

  1. Cross-Format Compatibility

    • Validate Avro schema against Iceberg schema
    • Validate Protobuf schema against Iceberg schema
    • Report incompatible type mappings
  2. Evolution Rules

    • Apply Iceberg evolution rules (add/drop columns, type widening)
    • Detect breaking changes for Iceberg tables
    • Handle column ID tracking

Lineage Tracking

  1. Source-to-Target Mapping
    • Link Kafka topic schemas to Iceberg table schemas
    • Track transformations in the pipeline
    • Visualize lineage in UI
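
One possible shape for the source-to-target mapping is a small graph of schema references connected by labelled edges. The classes and field names below (`SchemaRef`, `LineageEdge`, `LineageGraph`, `transformation`) are hypothetical and only illustrate the data a lineage view would need; they are not an existing Apicurio Registry API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaRef:
    kind: str      # e.g. "kafka-topic" or "iceberg-table"
    name: str      # topic name or table identifier
    version: str   # artifact version or Iceberg schema ID


@dataclass(frozen=True)
class LineageEdge:
    source: SchemaRef
    target: SchemaRef
    transformation: str  # free-form description of the pipeline step


@dataclass
class LineageGraph:
    edges: list = field(default_factory=list)

    def link(self, source, target, transformation="identity"):
        """Record that `target` is derived from `source`."""
        edge = LineageEdge(source, target, transformation)
        self.edges.append(edge)
        return edge

    def upstream(self, target):
        """Return every schema that feeds `target`, directly or transitively."""
        found, stack = [], [target]
        while stack:
            node = stack.pop()
            for edge in self.edges:
                if edge.target == node and edge.source not in found:
                    found.append(edge.source)
                    stack.append(edge.source)
        return found
```

Keeping the transformation on the edge rather than on the schema lets the same topic schema feed several tables through different pipeline steps.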

REST Catalog Integration

  • Act as schema source for Iceberg REST Catalog
  • Provide schema resolution for catalog operations

Acceptance Criteria

  • Iceberg schemas can be imported as artifacts
  • Avro/Protobuf can be exported as Iceberg schemas
  • Compatibility between formats is validated
  • Lineage between streaming and Iceberg is tracked
  • Documentation covers common pipeline patterns

Technical Considerations

  • Use Apache Iceberg Java libraries for schema handling
  • Handle Iceberg-specific types (UUID, fixed, etc.)
  • Consider Iceberg Spec v2 and v3 differences
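
As an illustration of where spec versions matter during import, the sketch below picks the current schema out of parsed table metadata JSON and lists any types that only exist from format version 3 onward. It uses plain Python instead of the Iceberg Java libraries, the function names are hypothetical, and the `V3_ONLY` set is deliberately non-exhaustive (parameterized types such as geometry are left out).

```python
# Primitive types introduced in Iceberg format version 3 (non-exhaustive).
V3_ONLY = {"timestamp_ns", "timestamptz_ns", "variant", "unknown"}


def current_schema(metadata):
    """Return the current schema from parsed Iceberg table metadata."""
    if "schemas" in metadata and "current-schema-id" in metadata:
        wanted = metadata["current-schema-id"]
        for schema in metadata["schemas"]:
            if schema.get("schema-id") == wanted:
                return schema
        raise ValueError(f"current-schema-id {wanted} not found in schemas")
    if "schema" in metadata:
        # Older v1 metadata carries a single top-level schema.
        return metadata["schema"]
    raise ValueError("table metadata contains no schema")


def primitive_types(iceberg_type):
    """Yield every primitive type name used in an Iceberg type, recursively."""
    if isinstance(iceberg_type, str):
        yield iceberg_type
    elif iceberg_type["type"] == "struct":
        for f in iceberg_type["fields"]:
            yield from primitive_types(f["type"])
    elif iceberg_type["type"] == "list":
        yield from primitive_types(iceberg_type["element"])
    elif iceberg_type["type"] == "map":
        yield from primitive_types(iceberg_type["key"])
        yield from primitive_types(iceberg_type["value"])


def v3_only_types(schema):
    """Return the sorted v3-only type names that appear in the schema."""
    return sorted(V3_ONLY & set(primitive_types(schema)))
```

A non-empty result from `v3_only_types` would be one signal that a schema cannot be exported to a table that is still on format version 2.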
