Open
Labels
area/lakehouse: Data lakehouse integration (Iceberg, Flink); priority/normal; type/enhancement: New feature or request
Parent Epic
Part of #6993 - Data Platform Integration
Description
Implement integration between Apicurio Registry and Apache Iceberg, enabling schema bridging between streaming formats (Avro, Protobuf) and Iceberg table schemas.
Background
Apache Iceberg is becoming the standard table format for data lakehouses. Organizations running Kafka-to-Iceberg pipelines need:
- Schema consistency between streaming and lakehouse layers
- Compatibility validation before pipeline deployment
- Lineage tracking from source topics to target tables
Requirements
Schema Import/Export
- Iceberg Schema Import
  - Parse Iceberg table metadata JSON
  - Convert Iceberg schema to Apicurio artifact
  - Preserve field IDs and metadata
- Iceberg Schema Export
  - Convert Avro/Protobuf schemas to Iceberg format
  - Generate valid Iceberg schema JSON
  - Handle type mappings (logical types, nested structures)
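
To make the import and export items above more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the proposed implementation: a real integration would more likely rely on the Iceberg Java libraries named under Technical Considerations. The function names `current_schema` and `avro_to_iceberg` are hypothetical, field IDs are assigned depth-first starting at 1, only `[null, T]` unions are handled, and mapping Avro `timestamp-micros` to `timestamptz` is an assumption of this sketch.

```python
import itertools
import json

# Avro primitive -> Iceberg primitive.
PRIMITIVES = {
    "boolean": "boolean", "int": "int", "long": "long", "float": "float",
    "double": "double", "string": "string", "bytes": "binary",
}
# Avro logical type -> Iceberg type. Treating timestamp-micros as
# timestamptz is an assumption of this sketch.
LOGICAL = {
    "date": "date", "time-micros": "time", "uuid": "uuid",
    "timestamp-micros": "timestamptz", "local-timestamp-micros": "timestamp",
}


def current_schema(metadata_json):
    """Import side: pick the current schema out of table metadata JSON."""
    meta = json.loads(metadata_json)
    if "schemas" not in meta:          # format v1 keeps a single "schema"
        return meta["schema"]
    wanted = meta["current-schema-id"]
    for schema in meta["schemas"]:
        if schema.get("schema-id") == wanted:
            return schema              # field IDs are returned untouched
    raise ValueError(f"schema-id {wanted} not found")


def avro_to_iceberg(record):
    """Export side: convert a parsed Avro record schema to Iceberg schema JSON."""
    ids = itertools.count(1)           # fresh field IDs, assigned depth-first

    def unwrap(avro_type):
        """Return (type, required); only [null, T] unions are handled."""
        if not isinstance(avro_type, list):
            return avro_type, True
        branches = [b for b in avro_type if b != "null"]
        if len(branches) != 1 or len(avro_type) != 2:
            raise ValueError(f"unsupported union: {avro_type}")
        return branches[0], False

    def convert(avro_type):
        if isinstance(avro_type, str):
            if avro_type not in PRIMITIVES:
                raise ValueError(f"no Iceberg mapping for {avro_type!r}")
            return PRIMITIVES[avro_type]
        logical = avro_type.get("logicalType")
        if logical == "decimal":
            return f"decimal({avro_type['precision']}, {avro_type.get('scale', 0)})"
        if logical in LOGICAL:
            return LOGICAL[logical]
        kind = avro_type["type"]
        if kind == "record":
            return {"type": "struct",
                    "fields": [field(f) for f in avro_type["fields"]]}
        if kind == "fixed":
            return f"fixed[{avro_type['size']}]"
        if kind == "array":
            element_id = next(ids)
            element, required = unwrap(avro_type["items"])
            return {"type": "list", "element-id": element_id,
                    "element": convert(element), "element-required": required}
        if kind == "map":              # Avro map keys are always strings
            key_id, value_id = next(ids), next(ids)
            value, required = unwrap(avro_type["values"])
            return {"type": "map", "key-id": key_id, "key": "string",
                    "value-id": value_id, "value": convert(value),
                    "value-required": required}
        return convert(kind)           # e.g. {"type": "long"}

    def field(avro_field):
        field_id = next(ids)
        field_type, required = unwrap(avro_field["type"])
        return {"id": field_id, "name": avro_field["name"],
                "required": required, "type": convert(field_type)}

    fields = [field(f) for f in record["fields"]]
    return {"type": "struct", "schema-id": 0, "fields": fields}
```

Calling `json.dumps(avro_to_iceberg(parsed_avro))` would produce the schema JSON. Protobuf export is not covered here, and a production version would also need a policy for Avro enums and general unions, which have no direct Iceberg equivalent.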
Compatibility Checking
- Cross-Format Compatibility
  - Validate Avro schema against Iceberg schema
  - Validate Protobuf schema against Iceberg schema
  - Report incompatible type mappings
- Evolution Rules
  - Apply Iceberg evolution rules (add/drop columns, type widening)
  - Detect breaking changes for Iceberg tables
  - Handle column ID tracking
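
The evolution rules above could be checked along the following lines. This is a rough Python sketch, assuming the v2 type promotions (int to long, float to double, and decimal precision widening with unchanged scale) and comparing top-level columns only. The names `can_promote` and `breaking_changes` are hypothetical, and `last_column_id` is assumed to come from the table metadata. Columns are matched by ID, so renames and drops are not reported, and a new required column is treated as breaking.

```python
import re

# Primitive promotions allowed by the Iceberg v2 spec.
PROMOTIONS = {("int", "long"), ("float", "double")}
DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def can_promote(old, new):
    """True if a column of type `old` may be changed to type `new`."""
    if old == new:
        return True
    if not (isinstance(old, str) and isinstance(new, str)):
        return False   # nested types are only compared for equality here
    if (old, new) in PROMOTIONS:
        return True
    o, n = DECIMAL.fullmatch(old), DECIMAL.fullmatch(new)
    # Decimal precision may grow, the scale has to stay the same.
    return bool(o and n and o.group(2) == n.group(2)
                and int(n.group(1)) >= int(o.group(1)))


def breaking_changes(old_fields, new_fields, last_column_id):
    """Compare two top-level Iceberg field lists, matching columns by ID."""
    old_by_id = {f["id"]: f for f in old_fields}
    problems = []
    for new in new_fields:
        old = old_by_id.get(new["id"])
        if old is None:
            # A brand-new column needs an ID that was never used before.
            if new["id"] <= last_column_id:
                problems.append(f"{new['name']}: reuses column ID {new['id']}")
            if new["required"]:
                problems.append(f"{new['name']}: new column must be optional")
        else:
            if not can_promote(old["type"], new["type"]):
                problems.append(f"{new['name']}: {old['type']} -> "
                                f"{new['type']} is not an allowed promotion")
            if new["required"] and not old["required"]:
                problems.append(f"{new['name']}: optional column made required")
    return problems
```

An empty result means the change passes these checks. The cross-format validation listed above would need a different matching strategy, since Avro and Protobuf fields do not carry Iceberg column IDs unless they are annotated with them.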
Lineage Tracking
- Source-to-Target Mapping
  - Link Kafka topic schemas to Iceberg table schemas
  - Track transformations in the pipeline
  - Visualize lineage in UI
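
One possible shape for the source-to-target mapping is a small directed graph of schema references. The Python sketch below is illustrative only: `SchemaRef` and `LineageGraph` are hypothetical names, the graph lives in memory, and how the registry would persist these links is left open.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaRef:
    """Points at one schema version in one system."""
    system: str     # e.g. "kafka" or "iceberg"
    name: str       # topic name or table identifier
    version: int


@dataclass
class LineageGraph:
    # edges[target] -> list of (source, transformation description)
    edges: dict = field(default_factory=dict)

    def link(self, source, target, transformation="identity"):
        """Record that `target` is derived from `source`."""
        self.edges.setdefault(target, []).append((source, transformation))

    def upstream(self, target):
        """All schemas that feed `target`, directly or through other hops."""
        seen, stack = [], [target]
        while stack:
            for source, _ in self.edges.get(stack.pop(), []):
                if source not in seen:
                    seen.append(source)
                    stack.append(source)
        return seen
```

Linking a topic schema to a staging table and the staging table to a final table lets `upstream` on the final table return both earlier hops, which is the kind of query a lineage view in the UI would need.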
REST Catalog Integration
- Act as schema source for Iceberg REST Catalog
- Provide schema resolution for catalog operations
Acceptance Criteria
- Iceberg schemas can be imported as artifacts
- Avro/Protobuf can be exported as Iceberg schemas
- Compatibility between formats is validated
- Lineage between streaming and Iceberg is tracked
- Documentation covers common pipeline patterns
Technical Considerations
- Use Apache Iceberg Java libraries for schema handling
- Handle Iceberg-specific types (UUID, fixed, etc.)
- Consider Iceberg Spec v2 and v3 differences
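
As one example of a v2/v3 difference, an exporter may need to know whether a schema uses types that only exist in format v3. The Python sketch below walks Iceberg schema JSON and reports such columns. The helper name `v3_only_columns` is hypothetical, and the list of v3-only type names reflects the v3 spec as understood here, so it should be checked against the spec version being targeted.

```python
# Types assumed to exist only from format v3 onward. Matching is by prefix
# because geometry and geography take parameters.
V3_ONLY = ("timestamp_ns", "timestamptz_ns", "variant", "unknown",
           "geometry", "geography")


def v3_only_columns(iceberg_type, path=""):
    """Yield (column path, type) for every type that needs format v3."""
    if isinstance(iceberg_type, str):
        if iceberg_type.startswith(V3_ONLY):
            yield path, iceberg_type
    elif iceberg_type["type"] == "struct":
        for f in iceberg_type["fields"]:
            child = f"{path}.{f['name']}".lstrip(".")
            yield from v3_only_columns(f["type"], child)
    elif iceberg_type["type"] == "list":
        yield from v3_only_columns(iceberg_type["element"], f"{path}.element")
    elif iceberg_type["type"] == "map":
        yield from v3_only_columns(iceberg_type["key"], f"{path}.key")
        yield from v3_only_columns(iceberg_type["value"], f"{path}.value")
```

An exporter targeting a v2 table could reject a schema when this generator yields anything, while types such as `uuid` and `fixed[N]` pass through because they are available in both versions.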