Open
Labels
Beginner Friendly, area/lakehouse, priority/normal, type/task
Parent Feature
Part of #7001 - Apache Iceberg Schema Integration
Description
Create a utility for converting between Avro/Protobuf schemas and Apache Iceberg schema format. This enables early experimentation with Iceberg integration.
Scope
Conversion Utility
- Avro to Iceberg
  - Convert Avro schema JSON to Iceberg schema JSON
  - Handle basic types (string, int, long, float, double, boolean, bytes)
  - Handle logical types (date, timestamp, decimal)
  - Handle nested records and arrays
- Iceberg to Avro
  - Convert Iceberg schema to Avro schema
  - Preserve field IDs in metadata
  - Handle Iceberg-specific types
- CLI/Utility Class
  - Standalone utility for testing
  - Input: schema file, Output: converted schema
  - Optionally integrate into registry utils module
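To make the Avro to Iceberg item concrete, here is a minimal sketch of what the conversion could look like. It is not the project's implementation: the names `IdAllocator`, `unwrap_nullable`, `convert_type`, `convert_record` and `avro_to_iceberg` are hypothetical, field IDs are assigned in simple depth-first order (an assumption, since the issue does not specify an ID strategy), and only `["null", T]` unions are supported.

```python
import json

# Primitive mapping taken from the Type Mapping table in this issue.
PRIMITIVES = {
    "string": "string", "int": "int", "long": "long", "float": "float",
    "double": "double", "boolean": "boolean", "bytes": "binary",
}


class IdAllocator:
    """Hands out sequential field IDs (hypothetical helper)."""

    def __init__(self):
        self.last = 0

    def next(self):
        self.last += 1
        return self.last


def unwrap_nullable(avro_type):
    """Return (type, required) for a plain type or a ["null", T] union."""
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1 or "null" not in avro_type:
            raise ValueError(f"unsupported union: {avro_type}")
        return non_null[0], False
    return avro_type, True


def convert_type(avro_type, ids):
    """Convert one Avro type (already unwrapped from any union)."""
    if isinstance(avro_type, str):
        if avro_type not in PRIMITIVES:
            raise ValueError(f"unsupported Avro type: {avro_type}")
        return PRIMITIVES[avro_type]
    logical = avro_type.get("logicalType")
    if logical == "date":
        return "date"
    if logical == "timestamp-millis":
        return "timestamp"
    if logical == "decimal":
        return f"decimal({avro_type['precision']}, {avro_type.get('scale', 0)})"
    kind = avro_type["type"]
    if kind == "record":
        return convert_record(avro_type, ids)
    if kind == "array":
        element_id = ids.next()
        item, required = unwrap_nullable(avro_type["items"])
        return {"type": "list", "element-id": element_id,
                "element-required": required,
                "element": convert_type(item, ids)}
    if kind == "map":
        key_id, value_id = ids.next(), ids.next()
        value, required = unwrap_nullable(avro_type["values"])
        return {"type": "map", "key-id": key_id, "key": "string",
                "value-id": value_id, "value-required": required,
                "value": convert_type(value, ids)}
    # Covers wrapped primitives such as {"type": "string"}.
    return convert_type(kind, ids)


def convert_record(record, ids):
    """Convert an Avro record into an Iceberg struct."""
    fields = []
    for field in record["fields"]:
        field_id = ids.next()  # ID is taken before descending into the type
        field_type, required = unwrap_nullable(field["type"])
        fields.append({"id": field_id, "name": field["name"],
                       "required": required,
                       "type": convert_type(field_type, ids)})
    return {"type": "struct", "fields": fields}


def avro_to_iceberg(avro_schema_json):
    """Avro schema JSON (string) in, Iceberg schema JSON (string) out."""
    avro = json.loads(avro_schema_json)
    if not isinstance(avro, dict) or avro.get("type") != "record":
        raise ValueError("top-level Avro schema must be a record")
    return json.dumps(convert_record(avro, IdAllocator()), indent=2)
```

A CLI wrapper would only need to read the schema file into a string, call `avro_to_iceberg`, and write the result. Unsupported constructs (enums, fixed, multi-branch unions) raise `ValueError` here rather than being silently dropped, which seems the safer default for an experimentation tool.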
Out of Scope (for this task)
- Full REST API integration
- Lineage tracking
- UI changes
Type Mapping
| Avro Type | Iceberg Type |
|---|---|
| string | string |
| int | int |
| long | long |
| float | float |
| double | double |
| boolean | boolean |
| bytes | binary |
| date (logical) | date |
| timestamp-millis | timestamp |
| decimal (logical) | decimal(p, s) |
| record | struct |
| array | list |
| map | map |
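Read right to left, the table also drives the Iceberg to Avro direction. The sketch below is one possible shape for it, under stated assumptions: the function names are hypothetical, decimals are emitted as Avro `bytes` for brevity, and record names are derived from field names purely for illustration. The `field-id`, `element-id`, `key-id` and `value-id` attribute names follow the convention Iceberg's table spec uses for Avro files. Iceberg timestamps carry microsecond precision, so a real implementation may prefer `timestamp-micros` over the `timestamp-millis` pairing used in the table.

```python
import json
import re

# Reverse of the Type Mapping table; values are Avro types.
PRIMITIVES = {
    "string": "string", "int": "int", "long": "long", "float": "float",
    "double": "double", "boolean": "boolean", "binary": "bytes",
    "date": {"type": "int", "logicalType": "date"},
    "timestamp": {"type": "long", "logicalType": "timestamp-millis"},
}
DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def optional(avro_type, required):
    """Wrap optional types in a ["null", T] union."""
    return avro_type if required else ["null", avro_type]


def to_avro_type(iceberg_type, name):
    """Convert one Iceberg type; `name` seeds names for nested records."""
    if isinstance(iceberg_type, str):
        match = DECIMAL.fullmatch(iceberg_type)
        if match:
            return {"type": "bytes", "logicalType": "decimal",
                    "precision": int(match.group(1)),
                    "scale": int(match.group(2))}
        if iceberg_type not in PRIMITIVES:
            raise ValueError(f"unsupported Iceberg type: {iceberg_type}")
        return PRIMITIVES[iceberg_type]
    kind = iceberg_type["type"]
    if kind == "struct":
        return to_avro_record(iceberg_type, name)
    if kind == "list":
        items = to_avro_type(iceberg_type["element"], name + "_element")
        return {"type": "array",
                "items": optional(items, iceberg_type["element-required"]),
                "element-id": iceberg_type["element-id"]}
    if kind == "map":
        if iceberg_type["key"] != "string":
            # Avro maps only allow string keys.
            raise ValueError("only string map keys are supported")
        values = to_avro_type(iceberg_type["value"], name + "_value")
        return {"type": "map",
                "values": optional(values, iceberg_type["value-required"]),
                "key-id": iceberg_type["key-id"],
                "value-id": iceberg_type["value-id"]}
    raise ValueError(f"unsupported Iceberg type: {kind}")


def to_avro_record(struct, name):
    """Convert an Iceberg struct into an Avro record, keeping field IDs."""
    fields = []
    for f in struct["fields"]:
        avro_type = to_avro_type(f["type"], f"{name}_{f['name']}")
        field = {"name": f["name"],
                 "type": optional(avro_type, f["required"]),
                 "field-id": f["id"]}
        if not f["required"]:
            field["default"] = None  # serialized as JSON null
        fields.append(field)
    return {"type": "record", "name": name, "fields": fields}


def iceberg_to_avro(iceberg_schema_json, record_name="Record"):
    """Iceberg schema JSON (string) in, Avro schema JSON (string) out."""
    schema = json.loads(iceberg_schema_json)
    if not isinstance(schema, dict) or schema.get("type") != "struct":
        raise ValueError("top-level Iceberg schema must be a struct")
    return json.dumps(to_avro_record(schema, record_name), indent=2)
```

Iceberg-specific types that have no row in the table (for example `timestamptz`, `time`, `uuid`, `fixed`) fall through to `ValueError` in this sketch; how they should map is left open by the issue.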
Acceptance Criteria
- Avro schema converts to valid Iceberg schema
- Iceberg schema converts to valid Avro schema
- Nested structures are handled
- Unit tests cover common cases
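The issue does not define what "valid" means for a converted schema. One cheap structural check that unit tests could build on, sketched here as an assumption rather than a full validity test, is that every struct field carries `id`, `name`, `required` and `type`, and that all IDs in the schema are unique. `collect_ids`, `check_iceberg_schema` and the test class are hypothetical names.

```python
import unittest


def collect_ids(iceberg_type):
    """Walk a parsed Iceberg schema and return every ID it contains."""
    ids = []
    if not isinstance(iceberg_type, dict):
        return ids  # primitive type string, nothing to collect
    kind = iceberg_type.get("type")
    if kind == "struct":
        for field in iceberg_type["fields"]:
            for key in ("id", "name", "required", "type"):
                if key not in field:
                    raise ValueError(f"field is missing '{key}': {field}")
            ids.append(field["id"])
            ids.extend(collect_ids(field["type"]))
    elif kind == "list":
        ids.append(iceberg_type["element-id"])
        ids.extend(collect_ids(iceberg_type["element"]))
    elif kind == "map":
        ids.extend([iceberg_type["key-id"], iceberg_type["value-id"]])
        ids.extend(collect_ids(iceberg_type["key"]))
        ids.extend(collect_ids(iceberg_type["value"]))
    return ids


def check_iceberg_schema(schema):
    """Raise ValueError on structural problems; return the ID list otherwise."""
    if not isinstance(schema, dict) or schema.get("type") != "struct":
        raise ValueError("top-level schema must be a struct")
    ids = collect_ids(schema)
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate field IDs")
    return ids


class IcebergSchemaChecks(unittest.TestCase):
    def test_nested_schema_passes(self):
        schema = {"type": "struct", "fields": [
            {"id": 1, "name": "id", "required": True, "type": "long"},
            {"id": 2, "name": "tags", "required": False,
             "type": {"type": "list", "element-id": 3,
                      "element-required": True, "element": "string"}},
        ]}
        self.assertEqual(check_iceberg_schema(schema), [1, 2, 3])

    def test_duplicate_ids_rejected(self):
        schema = {"type": "struct", "fields": [
            {"id": 1, "name": "a", "required": True, "type": "int"},
            {"id": 1, "name": "b", "required": True, "type": "int"},
        ]}
        with self.assertRaises(ValueError):
            check_iceberg_schema(schema)
```

Running the converter's output through a check like this in each test keeps the "converts to valid schema" criteria from resting on string comparison alone.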
Effort Estimate
Medium - 2-3 days