Open
Description
DataFusion encodes Arrow-specific types (such as unsigned integers) by misusing type_variation_reference. This violates Substrait's principle of avoiding specialization for a single producer.
Per the spec, type_variation_reference is for physical variations of the same type where "all variations are expected to have the same semantics." Signed and unsigned integers have different semantics.
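To make the semantic difference concrete, here is a minimal sketch in plain Rust (no Arrow or Substrait crates involved). The function names are hypothetical and exist only for illustration. It shows that the same 8 bits yield different values and a different ordering depending on whether they are read as signed or unsigned, which is why the two cannot be treated as physical variations of one type.

```rust
/// Read the same 8 bits as a signed and as an unsigned integer.
/// (Hypothetical helper, for illustration only.)
pub fn interpret_byte(bits: u8) -> (i8, u8) {
    // `as i8` reinterprets the bit pattern using two's complement.
    (bits as i8, bits)
}

/// Compare two bit patterns under the signed interpretation.
pub fn less_than_signed(a: u8, b: u8) -> bool {
    (a as i8) < (b as i8)
}

/// Compare the same two bit patterns under the unsigned interpretation.
pub fn less_than_unsigned(a: u8, b: u8) -> bool {
    a < b
}
```

For the bit pattern 0xFF, the signed reading is -1 and the unsigned reading is 255, so a comparison against 0x01 gives opposite answers under the two interpretations.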
Types affected:
- UInt8/16/32/64
- LargeUtf8/LargeBinary/LargeList
- Decimal256
- Duration
- Date64
- Time32
- Time64
Solution
Use Arrow's official extension_types.yaml, which already defines these types (u8, u16, large_string, decimal256, etc.).
Before:
Type::I8 { type_variation_reference: 1 } // means UInt8
After:
extension_uris: [{ uri: ".../extension_types.yaml" }]
Type::UserDefined { name: "u8" }
The consumer already handles extension types, so backwards compatibility can be maintained.
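As a rough sketch of how the producer and consumer sides could fit together, the following self-contained Rust models the proposal with simplified stand-in types. Nothing here is the actual DataFusion or substrait crate API: ArrowType, SubstraitType, to_substrait, and from_substrait are hypothetical names, and only the extension type names mentioned above (u8, u16, large_string, decimal256) are used. The consumer is assumed to keep accepting the legacy variation encoding so that older plans still load.

```rust
/// Hypothetical stand-in for a few of the affected Arrow types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType {
    Int8,
    UInt8,
    UInt16,
    LargeUtf8,
    Decimal256,
}

/// Hypothetical, heavily simplified model of a Substrait type.
#[derive(Debug, Clone, PartialEq)]
pub enum SubstraitType {
    I8 { type_variation_reference: u32 },
    UserDefined { name: &'static str },
}

/// Producer side: emit extension types instead of type variations.
pub fn to_substrait(t: ArrowType) -> SubstraitType {
    match t {
        ArrowType::Int8 => SubstraitType::I8 { type_variation_reference: 0 },
        ArrowType::UInt8 => SubstraitType::UserDefined { name: "u8" },
        ArrowType::UInt16 => SubstraitType::UserDefined { name: "u16" },
        ArrowType::LargeUtf8 => SubstraitType::UserDefined { name: "large_string" },
        ArrowType::Decimal256 => SubstraitType::UserDefined { name: "decimal256" },
    }
}

/// Consumer side: accept the new extension types and, as an assumption
/// for backwards compatibility, the legacy variation encoding for UInt8.
pub fn from_substrait(t: &SubstraitType) -> Option<ArrowType> {
    match t {
        SubstraitType::I8 { type_variation_reference: 0 } => Some(ArrowType::Int8),
        // Legacy encoding: variation 1 on I8 meant UInt8.
        SubstraitType::I8 { type_variation_reference: 1 } => Some(ArrowType::UInt8),
        SubstraitType::I8 { .. } => None,
        SubstraitType::UserDefined { name } => match *name {
            "u8" => Some(ArrowType::UInt8),
            "u16" => Some(ArrowType::UInt16),
            "large_string" => Some(ArrowType::LargeUtf8),
            "decimal256" => Some(ArrowType::Decimal256),
            _ => None,
        },
    }
}
```

In this model, every type written by the new producer round-trips through the consumer, while a plan that still carries I8 with variation 1 is read as UInt8 rather than rejected.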
asolimando