What happened?
- Writing an int to a target schema of int64: succeeds.
- Writing an int to a target schema of nullable(int64) using storage_write_api: succeeds.
- Writing an int to a target schema of nullable(int64) using file_load with Avro: fails with
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null","long"]: 123 (field=nullableLong)
A minimal reproduction (without Beam):
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;

public class AvroTest {
    // One plain long field and one nullable long field (union ["null", "long"]).
    private static final String SCHEMA_JSON = "{\n" +
        "  \"type\": \"record\",\n" +
        "  \"name\": \"UserEvent\",\n" +
        "  \"namespace\": \"com.example.avro\",\n" +
        "  \"fields\": [\n" +
        "    {\"name\": \"userId\", \"type\": \"string\"},\n" +
        "    {\"name\": \"nonNullLong\", \"type\": \"long\"},\n" +
        "    {\"name\": \"nullableLong\", \"type\": [\"null\", \"long\"], \"default\": null}\n" +
        "  ]\n" +
        "}";

    public static void main(String[] argv) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord event = new GenericData.Record(schema);
        event.put("userId", "user-123");
        // Integer into a plain "long" field: written successfully.
        event.put("nonNullLong", 123);
        // Integer into ["null", "long"]: fails. Passing 123L (a Long) avoids the error.
        event.put("nullableLong", 123);

        File avroOutputFile = new File("user-events.avro");
        DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schema);
        try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
            dataFileWriter.create(schema, avroOutputFile);
            // append() throws here, with UnresolvedUnionException as the cause.
            dataFileWriter.append(event);
        }
    }
}
This is a known Avro issue: https://stackoverflow.com/questions/35963285/org-apache-avro-unresolvedunionexception-not-in-union-long-null
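The failure comes from how a union branch is chosen: Avro's generic writer selects the branch from the value's runtime type (GenericData.resolveUnion), and a java.lang.Integer corresponds to "int", which is not a branch of ["null", "long"]. A plain long field involves no branch selection, which is consistent with the first case above succeeding. The stdlib-only sketch below models this exact-name matching. It is a simplified, hypothetical model (UnionResolutionSketch and its methods are illustrative names), not Avro's actual implementation.

```java
import java.util.List;

/**
 * Simplified, hypothetical model of union branch selection by runtime type.
 * NOT Avro's implementation; it only mirrors the exact-name matching behavior.
 */
public class UnionResolutionSketch {

    /** Maps a Java value to the Avro primitive type name its class corresponds to. */
    public static String schemaNameOf(Object datum) {
        if (datum == null) return "null";
        if (datum instanceof Integer) return "int";
        if (datum instanceof Long) return "long";
        if (datum instanceof Float) return "float";
        if (datum instanceof Double) return "double";
        if (datum instanceof Boolean) return "boolean";
        if (datum instanceof CharSequence) return "string";
        throw new IllegalArgumentException("unsupported type: " + datum.getClass());
    }

    /** Returns the index of the branch whose name matches the datum's type exactly. */
    public static int resolveUnion(List<String> branches, Object datum) {
        int index = branches.indexOf(schemaNameOf(datum));
        if (index < 0) {
            // No widening: an Integer is "int", and "int" is not in ["null", "long"].
            throw new IllegalStateException("Not in union " + branches + ": " + datum);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> nullableLong = List.of("null", "long");
        System.out.println(resolveUnion(nullableLong, 123L)); // Long matches "long"
        System.out.println(resolveUnion(nullableLong, null)); // null matches "null"
        try {
            resolveUnion(nullableLong, 123); // Integer has no matching branch
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Matching on the exact type name is why the same Integer is accepted by a plain long field but rejected once the field becomes nullable.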
However, this led to a breaking change in Beam YAML 2.69.0, which switched batch BigQueryIO writes from storage_write_api to Managed IO (backed by file_load).
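One possible workaround, sketched here under assumptions, is to widen Integer values to Long before records reach the Avro writer, for fields whose declared types include "long" but not "int". The helper name widenInts and the representation of a record as a Map<String, Object> with per-field type lists are hypothetical; in a real pipeline the same conversion would have to be applied to whatever record type is handed to the file-loads writer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-write normalization: widen Integer to Long for fields whose
 * (possibly nullable) type list contains "long" but not "int".
 */
public class WidenIntsSketch {

    public static Map<String, Object> widenInts(
            Map<String, Object> record, Map<String, List<String>> fieldTypes) {
        // Copy so the caller's record is left untouched.
        Map<String, Object> out = new HashMap<>(record);
        for (Map.Entry<String, Object> e : record.entrySet()) {
            List<String> types = fieldTypes.get(e.getKey());
            boolean wantsLong = types != null && types.contains("long") && !types.contains("int");
            if (wantsLong && e.getValue() instanceof Integer) {
                out.put(e.getKey(), ((Integer) e.getValue()).longValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fieldTypes = Map.of(
                "nonNullLong", List.of("long"),
                "nullableLong", List.of("null", "long"));
        Map<String, Object> record = new HashMap<>();
        record.put("nonNullLong", 123);
        record.put("nullableLong", 123);
        Map<String, Object> widened = widenInts(record, fieldTypes);
        System.out.println(widened.get("nullableLong").getClass().getSimpleName());
    }
}
```

Values that are already Long, null, or of another type pass through unchanged, so the conversion is safe to apply to every record.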
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner