Hello,
I tested read.avro on a moderately complicated schema. Some fields contain sub-records, and others contain arrays of records. One of the sub-records (named moments, with mean, variance, skewness, and kurtosis fields) is defined in full the first time it appears and is referenced by name afterwards. Avro itself handles this without problems, but read.avro throws the following error:
> dat <- read.avro(file="/path/to/file/part-r-00000.avro")
Error in (function (x, schema, flatten = T, simplify = F, encoded_unions = T, :
Unsupported Avro type: moments
The relevant part of the schema being read is:
...
"fields" : [ {
  "name" : "routename",
  "type" : "string",
  "doc" : "path identifier indicates unique fix sequence"
}, {
  "name" : "aircrafttype",
  "type" : "string"
}, {
  "name" : "lowaltitudebin",
  "type" : "double",
  "doc" : "altitude [feet] at low end of route (rounded to nearest 1000ft)"
}, {
  "name" : "highaltitudebin",
  "type" : "double",
  "doc" : "altitude [feet] at high end of route (rounded to nearest 1000ft)"
}, {
  "name" : "route",
  "type" : [ "null", {
    "type" : "record",
    "name" : "routemetrics",
    "fields" : [ {
      "name" : "route",
      "type" : [ "string", "null" ]
    }, {
      "name" : "initialalttude",
      "type" : [ "null", {
        "type" : "record",
        "name" : "moments",
        "fields" : [ {
          "name" : "mean",
          "type" : "double"
        }, {
          "name" : "variance",
          "type" : "double"
        }, {
          "name" : "skewness",
          "type" : "double"
        }, {
          "name" : "kurtosis",
          "type" : "double"
        }, {
          "name" : "samplesize",
          "type" : "long"
        } ]
      } ],
      "doc" : "moments [feet] characterizing distribution of atltitudes at the beginning of the route (within given binning constraint)"
    }, {
      "name" : "terminalaltitude",
      "type" : [ "null", "moments" ],
      "doc" : "moments [feet] characterizing distribution of atltitudes at the end of the route (within given binning constraint)"
    }, {...
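Note that moments is first defined inline, as one branch of a union, in the initialalttude field. That field belongs to the routemetrics record, which is itself nested inside the top-level route field. After that, moments is referenced only by name, as in the subsequent terminalaltitude field. The Avro specification allows this: once a named type (record, enum, or fixed) has been defined, later parts of the schema may refer to it by name.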
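To make the expected behaviour concrete, here is a minimal sketch in Python (standard library only) of how a reader could resolve by-name references while walking a schema. It is not how read.avro is implemented. The `resolve` function and the trimmed-down example schema are hypothetical, and namespaces and aliases are ignored for brevity.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = {"record", "enum", "fixed"}


def resolve(schema, names=None):
    """Return a copy of `schema` in which by-name references are replaced
    by the definitions seen earlier in the walk (hypothetical helper)."""
    if names is None:
        names = {}
    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        if schema in names:
            return names[schema]  # reference to a previously defined named type
        raise ValueError(f"Unknown Avro type: {schema}")
    if isinstance(schema, list):  # a union: resolve each branch in order
        return [resolve(branch, names) for branch in schema]
    if isinstance(schema, dict):
        kind = schema["type"]
        if isinstance(kind, str) and kind in NAMED_KINDS:
            resolved = dict(schema)
            # Register before descending so later fields can refer to this type.
            names[schema["name"]] = resolved
            if kind == "record":
                resolved["fields"] = [
                    dict(field, type=resolve(field["type"], names))
                    for field in schema["fields"]
                ]
            return resolved
        if kind == "array":
            return dict(schema, items=resolve(schema["items"], names))
        if kind == "map":
            return dict(schema, values=resolve(schema["values"], names))
        # Wrapper such as {"type": "string"} or {"type": "moments"}
        return dict(schema, type=resolve(kind, names))
    raise TypeError(f"Unexpected schema node: {schema!r}")


# Trimmed-down version of the schema in this issue (assumption: only two
# fields of moments are kept, and routemetrics is used as the top level).
schema = json.loads("""
{
  "type": "record",
  "name": "routemetrics",
  "fields": [
    {"name": "initialalttude", "type": ["null", {
      "type": "record",
      "name": "moments",
      "fields": [
        {"name": "mean", "type": "double"},
        {"name": "samplesize", "type": "long"}
      ]
    }]},
    {"name": "terminalaltitude", "type": ["null", "moments"]}
  ]
}
""")

resolved = resolve(schema)
terminal = resolved["fields"][1]["type"][1]
print(terminal["type"], [f["name"] for f in terminal["fields"]])
```

In this sketch the definition order matters: a name must be seen before it is referenced, which matches the schema above, where moments is defined in initialalttude and reused in terminalaltitude.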
Are there any plans to support schemas like the one above, where a named type is defined once and then referenced by name?