failing on complicated schemas #2

@mattpollock

Hello,

I tested read.avro on a moderately complicated schema. Some fields contain sub-records, and others contain arrays of records. One of the sub-records (named moments, with mean, variance, skewness, kurtosis, and samplesize fields) is defined in full the first time it appears and is referenced by name as a type afterwards. Avro itself has no problem with this, but read.avro throws the following error:

> dat <- read.avro(file="/path/to/file/part-r-00000.avro")
Error in (function (x, schema, flatten = T, simplify = F, encoded_unions = T,  : 
  Unsupported Avro type: moments

The relevant part of the schema is:

...
"fields" : [ {
    "name" : "routename",
    "type" : "string",
    "doc" : "path identifier indicates unique fix sequence"
  }, {
    "name" : "aircrafttype",
    "type" : "string"
  }, {
    "name" : "lowaltitudebin",
    "type" : "double",
    "doc" : "altitude [feet] at low end of route (rounded to nearest 1000ft)"
  }, {
    "name" : "highaltitudebin",
    "type" : "double",
    "doc" : "altitude [feet] at high end of route (rounded to nearest 1000ft)"
  }, {
    "name" : "route",
    "type" : [ "null", {
      "type" : "record",
      "name" : "routemetrics",
      "fields" : [ {
        "name" : "route",
        "type" : [ "string", "null" ]
      }, {
        "name" : "initialalttude",
        "type" : [ "null", {
          "type" : "record",
          "name" : "moments",
          "fields" : [ {
            "name" : "mean",
            "type" : "double"
          }, {
            "name" : "variance",
            "type" : "double"
          }, {
            "name" : "skewness",
            "type" : "double"
          }, {
            "name" : "kurtosis",
            "type" : "double"
          }, {
            "name" : "samplesize",
            "type" : "long"
          } ]
        } ],
        "doc" : "moments [feet] characterizing distribution of atltitudes at the beginning of the route (within given binning constraint)"
      }, {
        "name" : "terminalaltitude",
        "type" : [ "null", "moments" ],
        "doc" : "moments [feet] characterizing distribution of atltitudes at the end of the route (within given binning constraint)"
      }, {...

Note that moments is first defined as a type (as one branch of a union) in the initialalttude field, which belongs to the routemetrics record nested inside the top-level route field. After that, the terminalaltitude field references moments by name only.
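
To make the define-once, reference-later pattern concrete, here is a minimal Python sketch of how a schema reader could resolve such by-name references. It is not how read.avro is implemented: the function name resolve_named_types is hypothetical, namespaces are ignored, and the schema is a trimmed-down version of the one above.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def resolve_named_types(schema, known=None):
    """Return a copy of `schema` with by-name references replaced by their definitions.

    Hypothetical helper for illustration only; namespaces are not handled.
    """
    if known is None:
        known = {}  # maps a type name to its (resolved) definition

    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return schema
        if schema in known:
            return known[schema]  # reference to a previously defined named type
        raise ValueError(f"Unsupported Avro type: {schema}")

    if isinstance(schema, list):  # a union: resolve each branch in order
        return [resolve_named_types(branch, known) for branch in schema]

    kind = schema["type"]
    if kind == "record":
        resolved = dict(schema, fields=[])
        # Register before descending so later (or recursive) references resolve.
        # A recursive reference yields a cyclic structure.
        known[schema["name"]] = resolved
        for field in schema["fields"]:
            field_type = resolve_named_types(field["type"], known)
            resolved["fields"].append(dict(field, type=field_type))
        return resolved
    if kind in ("enum", "fixed"):
        known[schema["name"]] = schema
        return schema
    if kind == "array":
        return dict(schema, items=resolve_named_types(schema["items"], known))
    if kind == "map":
        return dict(schema, values=resolve_named_types(schema["values"], known))
    # Remaining form, e.g. {"type": "double"} or a nested type object.
    return dict(schema, type=resolve_named_types(kind, known))


# Trimmed-down version of the schema from this issue.
schema = json.loads("""
{"type": "record", "name": "routemetrics", "fields": [
  {"name": "initialalttude",
   "type": ["null", {"type": "record", "name": "moments", "fields": [
       {"name": "mean", "type": "double"},
       {"name": "samplesize", "type": "long"}]}]},
  {"name": "terminalaltitude", "type": ["null", "moments"]}
]}
""")

resolved = resolve_named_types(schema)
# The second field's union now carries the full moments record instead of the bare name.
print(resolved["fields"][1]["type"][1]["name"])
```

Without the `known` lookup, the bare string "moments" in terminalaltitude matches neither a primitive nor an inline definition, which is consistent with the "Unsupported Avro type: moments" error shown above.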

Are there any plans to support schemas like the one above, where a named type is defined once and then referenced by name?
