Backward compatibility support for Avro schema when deserializing data #275

Open
chioai1309 opened this issue Apr 8, 2021 · 1 comment

@chioai1309
My project currently uses jackson-dataformat-avro (version 2.12.2) to serialize Java POJOs and store them. The problem is that when the schema evolves, the previously stored data can no longer be deserialized; it fails with the following exception:

com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input in FIELD_NAME
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:659)
	at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:636)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed2(JacksonAvroParserImpl.java:1038)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._nextByteGuaranteed(JacksonAvroParserImpl.java:1033)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl._decodeIntSlow(JacksonAvroParserImpl.java:265)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeInt(JacksonAvroParserImpl.java:234)
	at com.fasterxml.jackson.dataformat.avro.deser.JacksonAvroParserImpl.decodeIndex(JacksonAvroParserImpl.java:988)
	at com.fasterxml.jackson.dataformat.avro.deser.ScalarDecoder$ScalarUnionDecoder$FR.readValue(ScalarDecoder.java:412)
	at com.fasterxml.jackson.dataformat.avro.deser.RecordReader$Std.nextToken(RecordReader.java:142)
	at com.fasterxml.jackson.dataformat.avro.deser.AvroParserImpl.nextToken(AvroParserImpl.java:97)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:288)
	at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:156)
	at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2079)
	at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1453)

The reason is that RecordReader$Std tries to resolve the token for the newly added field while the reader has already reached the end of the stored message.

In the first version, I have this POJO and the corresponding Avro schema generated for it:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
}
===================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] }
   ]
}
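
For reference, the schema above was generated from the POJO roughly like this (a simplified sketch -- the actual generation code is not part of this report):

// Generate the Avro schema for the POJO via the AvroMapper
AvroMapper mapper = new AvroMapper();
AvroSchema generated = mapper.schemaFor(StoreEntity.class);
String schemaJson = generated.getAvroSchema().toString(true); // pretty-printed JSON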

Later on, the schema evolved, with a new field appended at the end:

@Getter
@Setter
@Document(StoreEntity.COLLECTION_NAME)
public class StoreEntity extends AuditableEntity {

  public static final String COLLECTION_NAME = "StoreEntity";

  @Id
  private String id;
  @Field
  @Length(max = 150)
  @NotBlank
  private String name;
  @Field
  @Indexed(unique = true)
  @Length(max = 50)
  @NotBlank
  private String code;
  @Field
  @CountryCode
  private String countryCode;
  @Field
  @JsonProperty(defaultValue = "null")
  private String phone;
}
====================================================
{
   "type":"record",
   "name":"StoredEntity",
   "namespace":"com.mydomain.entity",
   "fields":[
      { "name":"code", "type":["null","string"] },
      { "name":"countryCode", "type":["null","string"] },
      { "name":"createdBy", "type":["null","string"] },
      { "name":"createdDate", "type":["null","string"] },
      { "name":"id", "type":["null","string"] },
      { "name":"lastModifiedBy", "type":["null","string"] },
      { "name":"lastModifiedDate", "type":["null","string"] },
      { "name":"name", "type":["null","string"] }
      { "name":"phone", "type":["null","string"], "default":null }
   ]
}

Following the Avro Schema Resolution rules described at http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution:

if the reader's record schema has a field that contains a default value, and writer's schema does not have a field with the same name, then the reader should use the default value from its field.

Another source suggesting the same behavior: https://docs.confluent.io/2.0.0/avro.html#backward-compatibility
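
For comparison, the Avro reference implementation applies this rule when given both the writer's and the reader's schema (a minimal sketch using the org.apache.avro API; SCHEMA_V1_JSON, SCHEMA_V2_JSON and oldBytes are placeholders):

Schema writerSchema = new Schema.Parser().parse(SCHEMA_V1_JSON);
Schema readerSchema = new Schema.Parser().parse(SCHEMA_V2_JSON);
// Resolve data written with the old schema against the new schema
GenericDatumReader<GenericRecord> datumReader =
    new GenericDatumReader<>(writerSchema, readerSchema);
Decoder decoder = DecoderFactory.get().binaryDecoder(oldBytes, null);
GenericRecord record = datumReader.read(null, decoder);
// record.get("phone") is filled with its declared default, null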

But this does not seem to be the case with the library.
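
For completeness, a minimal sketch of how the data is read in my project (simplified; oldBytes stands for a payload written earlier with the v1 schema):

AvroMapper mapper = new AvroMapper();
// Schema generated from the current POJO, which now includes "phone"
AvroSchema schemaV2 = mapper.schemaFor(StoreEntity.class);

// Fails with the JsonEOFException above, because the v1 payload contains no "phone" data
StoreEntity entity = mapper.readerFor(StoreEntity.class)
    .with(schemaV2)
    .readValue(oldBytes);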

@cowtowncoder
Member

Jackson's handling of default values may be incomplete wrt Avro definitions, but the exception you get seems a bit different, and it would be good to reproduce it from a minimal reproduction.
This needs to include the specific code used to read and write content, not just the Java type / Avro schema definitions. Part of this is to ensure that the use of reader/writer schemas (wrt schema evolution) is correct.

So, if it were possible to reduce it, including eliminating the use of frameworks like Lombok (just for testing, as our tests cannot add it as a dependency -- use with Jackson is fine in itself), it would be possible to have a look at what is causing the problem.

Jackson does support schema evolution in and of itself, but as you probably know, it is necessary to separately specify reader and writer schemas: the "writer schema" being the schema that was used for writing, and the "reader schema" being the new one the application wants to use. It is never possible to just use a new schema in isolation, since Avro does not include enough metadata for the decoder to handle changes, even compatible ones.
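
Something along these lines (a sketch; the schema JSON constants are placeholders for the definitions you posted):

AvroMapper mapper = new AvroMapper();
AvroSchema writerSchema = mapper.schemaFrom(SCHEMA_V1_JSON); // schema data was written with
AvroSchema readerSchema = mapper.schemaFrom(SCHEMA_V2_JSON); // schema to read into

// Combines the two so the decoder can resolve differences between them
AvroSchema evolved = writerSchema.withReaderSchema(readerSchema);
StoreEntity entity = mapper.readerFor(StoreEntity.class)
    .with(evolved)
    .readValue(oldBytes);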
