passthroughFields causes error when writing parquet with new field #250
Comments
Can you use |
Yes, that is the solution I am using right now, but I was hoping there would be a nicer way to handle it. This behavior also seems like a gotcha to me. Have there been similar interop issues with other formats? For my own education, what was the original use case for |
Perhaps we could add it as a method on each generated struct so that it's generic? I'm open to ideas there. As for the motivation: it's very powerful to preserve fields that a service doesn't know about. For example, it allows for schema evolution without "middle-boxes" having to know about the IDL changes.
I agree that would be super useful. It would let other libraries like parquet-scrooge work with that method to eliminate this gotcha on a higher level.
As far as I can tell, the passthrough field is basically an |
@alexkuang Sorry for the slow response. I think adding a method to each generated struct makes sense. Would you be interested in tackling this?
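One way to read the "method on each generated struct" proposal is sketched below. This is only an illustration: the trait `StripsPassthrough`, the struct `UserEvent`, and the helper `GenericStrip.prepareForWrite` are hypothetical names, not part of scrooge or parquet-scrooge. The idea is that a library can strip unknown fields without knowing the concrete struct type.

```scala
// Hypothetical trait; the base types in real scrooge-generated code may differ.
trait StripsPassthrough[T] {
  // Returns a copy of the struct with all unknown (passthrough) fields dropped.
  def withoutPassthroughFields: T
}

// Stand-in for a generated struct: one known field plus unknown fields kept as raw bytes.
final case class UserEvent(name: String, unknown: Map[Short, Vector[Byte]] = Map.empty)
    extends StripsPassthrough[UserEvent] {
  def withoutPassthroughFields: UserEvent = copy(unknown = Map.empty)
}

object GenericStrip {
  // Works for any struct that exposes the method, so a writer library
  // would not need per-struct knowledge to strip unknown fields.
  def prepareForWrite[T <: StripsPassthrough[T]](record: T): T =
    record.withoutPassthroughFields
}
```

With something along these lines, a library such as parquet-scrooge could call the method itself before writing, rather than relying on every caller to remember to do so.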
Using scrooge-core 4.7.0 and parquet-scrooge 1.8.1, I ran into

org.apache.parquet.io.ParquetEncodingException: field 4 was not found in [...]

when updating a thrift schema with a new optional field. Simplified example case: ParquetEncodingException.

I believe the cause is that the scrooge-generated Scala in Logger is calling write on the passthrough fields (https://github.com/twitter/scrooge/blob/develop/scrooge-generator/src/main/resources/scalagen/struct.scala#L597), which delegates to the protocol. parquet-scrooge's protocol populates its schema using the info exposed in the scrooge-generated fieldInfos, which does not include the passthrough fields. This in turn causes parquet-scrooge to fail as it tries to find the passthrough field in its schema while servicing the write call.

One potential workaround is to call withoutPassthroughFields on individual objects before passing them into parquet (or to have parquet-scrooge do it). Is this the recommended solution? It seems like it might be nicer to have a more universal way of excluding passthroughFields. I did find a comment in the generated code saying "If the field is unknown and passthrough fields are enabled [...]" but didn't find any obvious references as to how to disable them. Wanted to make sure I'm not missing anything.
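To make the suspected failure mode concrete, here is a self-contained toy model. It does not use the real scrooge or parquet-scrooge classes: `FieldBlob`, `LoggerRecord`, `SchemaBoundWriter`, and `PassthroughDemo` are invented for illustration, and the exception type is a stand-in. The writer knows only the field ids listed in the struct's field info, so writing a record that still carries an unknown field fails, while writing the stripped record succeeds.

```scala
import scala.util.Try

// Toy model only: these are NOT the real scrooge or parquet-scrooge types.
final case class FieldBlob(id: Short, bytes: Vector[Byte])

// Stand-in for a generated struct: one known field plus unknown fields kept as blobs.
final case class LoggerRecord(message: String, passthrough: Map[Short, FieldBlob] = Map.empty)

object LoggerRecord {
  // Stand-in for the generated fieldInfos: only fields known at compile time.
  val knownFieldIds: Set[Short] = Set(1.toShort)

  // Stand-in for the workaround: return a copy with the unknown fields dropped.
  def withoutPassthroughFields(original: LoggerRecord): LoggerRecord =
    original.copy(passthrough = Map.empty)
}

// Stand-in for a protocol whose schema is built from the known fields only.
final class SchemaBoundWriter(schemaFieldIds: Set[Short]) {
  private def writeField(id: Short): Unit =
    if (!schemaFieldIds.contains(id))
      throw new IllegalStateException(s"field $id was not found in schema")

  // Mirrors the reported behavior: known fields first, then every passthrough field.
  def write(record: LoggerRecord): Unit = {
    writeField(1.toShort)
    record.passthrough.keys.foreach(writeField)
  }
}

object PassthroughDemo {
  // Returns (direct write succeeded, write after stripping succeeded).
  def run(): (Boolean, Boolean) = {
    val writer = new SchemaBoundWriter(LoggerRecord.knownFieldIds)
    val newField = FieldBlob(4.toShort, Vector(1.toByte))
    val record = LoggerRecord("hello", Map(newField.id -> newField))
    val direct = Try(writer.write(record)).isSuccess
    val stripped = Try(writer.write(LoggerRecord.withoutPassthroughFields(record))).isSuccess
    (direct, stripped)
  }
}
```

In this model the direct write fails on field 4 and the stripped write goes through, which matches the workaround described above.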