@janvanbesien

The Avro reader schema configuration needs to be managed when creating the
format bundle, rather than in the input format's getSplits method. Otherwise
a Crunch pipeline that has multiple inputs (Kite views) with different
schemas will not see the correct reader schemas.
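
One way to picture the failure mode is the sketch below. It is not Kite or Crunch code: plain Java maps stand in for the job-wide configuration and for per-input format bundles, and `ReaderSchemaSketch`, `READER_SCHEMA_KEY`, the view names and the schema names are all hypothetical. It assumes the reader schema is stored under a single configuration key, so inputs that write to one shared configuration overwrite each other, while per-input bundles keep each schema separate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model only: maps stand in for the shared job configuration
// and for per-input format bundles. None of these names are Kite or Crunch API.
class ReaderSchemaSketch {
    // Placeholder key, assumed to be the single slot that holds the reader schema.
    static final String READER_SCHEMA_KEY = "reader.schema";

    // Every input writes its schema into one shared configuration.
    // Returns the schema each input observes when it later reads the key.
    static Map<String, String> viaSharedConfiguration(Map<String, String> inputSchemas) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        for (String schema : inputSchemas.values()) {
            jobConf.put(READER_SCHEMA_KEY, schema); // later inputs overwrite earlier ones
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : inputSchemas.keySet()) {
            seen.put(input, jobConf.get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    // Every input gets its own bundle of settings, created together with the input.
    // Returns the schema each input observes when it reads its own bundle.
    static Map<String, String> viaPerInputBundles(Map<String, String> inputSchemas) {
        Map<String, Map<String, String>> bundles = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : inputSchemas.entrySet()) {
            Map<String, String> bundle = new LinkedHashMap<>();
            bundle.put(READER_SCHEMA_KEY, entry.getValue());
            bundles.put(entry.getKey(), bundle);
        }
        Map<String, String> seen = new LinkedHashMap<>();
        for (String input : inputSchemas.keySet()) {
            seen.put(input, bundles.get(input).get(READER_SCHEMA_KEY));
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("viewA", "SchemaA");
        inputs.put("viewB", "SchemaB");
        System.out.println(viaSharedConfiguration(inputs)); // {viewA=SchemaB, viewB=SchemaB}
        System.out.println(viaPerInputBundles(inputs));     // {viewA=SchemaA, viewB=SchemaB}
    }
}
```

In the shared case `viewA` ends up reading with `SchemaB`, which mirrors the "incorrect reader schema" symptom described above; with per-input bundles each view keeps its own schema.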

Note that the test only demonstrates the problem when also upgrading to
Crunch 0.13.0 (which is not part of this commit). CRUNCH-551 fixed a problem
in Crunch that hides the current issue (at least in the scenario of the
test) in versions before 0.13.0.

A test was also added to verify the behaviour with plain MapReduce, to
ensure that it continues to work as expected.

@janvanbesien force-pushed the avro-schema-format-bundle branch from b4055a2 to 46068dc on July 26, 2018 08:31