Skip to content

Conversation

@ruolin59
Copy link
Contributor

@ruolin59 ruolin59 commented Nov 6, 2025

What changes are proposed in this pull request, and why are they necessary?

This PR fixes a bug in ToLowercaseSchemaVisitor where field names within complex default values were not being lowercased, causing schema validation errors when creating fields with lowercased schemas.

Given an input schema with a record field that has a default value:

{
  "name": "Struct_Field",
  "type": {
    "type": "record",
    "name": "NestedRecord",
    "fields": [
      {"name": "firstName", "type": "string"},
      {"name": "Age", "type": "int"}
    ]
  },
  "default": {
    "firstName": "John",
    "Age": 30
  }
}

Before the fix:

{
  "name": "struct_field",
  "type": {
    "type": "record",
    "name": "nestedrecord",
    "fields": [
      {"name": "firstname", "type": "string"},
      {"name": "age", "type": "int"}
    ]
  },
  "default": {
    "firstName": "John",  // Mismatched casing
    "Age": 30             // Mismatched casing
  }
}

Result: AvroTypeException: Invalid default for field struct_field

After the fix:

{
  "name": "struct_field",
  "type": {
    "type": "record",
    "name": "nestedrecord",
    "fields": [
      {"name": "firstname", "type": "string"},
      {"name": "age", "type": "int"}
    ]
  },
  "default": {
    "firstname": "John",  // Correctly lowercased
    "age": 30             // Correctly lowercased
  }
}

The Solution:
Added lowercaseDefaultValue() method that recursively transforms default values based on the schema type:

  • RECORD types: Lowercases field names in record default values
  • MAP types: Lowercases all keys in map default values
  • ARRAY types: Recursively processes each array element
  • Primitives: Returns unchanged

The implementation handles both GenericData.Record and Map-based default value representations

How was this patch tested?

Added a new test testLowercaseSchemaWithComplexDefaultValues() in SchemaUtilitiesTests that verifies the fix handles:

  1. Simple primitive default values (no change needed)
  2. Nested record default values with mixed-case field names
  3. Map default values with mixed-case keys
  4. Array default values containing records with mixed-case field names

The test uses input/expected schema files:

  • testLowercaseSchemaWithDefaultValues-input.avsc - Contains fields with various default values using mixed case
  • testLowercaseSchemaWithDefaultValues-expected.avsc - Contains the fully lowercased expected output

All existing tests continue to pass, confirming backward compatibility.

@ruolin59 ruolin59 changed the title Ruolin/fix default value lowercasing [Coral-Schema] default value lowercasing Nov 7, 2025
Copy link
Contributor

@aastha25 aastha25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for picking this up! seems like an interesting bug!

* @return The default value with all field names lowercased
*/
@SuppressWarnings("unchecked")
private Object lowercaseDefaultValue(Object defaultValue, Schema schema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename the arguments to (fieldDefaultValue, fieldSchema) to capture that this is a per field level method.

// Handle union types - get the actual schema based on the default value type
if (schema.getType() == Schema.Type.UNION) {
// For unions, the default value corresponds to the first type in the union
actualSchema = schema.getTypes().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumption is not true. we 'expect' it to be the first type, but it can either.
There's actually some utility method in coral-schema which is specifically designed to fetch the 'underlying schema' from the union type - that can be used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually according to Avro specifications: https://avro.apache.org/docs/1.11.2/specification/

under "Unions":

(Note that when a default value is specified for a record field whose type is a union, the type of the default value must match the first element of the union. Thus, for unions containing “null”, the “null” is usually listed first, since the default value of such unions is typically null.)

I'll update it to at least handle nullable defaults though


case MAP:
// For maps, lowercase the keys and recursively process values
if (defaultValue instanceof Map) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this if condition if its already inside the CASE? can this be simplified here and other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the case statement is checking the case according to the schema, but this is just to make sure that the defaultValue is actually in accordance to the schema

Object lowercasedValue = lowercaseDefaultValue(entry.getValue(), valueSchema);
lowercasedMap.put(lowercasedKey, lowercasedValue);
}
return lowercasedMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the default value was [null, map], are we returning map by any chance? where's the logic to construct the union schema back? let's check this for other types too and ensure the spec of this method is returning the same schema as the input, just lower-cased.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the default value was [null, map]
do you mean "if the field type was [null, map]"?

According to avro specs, in this case the default value SHOULD be null, but I'm going to add something under case UNION so that it can still parse properly if the default value is the non-null type. However, I'm not going to implement the complex union version of it

Not sure what you mean by "where's the logic to construct the union schema back", we aren't constructing any schemas, this is just using the schema to figure out how to parse the default values properly so the relevant parts of the default values can be lowercased

}

switch (actualSchema.getType()) {
case RECORD:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the handling for complex union also? I understand we're stripping away the nulls for the union, but we could still have the a nullable- complex union?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for complex unions, we rely on the fact that the schema supplied is valid according to specs, see #560 (comment)

@ruolin59 ruolin59 force-pushed the ruolin/fix-default-value-lowercasing branch from b2fe0f7 to db4a5ee Compare January 9, 2026 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants