@ruolin59 ruolin59 commented Nov 10, 2025

What changes are proposed in this pull request, and why are they necessary?

  • Fix Avro namespace collisions that occur when a parent schema contains multiple fields referencing nested records with the same name but different original namespaces. The fix does the following:
    • Detect collisions per parent record (same record name, different original namespaces).
    • Assign stable numeric suffixes to the reconstructed namespace of colliding records (e.g., “-0”, “-1”, …), keeping non-colliding cases unchanged.

This approach:

  • Prevents fully-qualified name collisions.
  • Avoids leaking source/original namespaces into generated schemas.
  • Minimizes changes: only colliding records get suffixes; everything else remains the same.

Simple example:
Input (two fields with the same record name):

{
  "type": "record",
  "name": "Parent",
  "namespace": "com.app",
  "fields": [
    { "name": "ctxA", "type": ["null", {"type":"record","name":"Ctx","namespace":"com.foo","fields":[]}] },
    { "name": "ctxB", "type": ["null", {"type":"record","name":"Ctx","namespace":"com.bar","fields":[]}] }
  ]
}
  • Before (with collision): both nested records are reconstructed with the same namespace (e.g., "com.app.Parent") and the same name (e.g., "Ctx"), so they share one fully-qualified name, which is invalid.
  • After (no collision):
    • ctxA non-null type namespace: "com.app.Parent-0", with name "Ctx"
    • ctxB non-null type namespace: "com.app.Parent-1", with name "Ctx"
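
The suffix assignment described above can be sketched as follows. This is a minimal Python illustration, not the Coral implementation (which is Java): the function name `assign_namespaces`, the tuple layout, and the choice of first-seen field order as the source of "stable" suffixes are all assumptions made for the example.

```python
from collections import defaultdict


def assign_namespaces(parent_fullname, nested):
    """Hypothetical helper: pick a reconstructed namespace for each nested record.

    nested is a list of (field_name, record_name, original_namespace) tuples
    in field order. Returns {field_name: reconstructed_namespace}.
    """
    # Collect the distinct original namespaces seen for each record name,
    # in first-seen order (assumed here to be what makes suffixes stable).
    namespaces_by_name = defaultdict(list)
    for _, record_name, original_ns in nested:
        if original_ns not in namespaces_by_name[record_name]:
            namespaces_by_name[record_name].append(original_ns)

    result = {}
    for field_name, record_name, original_ns in nested:
        seen = namespaces_by_name[record_name]
        if len(seen) > 1:
            # Collision: same record name, different original namespaces.
            result[field_name] = f"{parent_fullname}-{seen.index(original_ns)}"
        else:
            # No collision: keep the reconstructed namespace unchanged.
            result[field_name] = parent_fullname
    return result


example = assign_namespaces(
    "com.app.Parent",
    [
        ("ctxA", "Ctx", "com.foo"),
        ("ctxB", "Ctx", "com.bar"),
        ("meta", "Meta", "com.baz"),  # hypothetical non-colliding field
    ],
)
print(example)
```

For the input above, `ctxA` and `ctxB` receive "com.app.Parent-0" and "com.app.Parent-1", while the non-colliding `meta` field keeps "com.app.Parent".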

How was this patch tested?

  • Added unit tests in SchemaUtilitiesTests:
    • Verifies collision handling for nullable union fields with the same record name.
    • Verifies collision handling for direct nested record fields (non-union) with the same record name.
    • Asserts distinct namespaces with numeric suffixes (e.g., endsWith("-0") / endsWith("-1")).
  • Ran full module tests; all existing tests pass without updating expected .avsc files.
  • Verified manually in spark-shell that the namespace collision is fixed for the offending table.

@ruolin59 ruolin59 changed the title from "[Coral-Schema] fix namespace collisions when multiple type names under different original namespaces are under the same schema hierarchy thus ending up with the same new namespace names" to "[Coral-Schema] Disambiguate nested record namespaces with numeric suffixes" Nov 10, 2025
@ruolin59 ruolin59 force-pushed the ruolin/fix-namespace-generation branch from c508f72 to 1da3f29 Compare November 15, 2025 01:35
@aastha25
Contributor

Thanks for the PR. Based purely on the PR description, should the namespace disambiguation happen at the conflicting layer?

"com.app.Parent.Ctx0"
"com.app.Parent.Ctx1"

Also, the testing section includes some company-specific internal details, which can be omitted in this public forum.

@ruolin59
Author

@aastha25

Thanks for the PR. Based purely on the PR description, should the namespace disambiguation happen at the conflicting layer?

"com.app.Parent.Ctx0"
"com.app.Parent.Ctx1"

This was a mistake in the description. Ctx is not part of the namespace; it is the record name on which the collision occurs. I've updated the description to fix this.
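
To illustrate why the suffix belongs on the namespace rather than on Ctx: an Avro fullname is the namespace and the name joined by a dot, so two records collide only when both parts match. The helper `fullname` below is a hypothetical name for this sketch, not an Avro or Coral API.

```python
def fullname(namespace, name):
    """Hypothetical helper: join namespace and name the way Avro forms a fullname."""
    return f"{namespace}.{name}" if namespace else name


# Before the fix: both nested records resolve to the same fullname.
before = {fullname("com.app.Parent", "Ctx"), fullname("com.app.Parent", "Ctx")}

# After the fix: the name stays "Ctx", only the namespace differs.
after = {fullname("com.app.Parent-0", "Ctx"), fullname("com.app.Parent-1", "Ctx")}

print(len(before), len(after))
```

The first set collapses to a single fullname, while the second holds two distinct ones with the record name left untouched.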

@ruolin59 ruolin59 force-pushed the ruolin/fix-namespace-generation branch from 1da3f29 to b5cb803 Compare November 21, 2025 20:03