
Conversation

@murali-db commented Jan 22, 2026

Summary

Implement proper handling of column names containing dots (literal column names vs. nested fields) for Iceberg server-side planning. This PR adds backtick escaping for literal dotted columns in projections while correctly using raw field names for filters (where Iceberg's binding logic handles disambiguation automatically).

Background

Column names can contain dots in various scenarios:

  • Unity Catalog tables with dots in column names
  • Flattened nested schemas
  • Data from systems that allow dots in column names (e.g., address.city as a single column name)

It's important to distinguish between two cases (see the schema sketch below):

  • Literal dotted column names: address.city is a single top-level column
  • Nested field references: address.intCol refers to field intCol within struct address
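
For concreteness, here is a minimal Spark schema sketch containing both cases (the field layout is illustrative, not the PR's actual test schema):

```scala
import org.apache.spark.sql.types._

// Illustrative schema: "address.city" is one top-level column, while
// "address" is a struct whose field is reached as address.intCol.
val schema = StructType(Seq(
  StructField("address.city", StringType),     // literal dotted column name
  StructField("address", StructType(Seq(
    StructField("intCol", IntegerType)         // nested field reference
  )))
))
```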

Key Insights

After thorough investigation of both Spark and Iceberg behavior:

  1. Projections: Need backtick escaping to distinguish the two cases in the JSON projection list

    • Example: ["`address.city`", "address.intCol"]
    • Escaped literal columns vs. unescaped nested field paths
  2. Filters: Don't need escaping; Iceberg's schema-aware binding handles it (see the sketch after this list)

    • Iceberg Expression API uses raw field names: Expressions.equal("address.city", value)
    • Binder.bind(schema, expression) resolves ambiguity using schema structure
    • Binding checks for a literal field match first, then falls back to nested-path parsing
    • JSON sent over HTTP also uses raw field names (no escaping)
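
A sketch of that binding behavior against Iceberg's expression API; the schema and field IDs are illustrative, and the comments restate this PR's finding about binding order rather than a documented Iceberg guarantee:

```scala
import org.apache.iceberg.Schema
import org.apache.iceberg.expressions.{Binder, Expressions}
import org.apache.iceberg.types.Types

// Illustrative Iceberg schema with both a literal dotted column and a nested struct.
val schema = new Schema(
  Types.NestedField.required(1, "address.city", Types.StringType.get()),
  Types.NestedField.optional(2, "address",
    Types.StructType.of(Types.NestedField.required(3, "intCol", Types.IntegerType.get())))
)

// Raw field name, no backticks; per this PR's finding, binding tries the
// literal field first and only then falls back to nested-path parsing.
val unbound = Expressions.equal("address.city", "Amsterdam")
val bound = Binder.bind(schema.asStruct(), unbound, true /* caseSensitive */)
```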

Implementation

Projection Column Escaping

File: ServerSidePlannedTable.scala (only file changed)

Added logic to escape literal dotted columns in projections (sketched after this list):

  • escapeProjectedColumns(): Processes required schema fields
  • escapeColumnNameIfNeeded(): Recursively handles nested structures
  • Logic: if a column name contains dots and exists as a top-level field in the schema, wrap it in backticks
  • Nested field access (e.g., parent.child) is handled recursively without escaping the parent
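
A minimal sketch of these two methods; only the names escapeProjectedColumns and escapeColumnNameIfNeeded come from this PR, and the bodies below are assumed (they follow the "escape any dotted segment" rule the later commits describe):

```scala
import org.apache.spark.sql.types.{StructField, StructType}

object ProjectionEscapingSketch {
  // Flatten the required schema into the projection strings sent to planScan().
  def escapeProjectedColumns(requiredSchema: StructType): Seq[String] =
    requiredSchema.fields.toSeq.flatMap(f => escapeColumnNameIfNeeded(f, prefix = ""))

  // Recurse through structs, backtick-escaping any name segment that contains a dot.
  def escapeColumnNameIfNeeded(field: StructField, prefix: String): Seq[String] = {
    val segment = if (field.name.contains(".")) s"`${field.name}`" else field.name
    val path = if (prefix.isEmpty) segment else s"$prefix.$segment"
    field.dataType match {
      case struct: StructType =>
        // Nested access stays plain dot notation (parent.child); dotted leaves
        // become parent.`child.name`.
        struct.fields.toSeq.flatMap(child => escapeColumnNameIfNeeded(child, path))
      case _ => Seq(path)
    }
  }
}
```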

Filter Handling

File: SparkToIcebergExpressionConverter.scala (no changes needed)

Filters correctly use raw field names (no escaping):

  • All filter operators use plain column names as-is
  • Iceberg's Binder.bind() resolves literal vs. nested using schema
  • Server-side binding prioritizes literal field names over nested parsing
  • No client-side escaping needed

Test Coverage

Comprehensive Test Suite ✅

  1. TestSchemas.scala: Defines schema with BOTH literal dotted columns AND nested structs

    • Literal: address.city, a.b.c, location.state, etc.
    • Nested: address struct with intCol, metadata struct with stringCol
  2. SparkToIcebergExpressionConverterSuite.scala:

    • 17 test cases covering all operators with dotted columns
    • Verifies raw field names are used (no backticks in Expression objects)
  3. IcebergRESTCatalogPlanningClientSuite.scala:

    • Tests filter/projection sent to REST server
    • Verifies Binder.bind() correctly resolves dotted names
    • Tests both literal and nested dotted columns
  4. ServerSidePlannedTableSuite.scala:

    • End-to-end tests with filter and projection pushdown
    • 12 tests covering full query execution

Testing

✅ All unit tests pass:

  • 22 iceberg tests (SparkToIcebergExpressionConverterSuite, IcebergRESTCatalogPlanningClientSuite)
  • 12 spark tests (ServerSidePlannedTableSuite)
  • Total: 34 tests passing
  • Scalastyle checks: 0 errors, 0 warnings
  • Tested with Java 17

Files Changed

Single file modified:

  • ServerSidePlannedTable.scala: Added projection escaping logic (+51 lines)

Total: 1 file changed, 51 insertions(+), 1 deletion(-)

…n Iceberg filter conversion

Add comprehensive test coverage for Iceberg filter conversion when column
names contain dots. Validates that Iceberg correctly handles both nested
field access and literal column names containing dots.

Test Coverage:
- Literal column names with dots (e.g., "address.city" as single column name)
- All filter operators: equality, comparison, null checks, IN, string operations
- Logical operators (AND, OR) with mixed column names
- Distinction between nested field access vs literal column names with dots

Key Finding:
Iceberg's expression API already correctly handles literal column names
containing dots without requiring special escaping. The schema can contain
both nested structs (address.intCol) and literal dotted names (address.city),
and Iceberg distinguishes them correctly.

Changes:
- Add test schema with literal column names containing dots
- Add 17 new test cases covering all filter operators with dotted names
- Update test documentation to clarify nested vs literal column names
@murali-db force-pushed the escape-dots-in-column-names branch from 79b7a6e to 3c20876 on January 22, 2026 12:50
@murali-db changed the title from "[Server-Side Planning] Escape dots in column names for Iceberg filter pushdown" to "[Server-Side Planning] Add test coverage for column names with dots in Iceberg filter conversion" on Jan 22, 2026
@murali-db changed the title from "[Server-Side Planning] Add test coverage for column names with dots in Iceberg filter conversion" to "[Server-Side Planning] Escape dots in column names when sending to Iceberg REST API" on Jan 22, 2026
…umn names

Add test coverage for column names containing dots (literal column names, not nested fields).
Tests verify that Iceberg correctly handles both literal dotted column names (e.g., address.city)
and nested field references (e.g., address.intCol) without requiring backtick escaping.

Key insight: Iceberg's internal schema and REST API handle dotted column names natively.
Backticks are only needed in SQL/parser contexts, not in Iceberg expressions or REST protocol.

Changes:
- Add literal dotted column names to TestSchemas (address.city, a.b.c, location.state, etc.)
- Add 17 test cases in SparkToIcebergExpressionConverterSuite covering all operators
- Update IcebergRESTCatalogPlanningClientSuite.populateTestData to include all 21 fields
- Add test case for literal dotted column name in filter+projection

All tests pass with Java 17.
@murali-db force-pushed the escape-dots-in-column-names branch from 48aa85e to fb9ad0e on January 22, 2026 14:04
@murali-db changed the title from "[Server-Side Planning] Escape dots in column names when sending to Iceberg REST API" to "[Server-Side Planning] Add comprehensive test coverage for dotted column names" on Jan 22, 2026
@murali-db changed the title from "[Server-Side Planning] Add comprehensive test coverage for dotted column names" to "[Server-Side Planning] Column names containing period" on Jan 22, 2026
@murali-db force-pushed the escape-dots-in-column-names branch from a0cba46 to 06c4eaf on January 23, 2026 12:23
…n names

Add backtick escaping for column names containing dots when sending
projections to Iceberg REST API. This distinguishes between:
- Literal dotted columns: "address.city" as a single field -> "`address.city`"
- Nested field access: address.intCol (parent.child) -> "address.intCol"

Implementation:
- Added escapeProjectedColumns() to process required schema fields
- Added escapeColumnNameIfNeeded() for recursive nested field handling
- Escaping happens in ServerSidePlannedTable before calling planScan()
- No changes to filter conversion (Iceberg's Binder handles disambiguation)

All tests passing (34 total: 22 iceberg + 12 spark)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@murali-db force-pushed the escape-dots-in-column-names branch from 06c4eaf to 8dee476 on January 23, 2026 12:26
murali-db and others added 4 commits January 23, 2026 12:34
Use a.b.c (which is unambiguous - no struct named 'a') instead of address.city
for the literal dotted column test case to make the test clearer.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
…Type directly

Instead of using fieldNames (which only gives top-level names), traverse
the StructType directly to build flattened dot-notation paths. This correctly
handles nested structs by recursively flattening them.

Example: If requiredSchema has struct 'address' with field 'intCol',
we now correctly generate "address.intCol" instead of just "address".

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
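
With the hypothetical ProjectionEscapingSketch from the description above, the difference this commit describes looks like:

```scala
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// A required schema containing struct 'address' with field 'intCol'.
val required = StructType(Seq(
  StructField("address", StructType(Seq(StructField("intCol", IntegerType))))
))
// Traversing the StructType yields the full path, not just the top-level name:
ProjectionEscapingSketch.escapeProjectedColumns(required)  // Seq("address.intCol")
```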
…s with dots

CRITICAL FIX: The previous implementation only escaped dotted field names at the
top level. This fails for nested structs that have fields with dots in their names.

Example: parent (struct) { "child.name" (string) }
Previous: would generate "parent.child.name" (ambiguous!)
Fixed: now generates "parent.`child.name`" (correctly escaped)

Changes:
1. Simplified flattenSchema() logic: ANY field with dots gets escaped,
   regardless of nesting level
2. Simplified test schema: removed redundant dotted columns, added critical
   test case for nested field with dots (parent."child.name")
3. Updated all test cases to reference fields that exist in simplified schema

All tests passing (34 total: 22 iceberg + 12 spark)
Scalastyle: 0 errors, 0 warnings

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
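
The same hypothetical sketch, applied to this commit's nested-dotted-leaf example:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Struct 'parent' containing a field literally named "child.name".
val tricky = StructType(Seq(
  StructField("parent", StructType(Seq(StructField("child.name", StringType))))
))
// Escaping at every nesting level yields "parent.`child.name`",
// never the ambiguous "parent.child.name":
ProjectionEscapingSketch.escapeProjectedColumns(tricky)  // Seq("parent.`child.name`")
```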
…ility

- Move escapeProjectedColumns and flattenSchema methods to companion object
- Make methods package-private for direct unit testing
- Add comprehensive unit test covering essential dotted field patterns:
  * Top-level with dots (e.g., `a.b.c`)
  * Normal nested (e.g., parent.child)
  * Multi-level nested (e.g., level1.level2.level3)
  * Nested with dotted leaf (e.g., data.`field.name`)
  * Struct with dots (e.g., `root.struct`.value)
- Add test to verify Spark's behavior with struct columns
- Restore metadata.stringCol test case alongside parent.child.name
- All tests passing (14 spark + 22 iceberg = 36 total)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
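
A hedged ScalaTest-style sketch of such a unit test; the suite name and expected values are assumptions built on the hypothetical ProjectionEscapingSketch above, not the actual test file:

```scala
import org.apache.spark.sql.types._
import org.scalatest.funsuite.AnyFunSuite

class ProjectionEscapingSketchSuite extends AnyFunSuite {
  test("essential dotted field patterns") {
    val schema = StructType(Seq(
      StructField("a.b.c", IntegerType),                  // top-level with dots
      StructField("parent", StructType(Seq(
        StructField("child", StringType)))),              // normal nested
      StructField("data", StructType(Seq(
        StructField("field.name", StringType)))),         // nested with dotted leaf
      StructField("root.struct", StructType(Seq(
        StructField("value", IntegerType))))              // struct with dots
    ))
    assert(ProjectionEscapingSketch.escapeProjectedColumns(schema) ===
      Seq("`a.b.c`", "parent.child", "data.`field.name`", "`root.struct`.value"))
  }
}
```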