@kumarpritam863
Contributor

Summary

This PR adds support for default values in Iceberg Kafka Connect, enabling automatic extraction and application of default values from Kafka Connect schemas during both auto-table creation and
schema evolution. Default values are only applied when the target Iceberg table uses format version 3 or higher, the version that introduced native support for column defaults.

Background

Iceberg format v3 introduced support for initial and write default values on columns. When a new column with a default value is added to a table, the default value is used for:

  • Initial default: Values to read for existing rows that don't have the column
  • Write default: Values to write when no value is explicitly provided
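To make the distinction between the two defaults concrete, here is a small self-contained Java model. It is only a sketch and does not use the Iceberg API: the ColumnDefaultModel class and its readValue and writeRow methods are hypothetical names introduced for illustration, and rows are modelled as plain maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a column carrying v3-style defaults; NOT the Iceberg API.
final class ColumnDefaultModel {
  private final String name;
  private final Object initialDefault; // surfaced when reading rows that lack the column
  private final Object writeDefault;   // stored when a writer provides no value

  ColumnDefaultModel(String name, Object initialDefault, Object writeDefault) {
    this.name = name;
    this.initialDefault = initialDefault;
    this.writeDefault = writeDefault;
  }

  // Read path: a stored row without the column yields the initial default.
  Object readValue(Map<String, Object> storedRow) {
    return storedRow.containsKey(name) ? storedRow.get(name) : initialDefault;
  }

  // Write path: a record without the column is completed with the write default.
  Map<String, Object> writeRow(Map<String, Object> incoming) {
    Map<String, Object> out = new HashMap<>(incoming);
    if (!out.containsKey(name)) {
      out.put(name, writeDefault);
    }
    return out;
  }
}
```

In this model, a row stored before the column was added reads back the initial default, while a new record that omits the column is written with the write default. An explicitly provided value is left untouched in both paths.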

Kafka Connect schemas also support default values through the defaultValue() method on field schemas. This PR bridges these two systems, automatically transferring default values from Kafka Connect
to Iceberg tables when a table is auto-created or its schema evolves.

Behavior

Auto-Table Creation

When Kafka Connect auto-creates a new Iceberg table:

  1. If creating a format v3+ table: Default values from Kafka Connect schemas are extracted and applied
  2. If creating a format v2 or v1 table: Default values are ignored (not supported)

Schema Evolution on Existing Tables

When adding new columns to an existing table:

  1. If the table is format v3+: Default values are extracted and applied to new columns
  2. If the table is format v2 or v1: Default values are ignored, and the omission is logged
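The version gating described in the two lists above can be sketched as follows. This is an illustrative model rather than the code in this PR: the DefaultValueGate class, its method names, and the log message are assumptions, and the format version is passed in as a plain int instead of being read from table metadata.

```java
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical helper modelling the format-version gate; names are illustrative only.
final class DefaultValueGate {
  private static final Logger LOG = Logger.getLogger(DefaultValueGate.class.getName());

  // Column defaults were introduced in Iceberg format v3.
  static final int MIN_FORMAT_VERSION_FOR_DEFAULTS = 3;

  // Uses >= so that later format versions are accepted without code changes.
  static boolean supportsDefaults(int formatVersion) {
    return formatVersion >= MIN_FORMAT_VERSION_FOR_DEFAULTS;
  }

  // Returns the default to apply to a new column, or empty if none should be applied.
  static Optional<Object> defaultFor(int formatVersion, String column, Object connectDefault) {
    if (connectDefault == null) {
      return Optional.empty(); // the Kafka Connect field declares no default
    }
    if (!supportsDefaults(formatVersion)) {
      LOG.info("Ignoring default for column '" + column
          + "': table format v" + formatVersion + " does not support column defaults");
      return Optional.empty();
    }
    return Optional.of(connectDefault);
  }
}
```

For example, DefaultValueGate.defaultFor(2, "name", "unknown") returns an empty Optional and logs the skip, while the same call with format version 3 returns the default unchanged.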

Example

Given a Kafka Connect schema:
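    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;

    // "id" has no default; the other three fields each declare one.
    Schema schema = SchemaBuilder.struct()
        .field("id", Schema.INT32_SCHEMA)
        .field("name", SchemaBuilder.string().defaultValue("unknown").build())
        .field("age", SchemaBuilder.int32().defaultValue(0).build())
        .field("active", SchemaBuilder.bool().defaultValue(true).build())
        .build();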

For a format v3 table:

  • New columns name, age, active will have their default values set
  • Existing rows will see "unknown", 0, true for these columns
  • New writes without these fields will also use the defaults

For a format v2 table:

  • New columns are added without defaults (null/absent for existing rows)

Compatibility

✅ Backward Compatible:

  • No breaking API changes
  • Format v1/v2 tables continue to work as before (no defaults)
  • Only format v3+ tables gain default value support

✅ Forward Compatible:

  • Design accommodates future format versions (v4+) automatically
  • Check is >= rather than == for format version

✅ Safe Fallback:

  • If default value conversion fails, logs a warning and continues without the default
  • Prevents schema evolution failures due to unsupported default value types
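A minimal sketch of this fallback pattern, under stated assumptions: the SafeDefaultConversion class and tryConvert method are hypothetical names, and the converter is passed in as a plain function standing in for whatever turns a Kafka Connect default into an Iceberg-compatible value.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper showing the "warn and continue" fallback; not the PR's actual code.
final class SafeDefaultConversion {
  private static final Logger LOG = Logger.getLogger(SafeDefaultConversion.class.getName());

  // Attempts the conversion; on failure logs a warning and reports "no default".
  static <T> Optional<T> tryConvert(
      String column, Object connectDefault, Function<Object, T> converter) {
    if (connectDefault == null) {
      return Optional.empty(); // nothing to convert
    }
    try {
      return Optional.ofNullable(converter.apply(connectDefault));
    } catch (RuntimeException e) {
      LOG.log(Level.WARNING,
          "Could not convert default for column '" + column + "'; adding it without a default", e);
      return Optional.empty();
    }
  }
}
```

Because a failed conversion yields an empty Optional instead of propagating the exception, the caller can still add the column and simply omit its default.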

@kumarpritam863
Contributor Author

@bryanck please review.

@kumarpritam863 changed the title from "Add Default Value Support for Kafka Connect Schema Evolution" to "Add Default Value Support for Kafka Connect" on Nov 14, 2025