-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Change default column mapping mode to name in Delta Lake #27790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
8cfc2d4 to
55cbe02
Compare
864c934 to
0baa136
Compare
0baa136 to
15d52f5
Compare
What is the default column mapping mode used by databricks and latest delta oss? |
| - `7 DAYS` | ||
| * - `delta.column-mapping-mode` | ||
| - Column mapping mode. Possible values are: `ID`, `NAME`, and `NONE`. | ||
| - `NAME` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why NAME to be the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. With Iceberg and Delta working out some compatibility frictions, i wonder what the default mode should be (or whether the ideal exists yet).
Both use none as the default column mapping mode. |
findepi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am OK with the change directionally and i believe it's an improvement.
I haven't review the code carefully, so you may want second pair of eyes.
I am concerned about
- this may be not the last time we change default mapping mode. maybe the ideal iceberg-compact mode doesn't exist yet?
- DBX still using 'none' as the default. They also want the best for their uses (drop/rename columns working OOB), so maybe we're overlooking something when doing this switch?
|
|
||
| @Test | ||
| @Override | ||
| public void testDropColumn() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| return columnMappingMode; | ||
| } | ||
|
|
||
| @Config("delta.column-mapping-mode") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delta.default-column-mapping-mode to indicate the value selected here is still overridable during CT
| * `NONE` | ||
|
|
||
| Defaults to `NONE`. | ||
| Defaults to `NAME`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any other documentation pieces to update? Like do we eg say drop/rename column don't work out of the box and now they do?
| the [VACUUM](delta-lake-vacuum) procedure. The equivalent catalog session | ||
| property is `vacuum_min_retention`. | ||
| - `7 DAYS` | ||
| * - `delta.column-mapping-mode` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update name here too
Description
Change the default column mapping mode in Delta Lake to name so the connector can allow more column operations by default. As a side effect, this change updates the default reader version to 2 and the writer version to 5.
This PR also adds
delta.column-mapping-modeconfig property so users can revert the default value tononeif necessary.All Databricks LTS versions support the name column mapping mode and those reader and writer versions.
Release notes