Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Refactor Table Metadata Tests #11947

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions core/src/main/java/org/apache/iceberg/TableMetadata.java
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,8 @@ public class TableMetadata implements Serializable {
static final int INITIAL_SPEC_ID = 0;
static final int INITIAL_SORT_ORDER_ID = 1;
static final int INITIAL_SCHEMA_ID = 0;
static final int BRANCHING_MIN_SUPPORT_VERSION = 2;
static final int UUID_REQUIRED_MIN_VERSION = 2;

private static final long ONE_MINUTE = TimeUnit.MINUTES.toMillis(1);

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -352,6 +352,7 @@ public static TableMetadata fromJson(String metadataLocation, JsonNode node) {
ImmutableList.Builder<Schema> builder = ImmutableList.builder();
for (JsonNode schemaNode : schemaArray) {
Schema current = SchemaParser.fromJson(schemaNode);
Schema.checkCompatibility(current, formatVersion);
HonahX marked this conversation as resolved.
Show resolved Hide resolved
if (current.schemaId() == currentSchemaId) {
schema = current;
}
Expand All @@ -372,6 +373,7 @@ public static TableMetadata fromJson(String metadataLocation, JsonNode node) {
formatVersion == 1, "%s must exist in format v%s", SCHEMAS, formatVersion);

schema = SchemaParser.fromJson(JsonUtil.get(SCHEMA, node));
Schema.checkCompatibility(schema, formatVersion);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HonahX, when I introduced this compatibility check, I purposely called it at write time rather than at read time. I've also pushed back on strictness in the read path like this in the REST catalog and other places. The reason for it is that we don't want to immediately reject metadata on read because that prevents the library from being used to debug or fix metadata issues.

For example, this check validates that initial default values are not used in v2 tables. But what if a table is created by another library that way? Checking compatibility here prevents loading the table at all, let alone using this library to drop the problematic column from the schema using SQL DDL or direct calls. Failing to load the table has much broader consequences, like failing existence checks because tableExists calls loadTable, failing to run information_schema queries that are unrelated, or failing to run expireSnapshots and remove old data -- which can cause compliance problems.

I think that a better time to fail is when the actual problem caused by the compatibility issue happens. For instance, if there is a default that can't be applied, then the library should fail to read from the table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! So in general we want to be less restrictive on read side to open the opportunities of fix things instead of rejecting all the errors.

I will revert this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if this is the case we should have some tests that show that parsing invalid metadata is a behavior is allowed by the library. Parsing some invalid Json should not throw an exception for compatibility purposes? I think we could just take a fully populated V3 Metadata and change it's format version to 1 or something. This should be readable (but not really usable)? I'm not sure what other cases we would want, but I think we'd be in a better state if we had tests for behaviors we want to keep in the code.

currentSchemaId = schema.schemaId();
schemas = ImmutableList.of(schema);
}
Expand Down
Loading