@colinmarc
Contributor

The spec mentions this naming convention here:

https://iceberg.apache.org/spec/#naming-for-gzip-compressed-metadata-json-files

Which issue does this PR close?

What changes are included in this PR?

Support for reading compressed metadata.

Are these changes tested?

Yes.

@colinmarc force-pushed the metadata-compressed branch 2 times, most recently from 654de6b to cd16381 (October 29, 2025 21:26)
let metadata_content = input_file.read().await?;
let metadata = serde_json::from_slice::<TableMetadata>(&metadata_content)?;

let metadata = if metadata_location.as_ref().ends_with(".gz.metadata.json") {
Contributor

Do we want to optionally support the Java Iceberg alternative?

The Java reference implementation can additionally read GZIP compressed files with the suffix metadata.json.gz.
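For illustration, accepting both naming conventions could look roughly like the sketch below. The helper name `has_gzip_suffix` is hypothetical and not part of this PR; the two suffixes are the ones mentioned in this thread.

```rust
/// Hypothetical helper (not from this PR): returns true if the metadata
/// location uses either gzip naming convention discussed in this thread.
fn has_gzip_suffix(location: &str) -> bool {
    // Convention named in the spec section linked in the PR description.
    location.ends_with(".gz.metadata.json")
        // Alternative suffix that the Java reference implementation also reads.
        || location.ends_with(".metadata.json.gz")
}
```

An uncompressed location such as `00001-abc.metadata.json` matches neither suffix and would fall through to the plain JSON path.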

Contributor Author

Seems better to have one convention, to me, but happy either way.

Even better would be peeking at the file and looking for the gzip magic number. If there's interest in that I can implement it. The wording of the spec ("some implementations require") seems to suggest it would be better to have no naming requirement at all.
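A minimal sketch of the magic-number check, assuming the whole file has already been read into a byte slice. The names `GZIP_MAGIC` and `looks_like_gzip` are illustrative only and may not match what ends up in the branch.

```rust
/// The two-byte gzip magic number (RFC 1952: ID1 = 0x1f, ID2 = 0x8b).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Illustrative helper: returns true if the buffer starts with the gzip
/// magic number. Inputs shorter than two bytes are treated as not gzip.
fn looks_like_gzip(bytes: &[u8]) -> bool {
    bytes.starts_with(&GZIP_MAGIC)
}
```

Since a JSON document begins with `{` or whitespace, plain metadata would not be mistaken for gzip by this check. Decompression itself is not shown here; it would need a gzip decoder, for example from the `flate2` crate.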

Contributor

> Even better would be peeking at the file and looking for the gzip magic number. If there's interest in that I can implement it.

That would be a really elegant solution, I think.

Contributor Author

Ok, done!

@colinmarc force-pushed the metadata-compressed branch from cd16381 to 9892bae (October 30, 2025 07:36)
The spec mentions that metadata files "may be compressed with GZIP",
here:

    https://iceberg.apache.org/spec/#table-metadata-and-snapshots
@colinmarc force-pushed the metadata-compressed branch from 9892bae to 011512a (October 30, 2025 07:37)
Development

Successfully merging this pull request may close these issues.

FR: support compressed metadata