Metadata entries table breaks when the table is configured as Merge-on-Read and has Delete Files #1884
Comments
@kevinjqliu Please let me know if any more details need to be added here. I looked further into the code to fix the issue, and it seems to be a simple fix in In addition to |
#1902)
Closes #1884
# Rationale for this change
table.inspect.entries() fails when table is MOR table and has Delete Files present in it. Iceberg MOR Table is created via Apache Spark 3.5.0 with Iceberg 1.5.0 and it's being read via PyIceberg 0.9.0 using StaticTable.from_metadata()
# Are these changes tested?
Yes
# Are there any user-facing changes?
No
Apache Iceberg version
0.9.0 (latest release)
Please describe the bug 🐞
Issue:
`table.inspect.entries()` fails when the table is a MOR table and has Delete Files present in it. The Iceberg MOR table was created via Apache Spark 3.5.0 with Iceberg 1.5.0 and is being read via PyIceberg 0.9.0 using `StaticTable.from_metadata()`.
Stacktrace:
Replication
This issue can be replicated by following the instructions below:
- Create the Iceberg MOR table using `Spark 3.5.0` with `Iceberg 1.5.0`
- Run an `UPDATE` statement to generate a Delete File
- Read the Spark-created table from PyIceberg
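The replication steps above can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual script: the catalog name `demo`, the table `db.mor_table`, its columns, and the helper functions `run_spark_steps` and `read_entries` are all hypothetical. It assumes a SparkSession that already has an Iceberg catalog and the Iceberg SQL extensions configured, and the metadata location passed to `read_entries` is whatever metadata JSON file Spark wrote for the table.

```python
# Hypothetical names throughout: catalog "demo", table "db.mor_table", and the
# column layout are placeholders, not taken from the original report.
MOR_TABLE_DDL = """
CREATE TABLE demo.db.mor_table (id BIGINT, name STRING)
USING iceberg
TBLPROPERTIES (
  'format-version' = '2',
  'write.delete.mode' = 'merge-on-read',
  'write.update.mode' = 'merge-on-read',
  'write.merge.mode' = 'merge-on-read'
)
"""

SPARK_STATEMENTS = [
    MOR_TABLE_DDL,
    "INSERT INTO demo.db.mor_table VALUES (1, 'a'), (2, 'b')",
    # Under merge-on-read, this UPDATE is expected to write a delete file
    # instead of rewriting the data file.
    "UPDATE demo.db.mor_table SET name = 'c' WHERE id = 1",
]


def run_spark_steps(spark):
    """Run the statements on an existing SparkSession with an Iceberg catalog."""
    for statement in SPARK_STATEMENTS:
        spark.sql(statement)


def read_entries(metadata_location: str):
    """Load the table from its metadata JSON and read the entries metadata table."""
    # Imported lazily so the sketch can be loaded without PyIceberg installed.
    from pyiceberg.table import StaticTable

    table = StaticTable.from_metadata(metadata_location)
    # This is the call reported as failing on PyIceberg 0.9.0.
    return table.inspect.entries()
```

Running `run_spark_steps(spark)` and then `read_entries(...)` against the resulting metadata file should exercise the same code path as the report, provided the `UPDATE` really produced a `*-delete.parquet` file.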
Issue found after debugging
I did some debugging and figured out that `inspect.entries()` breaks for MOR tables while reading the `*-delete.parquet` files present in the table.
While reading the Delete file, `value_counts` comes back as null. I can see that `ManifestEntryStatus` is `ADDED` and the `DataFile` content is `DataFileContent.POSITION_DELETES`, which seems to be correct.
I looked further into the `manifest.avro` file that holds the entry for the delete Parquet files. `value_counts` is already `NULL` in the manifest itself, which is why `entry.data_file.value_counts` comes back as `null`. `value_counts` as `null` can also be seen above in the output of the query on the `delete_files` table.
Willingness to contribute
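The null `value_counts` finding from the debugging notes can be illustrated with a small standalone sketch. It assumes the failure comes from a dict-style lookup on a map that is null for delete files; the stack trace is not reproduced here, so that is an assumption. `FakeDataFile` and both lookup functions are hypothetical stand-ins, not PyIceberg classes or the actual fix.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeDataFile:
    """Hypothetical stand-in for a manifest entry's data file (not a PyIceberg class)."""

    content: str
    value_counts: Optional[Dict[int, int]]  # field id -> value count; may be null


def value_count_for(data_file: FakeDataFile, field_id: int) -> Optional[int]:
    """Naive lookup: raises AttributeError when value_counts is None."""
    return data_file.value_counts.get(field_id)


def safe_value_count_for(data_file: FakeDataFile, field_id: int) -> Optional[int]:
    """Defensive lookup: treats a null map as an empty one."""
    value_counts = data_file.value_counts or {}
    return value_counts.get(field_id)


# A position delete file whose manifest entry carries no value counts,
# next to an ordinary data file that does.
delete_file = FakeDataFile(content="POSITION_DELETES", value_counts=None)
data_file = FakeDataFile(content="DATA", value_counts={1: 2})
```

With these definitions, `value_count_for(delete_file, 1)` raises, while `safe_value_count_for(delete_file, 1)` returns `None` and leaves the lookup for ordinary data files unchanged.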