Error reading parquet files - getting "can not read class org.apache.parquet.format.PageHeader: don't know what type: 15" #25821

Open
chennurchaitanya opened this issue May 19, 2025 · 1 comment

Comments

@chennurchaitanya

When we run a Trino query on a large Iceberg table, we get the exception below. The query fails about 2 times out of 10, and we have not been able to find the cause. The Spark writer uses Parquet v2 to write the files. Sometimes the underlying error is "Required field 'uncompressed_page_size' was not found", and the "don't know what type" error shows up with different numbers (15 in this example, but also 14, 13, 0, etc.).

We are using Trino versions 459 and 443, and I see the issue in both.

Error executing query:
SQL Error [84148230]: Query failed (#20250519_203712_17592_5z4vb): Error opening Iceberg split s3a://XXXXXXXXXXXX/00061-21971-4993a885-86d8-4ac4-8ced-6da392e74ad2-00001.parquet (offset=480065620, length=37069344): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
io.cloudbeaver.DBWebException: Error executing query:
SQL Error [84148230]: Query failed (#20250519_203712_17592_5z4vb): Error opening Iceberg split s3a://XXXXXXXXXXX/00061-21971-4993a885-86d8-4ac4-8ced-6da392e74ad2-00001.parquet (offset=480065620, length=37069344): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at io.cloudbeaver.service.sql.WebSQLProcessor.processQuery(WebSQLProcessor.java:266)
at io.cloudbeaver.service.sql.impl.WebServiceSQL$1.run(WebServiceSQL.java:411)
at io.cloudbeaver.model.session.WebSession$1.run(WebSession.java:741)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:117)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: org.jkiss.dbeaver.model.exec.DBCException: SQL Error [84148230]: Query failed (#20250519_203712_17592_5z4vb): Error opening Iceberg split s3a://XXXXXXX/00061-21971-4993a885-86d8-4ac4-8ced-6da392e74ad2-00001.parquet (offset=480065620, length=37069344): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:185)
at io.cloudbeaver.service.sql.WebSQLProcessor.readResultSet(WebSQLProcessor.java:1032)
at io.cloudbeaver.service.sql.WebSQLProcessor.fillQueryResults(WebSQLProcessor.java:990)
at io.cloudbeaver.service.sql.WebSQLProcessor.lambda$1(WebSQLProcessor.java:256)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:192)
at io.cloudbeaver.service.sql.WebSQLProcessor.processQuery(WebSQLProcessor.java:207)
... 4 more
Caused by: java.sql.SQLException: Query failed (#20250519_203712_17592_5z4vb): Error opening Iceberg split s3a://XXXXXXXXXXXX/00061-21971-4993a885-86d8-4ac4-8ced-6da392e74ad2-00001.parquet (offset=480065620, length=37069344): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:294)
at io.trino.jdbc.TrinoResultSet$ResultsPageIterator.computeNext(TrinoResultSet.java:254)
at io.trino.jdbc.$internal.guava.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at io.trino.jdbc.$internal.guava.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1855)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:292)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:298)
at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
at io.trino.jdbc.TrinoResultSet$AsyncIterator.lambda$new$1(TrinoResultSet.java:179)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: io.trino.spi.TrinoException: Error opening Iceberg split s3a://siem-universal/universal/siem_bigipasm/data/event_ingested_day=2025-03-12/00061-21971-4993a885-86d8-4ac4-8ced-6da392e74ad2-00001.parquet (offset=480065620, length=37069344): java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:1052)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createDataPageSource(IcebergPageSourceProvider.java:553)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createPageSource(IcebergPageSourceProvider.java:360)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createPageSource(IcebergPageSourceProvider.java:250)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager$PageSourceProviderInstance.createPageSource(PageSourceManager.java:79)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:261)
at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.process(ScanFilterAndProjectOperator.java:192)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:240)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:255)
at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
at io.trino.operator.Driver.processInternal(Driver.java:403)
at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
at io.trino.operator.Driver.tryWithLock(Driver.java:709)
at io.trino.operator.Driver.process(Driver.java:298)
at io.trino.operator.Driver.processForDuration(Driver.java:269)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
at io.trino.$gen.Trino_459____20250516_053948_2.run(Unknown Source)
at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:172)
at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:159)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1575)
Caused by: java.io.UncheckedIOException: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at io.trino.parquet.predicate.PredicateUtils.readPageHeaderWithData(PredicateUtils.java:332)
at io.trino.parquet.predicate.PredicateUtils.readDictionaryPage(PredicateUtils.java:308)
at io.trino.parquet.predicate.PredicateUtils.dictionaryPredicatesMatch(PredicateUtils.java:273)
at io.trino.parquet.predicate.PredicateUtils.predicateMatches(PredicateUtils.java:173)
at io.trino.parquet.predicate.PredicateUtils.getFilteredRowGroups(PredicateUtils.java:207)
at io.trino.plugin.iceberg.IcebergPageSourceProvider.createParquetPageSource(IcebergPageSourceProvider.java:940)
... 39 more
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: don't know what type: 15
at org.apache.parquet.format.Util.read(Util.java:393)
at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
at io.trino.parquet.predicate.PredicateUtils.readPageHeaderWithData(PredicateUtils.java:329)
... 44 more
Caused by: shaded.parquet.org.apache.thrift.protocol.TProtocolException: don't know what type: 15
at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.getTType(TCompactProtocol.java:900)
at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:551)
at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:188)
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1003)
at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:995)
at org.apache.parquet.format.PageHeader.read(PageHeader.java:870)
at org.apache.parquet.format.Util.read(Util.java:390)
... 47 more
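
For readers following the innermost frames (`TCompactProtocol.readFieldBegin` calling `getTType`): in the Thrift compact protocol, the short form of a field header packs the field-id delta into the high 4 bits of one byte and the type code into the low 4 bits. The sketch below is an illustration in Python, not Trino or parquet-java code. The function name `decode_field_header` and the type table covering codes 0 to 12 are assumptions made for this example, and newer Thrift releases may define further codes. It shows why a byte that is not really a page header can produce "don't know what type: 15", which would be consistent with the reader parsing bytes from the wrong position or parsing damaged bytes. It does not identify which of those happened here.

```python
# Illustrative decoder for one Thrift compact-protocol field header byte.
# Assumption: type codes 0-12 as listed here; this is a sketch, not the
# shaded Thrift implementation that appears in the stack trace.
COMPACT_TYPES = {
    0: "STOP", 1: "BOOLEAN_TRUE", 2: "BOOLEAN_FALSE", 3: "BYTE",
    4: "I16", 5: "I32", 6: "I64", 7: "DOUBLE", 8: "BINARY",
    9: "LIST", 10: "SET", 11: "MAP", 12: "STRUCT",
}


def decode_field_header(byte: int) -> tuple:
    """Return (field_id_delta, type_name) for a short-form header byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    delta = byte >> 4        # high nibble: field-id delta
    type_id = byte & 0x0F    # low nibble: compact type code
    if type_id not in COMPACT_TYPES:
        raise ValueError(f"don't know what type: {type_id}")
    return delta, COMPACT_TYPES[type_id]


# A well-formed PageHeader begins with field 1 (`type`, an i32): byte 0x15.
print(decode_field_header(0x15))

# An arbitrary data byte such as 0xFF has low nibble 15 and is rejected.
try:
    decode_field_header(0xFF)
except ValueError as err:
    print(err)
```

Under this reading, the changing number in the message (15, 14, 13) simply reflects whichever byte happened to sit at the position being parsed.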

@XavieLee
Member

Hi @chennurchaitanya, in my opinion this is most likely a problem with the Parquet file itself. I ran into this issue before and fixed it by rewriting the file. Just FYI.
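
Before rewriting a suspect file, a cheap first check is whether its outer envelope is intact. A Parquet file with an unencrypted footer starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The sketch below is a minimal illustration under those assumptions; `check_parquet_envelope` is a hypothetical helper name, not part of any library. It only rules out gross truncation and does not validate the page headers where the trace above fails, so a file that passes can still be damaged.

```python
import io
import struct

MAGIC = b"PAR1"  # magic bytes of a Parquet file with an unencrypted footer


def check_parquet_envelope(f) -> int:
    """Check leading/trailing magic on a seekable binary file object.

    Returns the footer length in bytes, or raises ValueError.
    Only the first 4 and last 8 bytes are read, so large files are cheap.
    """
    size = f.seek(0, io.SEEK_END)
    if size < 12:
        raise ValueError("too short to be a Parquet file")
    f.seek(0)
    head = f.read(4)
    f.seek(size - 8)
    tail = f.read(8)
    if head != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic (possibly truncated)")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


# Usage with an in-memory stand-in for a file: 10 placeholder footer bytes.
fake = io.BytesIO(MAGIC + b"x" * 10 + struct.pack("<I", 10) + MAGIC)
print(check_parquet_envelope(fake))
```

For a local copy of the data file, the same function can be called with `open(path, "rb")`.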
