Skip to content

Hive fails opening split for zst compressed files #173

@benrifkind

Description

@benrifkind

This issue is copied from trinodb/trino#17792 since I believe this repo is where zstd de/compression is handled.

I have a Hive table built on top of zst compressed data. On a Trino 419 cluster I get the following error when trying to read this table from Trino. That version of Trino uses aircompressor 0.23.

Query 20230607_172621_00003_5hpz7 failed: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084

We are currently running Trino 405 and this query executes without an issue. We also have been running previous versions of Trino/Presto and this executed without an issue in the past. That version of Trino uses 0.21

Did something change between aircompressor 0.21 and 0.24 that might have caused this? And is there anything I can do to get past this error? Thanks in advance for your help!

Full stack trace

io.trino.spi.TrinoException: Error opening Hive split s3://path/to/file.csv.access.log.zst (offset=0, length=1544108): Window size too large (not yet supported): offset=3084
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:179)
at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:218)
at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:156)
at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:48)
at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:61)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:298)
at io.trino.operator.Driver.processInternal(Driver.java:402)
at io.trino.operator.Driver.lambda$process$8(Driver.java:305)
at io.trino.operator.Driver.tryWithLock(Driver.java:701)
at io.trino.operator.Driver.process(Driver.java:297)
at io.trino.operator.Driver.processForDuration(Driver.java:268)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:888)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:187)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:556)
at io.trino.$gen.Trino_18f7842____20230607_162211_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: io.airlift.compress.MalformedInputException: Window size too large (not yet supported): offset=3084
at io.airlift.compress.zstd.Util.verify(Util.java:45)
at io.airlift.compress.zstd.ZstdFrameDecompressor.decodeCompressedBlock(ZstdFrameDecompressor.java:303)
at io.airlift.compress.zstd.ZstdIncrementalFrameDecompressor.partialDecompress(ZstdIncrementalFrameDecompressor.java:236)
at io.airlift.compress.zstd.ZstdInputStream.read(ZstdInputStream.java:89)
at io.airlift.compress.zstd.ZstdHadoopInputStream.read(ZstdHadoopInputStream.java:53)
at com.google.common.io.CountingInputStream.read(CountingInputStream.java:64)
at java.base/java.io.InputStream.readNBytes(InputStream.java:506)
at io.trino.hive.formats.line.text.TextLineReader.fillBuffer(TextLineReader.java:248)
at io.trino.hive.formats.line.text.TextLineReader.(TextLineReader.java:67)
at io.trino.hive.formats.line.text.TextLineReaderFactory.createLineReader(TextLineReaderFactory.java:77)
at io.trino.plugin.hive.line.LinePageSourceFactory.createPageSource(LinePageSourceFactory.java:171)
... 17 more

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions