Skip to content
This repository was archived by the owner on Jan 3, 2023. It is now read-only.
This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Hive on MR query with COMPACT file , throw EOFE "Cannot seek to negative offset" #2259

Open
@MyqueWooMiddo

Description

@MyqueWooMiddo

I generate a small table 'small_table' with several INSERTs , so there're many small files in HDFS

I create a SSM compact rule on 'small_table' than run action , then the _container_file was produced on the same HDFS path.

I run 'hadoop fs -cat /small_table/xx.txt ' , I can view its content .
In the mean time , I can view xx.txt and _container_file open & getXAttrs operation in hdfs-audit.log .

In Hive , I can query the small_table with filter (without MR) , but I can't run aggregate such an count(*) on it (with MR) .
It will throw the stack :

org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:271)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:217)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:345)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:719)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:176)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
... 11 more
Caused by: java.io.EOFException: Cannot seek to negative offset
at org.apache.hadoop.hdfs.CompactInputStream.seek(CompactInputStream.java:135)
at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:71)
at org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:140)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:99)
... 16 more

My Hive version is 3.1.x and Hadoop version is 3.3.x

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions