[Go] Random segmentation faults when calling Read() on a pqarrow.RecordReader #29

@reiades

Description

Hello!

I am using github.com/apache/arrow/go/v16/parquet to read the records of a Parquet file downloaded from S3 (75 KB, held in a bytes.Buffer). My implementation is the following:

mem := memory.NewCheckedAllocator(memory.DefaultAllocator)
pf, err := file.NewParquetReader(bytes.NewReader(buf.Bytes()), file.WithReadProps(parquet.NewReaderProperties(mem)))
if err != nil {
     return nil, err
}
defer pf.Close()
reader, err := pqarrow.NewFileReader(pf, pqarrow.ArrowReadProperties{Parallel: true, BatchSize: pf.NumRows()}, mem)
if err != nil {
     return nil, err
}
rr, err := reader.GetRecordReader(ctx, nil, nil)
if err != nil {
     return nil, err
}
defer rr.Release()
rec, err = rr.Read() // <---- problem line
if err != nil && err != io.EOF {
     return nil, err
}
if rec == nil {
     return nil, nil
}
defer rec.Release()

... parse the file 

I am reading the same file each time, and the majority of the reads into rec succeed. Occasionally, however, I get a segmentation fault inside rr.Read(). I have confirmed that the file is downloaded successfully each time and that buf.Bytes() is identical on successful and failed reads. I have also confirmed that I can get the schema from the file on both successful and failed reads, which makes me think the problem is inside the RecordReader.

schema := pf.MetaData().Schema
log.Info(fmt.Sprintf("Schema:%s", schema)) // <--- prints out the right schema each time
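
The byte-for-byte comparison of buf.Bytes() mentioned above could be done along these lines. This is only a minimal sketch, assuming a SHA-256 digest is logged for every download; `fingerprint` is a hypothetical helper and not part of the Arrow library, and the payload is a stand-in for the real file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint is a hypothetical helper: it returns the hex-encoded SHA-256
// digest of the given bytes so that two downloads can be compared in logs.
func fingerprint(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Stand-in for the downloaded file; real code would pass buf.Bytes()
	// from the S3 download instead of this placeholder payload.
	var buf bytes.Buffer
	buf.WriteString("PAR1 placeholder payload PAR1")

	// Log one digest per read attempt and compare successful vs. failed runs.
	fmt.Println("digest:", fingerprint(buf.Bytes()))
}
```

Matching digests on a successful and a failed run would support the claim that the input bytes are not what differs between them.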

Here are some logs from the stack trace that I thought could be helpful for debugging.

SIGSEGV: segmentation violation
PC=0x4cb0c8 m=11 sigcode=1 addr=0x7ffbfdf94013e8

goroutine 150888 gp=0x4006db0a80 m=11 mp=0x4000780808 [runnable]:
github.com/apache/arrow/go/v16/parquet/internal/bmi.extractBitsGo(0xffffffffffffffff?, 0xffffffffffffffff?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:242 +0xcc fp=0x41bc72bae0 sp=0x41bc72bae0 pc=0x12818ac
github.com/apache/arrow/go/v16/parquet/internal/bmi.ExtractBits(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/internal/bmi/bmi.go:38
github.com/apache/arrow/go/v16/parquet/file.defLevelsBatchToBitmap({0x45f221c000?, 0x1?, 0x1?}, 0x400, {0xbc72bbb8?, 0x41?, 0x0?, 0x874c?}, {0x3b0f7d0, 0x41bcba5cc0}, ...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:155 +0x180 fp=0x41bc72bb70 sp=0x41bc72bae0 pc=0x12f2ad0
github.com/apache/arrow/go/v16/parquet/file.defLevelsToBitmapInternal({0x45f221c000, 0x400, 0x2c000}, {0x1?, 0x0?, 0x0?, 0x1?}, 0x41bc72bcc0, 0x1)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:175 +0x198 fp=0x41bc72bc40 sp=0x41bc72bb70 pc=0x12f2d68
github.com/apache/arrow/go/v16/parquet/file.DefLevelsToBitmap(...)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/level_conversion.go:186
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecordData(0x41bb5c8000, 0x11)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:545 +0x218 fp=0x41bc72bd40 sp=0x41bc72bc40 pc=0x12f8aa8
github.com/apache/arrow/go/v16/parquet/file.(*recordReader).ReadRecords(0x41bb5c8000, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/file/record_reader.go:632 +0x294 fp=0x41bc72bde0 sp=0x41bc72bd40 pc=0x12f8e84
github.com/apache/arrow/go/v16/parquet/pqarrow.(*leafReader).LoadBatch(0x41bb5c8060, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:104 +0xd8 fp=0x41bc72be30 sp=0x41bc72bde0 pc=0x1767e48
github.com/apache/arrow/go/v16/parquet/pqarrow.(*listReader).LoadBatch(0x41bc72bee8?, 0x41bc72bf3c?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/column_readers.go:360 +0x2c fp=0x41bc72be50 sp=0x41bc72be30 pc=0x17690fc
github.com/apache/arrow/go/v16/parquet/pqarrow.(*ColumnReader).NextBatch(0x41b9013190, 0xce)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:131 +0x34 fp=0x41bc72be70 sp=0x41bc72be50 pc=0x176e9d4
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func1(0x5, 0x41bc72bf38?)
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:655 +0x50 fp=0x41bc72bef0 sp=0x41bc72be70 pc=0x17729a0
github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next.func2()
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:708 +0x100 fp=0x41bc72bfd0 sp=0x41bc72bef0 pc=0x1772850
runtime.goexit({})
	/root/.gimme/versions/go1.22.5.linux.arm64/src/runtime/asm_arm64.s:1222 +0x4 fp=0x41bc72bfd0 sp=0x41bc72bfd0 pc=0x4df0a4
created by github.com/apache/arrow/go/v16/parquet/pqarrow.(*recordReader).next in goroutine 253
	/go/pkg/mod/github.com/apache/arrow/go/[email protected]/parquet/pqarrow/file_reader.go:699 +0x2e8
...

The segmentation fault appears to happen in a goroutine started by (*recordReader).next, so I was curious whether anyone familiar with this library has some insight into why this is happening. I can share a longer stack trace if that would be helpful. I am using v16, but I saw the same error in v13 as well. Thanks in advance!

Component(s)

Go

Labels

Type: bug (Something isn't working)
