
Improve performance of InternalParquetRecordReader (1%) #3226

@jerolba

Description


Describe the enhancement requested

While profiling the loading of a Parquet file with Java Mission Control, I noticed that the LongStream used in InternalParquetRecordReader consumes a significant amount of time.

This LongStream can be replaced with a simpler long iterator that counts from 0 up to pages.getRowCount().
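
The issue doesn't quote the reader's code, so here is a minimal sketch that assumes the row indexes come from something like `LongStream.range(0, pages.getRowCount()).iterator()`. `LongRangeIterator` and `RowIndexes` are hypothetical names chosen for illustration, and the linked test project may implement the iterator differently.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

// Hypothetical sketch: a plain counter over the half-open range [start, end),
// meant as a drop-in for LongStream.range(start, end).iterator().
final class LongRangeIterator implements PrimitiveIterator.OfLong {
  private long next;       // next value to hand out
  private final long end;  // exclusive upper bound

  LongRangeIterator(long start, long end) {
    this.next = start;
    this.end = end;
  }

  @Override
  public boolean hasNext() {
    return next < end;
  }

  @Override
  public long nextLong() {
    if (next >= end) {
      throw new NoSuchElementException();
    }
    return next++;
  }
}

// Hypothetical helper contrasting the two ways of producing row indexes 0..rowCount-1.
final class RowIndexes {
  // Assumed current approach: an iterator backed by a LongStream pipeline.
  static PrimitiveIterator.OfLong viaStream(long rowCount) {
    return LongStream.range(0, rowCount).iterator();
  }

  // Proposed approach: a simple counter with no stream machinery behind it.
  static PrimitiveIterator.OfLong viaCounter(long rowCount) {
    return new LongRangeIterator(0, rowCount);
  }
}
```

Both methods yield 0, 1, ..., rowCount - 1. In the JDK, the iterator of a LongStream is an adapter over the stream's Spliterator, so every `hasNext()`/`nextLong()` call goes through that extra layer, whereas the counter is just a comparison and an increment.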

To measure the overhead, I've created a test project that replaces the InternalParquetRecordReader implementation with one that uses a long iterator: https://github.com/jerolba/parquet-rowindexiterator

Execution time is sensitive to the state of the JVM, but running the benchmark multiple times shows that the LongStream version is between 1% and 4% slower than the long iterator, depending on the run.

Component(s)

No response
