
[Parquet] Prototype FSST encoding #8749

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The parquet community is considering adding new encodings to the format.

One proposed encoding is FSST (Fast Static Symbol Table), a compression scheme for string data.

Describe the solution you'd like

I would like someone(s) to create a branch of arrow-rs's parquet reader and implement this proposed encoding there, to see how it would work in the context of Parquet.

This would likely involve:

  1. Adding a new encoding to the metadata: source link
  2. Adding the appropriate encoders/decoders in parquet (existing code)
  3. Adding some basic tests showing that data can be round-tripped with this encoding: examples of similar tests here
  4. Adding some benchmarks: examples of similar benchmarks

There are already several Rust implementations of FSST (in both Lance and Vortex) that could probably be adapted.
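
To make the shape of the round-trip test in step 3 concrete, here is a minimal, std-only sketch of a symbol-table encoding in the spirit of FSST. It is a toy, not the real algorithm: the symbol table is hand-picked rather than learned from a sample of the data, matching is a naive greedy scan, and the names `SymbolTable`, `ESCAPE`, `encode`, and `decode` are hypothetical and do not correspond to anything in arrow-rs, Lance, or Vortex. It only assumes the general FSST idea that short codes stand for symbols of up to 8 bytes and that one code is reserved to escape literal bytes.

```rust
/// Reserved code (an assumption of this sketch): the next byte is a literal.
const ESCAPE: u8 = 255;

/// Toy static symbol table: code `i` stands for `symbols[i]`.
struct SymbolTable {
    symbols: Vec<Vec<u8>>,
}

impl SymbolTable {
    /// Builds a table from hand-picked symbols.
    /// (A real FSST implementation would learn the symbols from sample data.)
    fn new(symbols: &[&str]) -> Self {
        assert!(symbols.len() < ESCAPE as usize, "code 255 is reserved");
        assert!(symbols.iter().all(|s| !s.is_empty() && s.len() <= 8));
        Self {
            symbols: symbols.iter().map(|s| s.as_bytes().to_vec()).collect(),
        }
    }

    /// Greedy encode: at each position emit the code of the longest
    /// matching symbol, or ESCAPE followed by the literal byte.
    fn encode(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            let rest = &input[pos..];
            let best = self
                .symbols
                .iter()
                .enumerate()
                .filter(|(_, s)| rest.starts_with(s.as_slice()))
                .max_by_key(|(_, s)| s.len());
            match best {
                Some((code, sym)) => {
                    out.push(code as u8);
                    pos += sym.len();
                }
                None => {
                    out.push(ESCAPE);
                    out.push(input[pos]);
                    pos += 1;
                }
            }
        }
        out
    }

    /// Decode: expand each code to its symbol; copy escaped bytes verbatim.
    fn decode(&self, encoded: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut iter = encoded.iter();
        while let Some(&code) = iter.next() {
            if code == ESCAPE {
                out.push(*iter.next().expect("escape must be followed by a byte"));
            } else {
                out.extend_from_slice(&self.symbols[code as usize]);
            }
        }
        out
    }
}

#[cfg(test)]
mod round_trip_tests {
    use super::*;

    #[test]
    fn strings_round_trip() {
        let table = SymbolTable::new(&["hello", "world", "he"]);
        for value in ["hello world", "", "no symbols match", "hehello"] {
            let encoded = table.encode(value.as_bytes());
            assert_eq!(table.decode(&encoded), value.as_bytes());
        }
    }
}
```

A test in the prototype would presumably do the same thing one level up: write string columns through the Parquet writer with the new encoding selected, read them back, and compare against the input.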

Describe alternatives you've considered

Additional context
