[FEA] Add DELTA_BINARY_PACKED decoding support to Parquet reader #13637

Merged: 85 commits merged into rapidsai:branch-23.10 from feature/delta_binary on Aug 23, 2023

@etseidl (Contributor) commented Jun 29, 2023

Description

Part of #13501. This adds support for decoding Parquet pages encoded with DELTA_BINARY_PACKED.
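
For readers unfamiliar with the encoding, here is a minimal, CPU-side C++ sketch of DELTA_BINARY_PACKED decoding as laid out in the Apache Parquet format specification. It is not the GPU implementation from this PR, and all function names are hypothetical. The page starts with a header of ULEB128 values (block size, miniblocks per block, total value count) plus a zigzag-encoded first value. Each block that follows holds a zigzag-encoded minimum delta, one bit-width byte per miniblock, and the miniblocks themselves, bit-packed LSB-first as deltas relative to that minimum. The sketch assumes well-formed input and 64-bit values, and it does not check the spec's requirement that block sizes be multiples of 128 and miniblock sizes multiples of 32.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using bytes = std::vector<std::uint8_t>;

// Reads one byte at `pos` and advances it, failing on truncated input.
std::uint8_t next_byte(const bytes& data, std::size_t& pos) {
  if (pos >= data.size()) throw std::runtime_error("truncated page");
  return data[pos++];
}

// Decodes an unsigned LEB128 varint (7 payload bits per byte, low bits first).
std::uint64_t read_uleb128(const bytes& data, std::size_t& pos) {
  std::uint64_t result = 0;
  for (int shift = 0;; shift += 7) {
    if (shift >= 64) throw std::runtime_error("varint too long");
    std::uint8_t b = next_byte(data, pos);
    result |= static_cast<std::uint64_t>(b & 0x7f) << shift;
    if ((b & 0x80) == 0) return result;
  }
}

// Maps zigzag-encoded values back to signed: 0->0, 1->-1, 2->1, 3->-2, ...
std::int64_t zigzag_decode(std::uint64_t v) {
  return static_cast<std::int64_t>(v >> 1) ^ -static_cast<std::int64_t>(v & 1);
}

// Extracts `width` bits starting `bit` bits past byte `start`, LSB-first.
std::uint64_t read_bits(const bytes& data, std::size_t start, std::uint64_t bit, int width) {
  std::uint64_t out = 0;
  for (int i = 0; i < width; ++i) {
    std::uint64_t b = bit + static_cast<std::uint64_t>(i);
    std::size_t idx = start + static_cast<std::size_t>(b / 8);
    if (idx >= data.size()) throw std::runtime_error("truncated miniblock");
    out |= static_cast<std::uint64_t>((data[idx] >> (b % 8)) & 1u) << i;
  }
  return out;
}

std::vector<std::int64_t> decode_delta_binary_packed(const bytes& data) {
  std::size_t pos = 0;
  const std::uint64_t block_size = read_uleb128(data, pos);
  const std::uint64_t miniblocks = read_uleb128(data, pos);
  const std::uint64_t total = read_uleb128(data, pos);
  std::int64_t value = zigzag_decode(read_uleb128(data, pos));
  if (miniblocks == 0 || block_size / miniblocks == 0) throw std::runtime_error("bad header");
  const std::uint64_t per_miniblock = block_size / miniblocks;

  std::vector<std::int64_t> out;
  if (total == 0) return out;
  out.push_back(value);  // the first value is stored verbatim in the header
  while (out.size() < total) {
    const std::int64_t min_delta = zigzag_decode(read_uleb128(data, pos));
    std::vector<int> widths(miniblocks);
    for (auto& w : widths) w = next_byte(data, pos);  // one bit-width byte per miniblock
    for (std::uint64_t m = 0; m < miniblocks && out.size() < total; ++m) {
      for (std::uint64_t i = 0; i < per_miniblock && out.size() < total; ++i) {
        const std::uint64_t rel = read_bits(data, pos, i * widths[m], widths[m]);
        // value += min_delta + rel, using wrap-around (two's complement) arithmetic
        value = static_cast<std::int64_t>(static_cast<std::uint64_t>(value) +
                                          static_cast<std::uint64_t>(min_delta) + rel);
        out.push_back(value);
      }
      // Miniblocks are padded to a full miniblock's worth of bits; skip past it.
      pos += static_cast<std::size_t>(per_miniblock * widths[m] / 8);
    }
  }
  return out;
}
```

As a usage example modeled on the spec's small illustration (a toy block size of 8 with a single miniblock), the bytes `{0x08, 0x01, 0x08, 0x0E, 0x03, 0x02, 0xC0, 0x3F}` decode to 7, 5, 3, 1, 2, 3, 4, 5: the minimum delta is -2 and the 2-bit relative deltas are 0, 0, 0, 3, 3, 3, 3.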

Beyond delta support, this PR incorporates changes introduced in #13622, such as using a mask to determine which decoding kernels to run and adding parameters to the `page_state_buffers_s` struct to reduce shared memory usage.
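
The kernel-mask idea can be sketched roughly as follows. This is a hypothetical, host-only C++ illustration: the `decode_kernel` enum, the `page_info` struct, and the helper functions are invented names and do not reflect the actual cuDF types. Each page contributes one bit, the bits are ORed into a single mask, and only kernels whose bit is set are launched, so a file with no delta-encoded pages never pays for a delta-decoding pass.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical kernel categories, one bit each; names are illustrative only.
enum class decode_kernel : std::uint32_t {
  general      = 1u << 0,  // e.g. plain/dictionary fixed-width pages
  string       = 1u << 1,  // e.g. byte-array pages
  delta_binary = 1u << 2,  // DELTA_BINARY_PACKED pages
};

// Hypothetical per-page record: each page knows which kernel can decode it.
struct page_info {
  decode_kernel kernel;
};

// OR together the bits of every page so each kernel type appears at most once.
std::uint32_t compute_kernel_mask(const std::vector<page_info>& pages) {
  std::uint32_t mask = 0;
  for (const auto& p : pages) mask |= static_cast<std::uint32_t>(p.kernel);
  return mask;
}

bool mask_has(std::uint32_t mask, decode_kernel k) {
  return (mask & static_cast<std::uint32_t>(k)) != 0;
}

// Returns the kernels that would be launched; a real reader would launch them
// instead of recording their names.
std::vector<std::string> kernels_to_launch(std::uint32_t mask) {
  std::vector<std::string> launched;
  if (mask_has(mask, decode_kernel::general)) launched.push_back("general");
  if (mask_has(mask, decode_kernel::string)) launched.push_back("string");
  if (mask_has(mask, decode_kernel::delta_binary)) launched.push_back("delta_binary");
  return launched;
}
```

With pages tagged general, delta_binary, and general, the mask is `0b101` and only the general and delta_binary kernels are selected; the string kernel is skipped entirely.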

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rapids-bot bot commented Jun 29, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

@github-actions bot added the libcudf and CMake labels Jun 29, 2023
@github-actions bot added the Python label Jun 29, 2023
@vuule added the feature request, cuIO, and non-breaking labels Jun 30, 2023
@vuule (Contributor) commented Jun 30, 2023

/ok to test

@nvdbaranec (Contributor) left a comment

Quick first pass. More to come.

cpp/src/io/parquet/rle_stream.cuh (outdated, resolved)
cpp/src/io/parquet/rle_stream.cuh (resolved)
cpp/src/io/parquet/parquet_gpu.hpp (outdated, resolved)
cpp/src/io/parquet/page_string_utils.cuh (outdated, resolved)
cpp/src/io/parquet/decode_preprocess.cu (outdated, resolved)
cpp/src/io/parquet/parquet_gpu.hpp (resolved)
@etseidl mentioned this pull request Aug 18, 2023
cpp/src/io/parquet/page_string_utils.cuh (outdated, resolved)
cpp/src/io/parquet/page_string_utils.cuh (outdated, resolved)
zhuoxunyi referenced this pull request Aug 23, 2023
Fixes: #13864 

This PR fixes an issue with `loc` indexer where some special handling needs to be done when `columns` is of type `MultiIndex`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13929

@bdice (Contributor) left a comment

CMake approval.

@galipremsagar (Contributor) left a comment

Approving with a suggestion.

@vuule (Contributor) commented Aug 23, 2023

/ok to test

@vuule added the 5 - Ready to Merge label Aug 23, 2023
@vuule (Contributor) commented Aug 23, 2023

/merge

@rapids-bot bot merged commit c39c04d into rapidsai:branch-23.10 Aug 23, 2023
54 checks passed
@etseidl deleted the feature/delta_binary branch August 26, 2023 18:22
rapids-bot bot pushed a commit that referenced this pull request Sep 13, 2023
#13637 added a static stream pool object for use by the Parquet reader. This PR expands upon that by:

- Moving the stream pool to the `cudf::detail` namespace.
- Adding a debugging implementation that always returns the default stream.
- Hiding implementation details behind a more streamlined interface.
- Using CUDA events for synchronization.
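
A stream pool of the shape described above might look like the following hypothetical C++ sketch. `stream_handle` is a plain `int` standing in for a CUDA stream (with 0 playing the role of the default stream), and the class names are invented for illustration rather than taken from `cudf::detail`. The sketch shows only the split between a small public interface and interchangeable implementations, including a debug variant; it does not model the CUDA-event synchronization and is not thread-safe.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle; 0 plays the default stream.
using stream_handle = int;
constexpr stream_handle default_stream = 0;

// Abstract interface: callers ask for streams and never see how they are stored.
class stream_pool {
 public:
  virtual ~stream_pool() = default;
  virtual stream_handle get_stream() = 0;

  // Convenience helper built on get_stream(), shared by all implementations.
  std::vector<stream_handle> get_streams(std::size_t count) {
    std::vector<stream_handle> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) out.push_back(get_stream());
    return out;
  }
};

// Hands out a fixed set of non-default streams in round-robin order.
class round_robin_pool final : public stream_pool {
 public:
  explicit round_robin_pool(std::size_t size) {
    if (size == 0) throw std::invalid_argument("pool size must be positive");
    for (std::size_t i = 0; i < size; ++i) streams_.push_back(static_cast<stream_handle>(i + 1));
  }
  stream_handle get_stream() override {
    stream_handle s = streams_[next_];
    next_ = (next_ + 1) % streams_.size();
    return s;
  }

 private:
  std::vector<stream_handle> streams_;
  std::size_t next_ = 0;
};

// Debug variant: every request returns the default stream, serializing all work.
class debug_pool final : public stream_pool {
 public:
  stream_handle get_stream() override { return default_stream; }
};
```

Because callers depend only on the `stream_pool` interface, swapping `round_robin_pool` for `debug_pool` serializes all work onto the default stream without touching caller code, which can help when tracking down suspected race conditions.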

Authors:
  - Ed Seidl (https://github.com/etseidl)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mark Harris (https://github.com/harrism)

URL: #13922
Labels: 5 - Ready to Merge, CMake, cuIO, feature request, libcudf, non-breaking, Python
Projects: Archived in project

5 participants