
[FEA] Adjust libcudf to use kvikIO for small host reads #17259

Open
GregoryKimball opened this issue Nov 6, 2024 · 3 comments
Labels: cuIO (cuIO issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code)

GregoryKimball commented Nov 6, 2024

Is your feature request related to a problem? Please describe.

For Parquet columns that are highly compressible, the page size can be very small. For example, we observe 50-100 KB pages for the NDS-H column l_shipinstruct in lineitem with ZSTD compression. This column has a cardinality of 4, encodes well with run_length, and may offer other compression opportunities as well.

>>> df['l_shipinstruct'].unique()
0    DELIVER IN PERSON
1     TAKE BACK RETURN
2                 NONE
3          COLLECT COD

When performing IO on these pages, libcudf falls back to a host read from pageable memory instead of a pinned read via kvikIO.

For an async memory resource, the host read from pageable memory is not much different from the kvikIO option. However, when using a managed memory pool resource, the host read from pageable memory appears to be ~2x slower in the IO step.


Describe the solution you'd like
We should consider adjusting libcudf to always use kvikIO instead.

I believe we should consider refactoring datasource to drop the read-size threshold, and also decouple the GDS labels from the dispatch to kvikIO for host reads.

static constexpr size_t _gds_read_preferred_threshold = 128 << 10;  // 128KB

Additional context

We've collected evidence that, with a managed memory pool, the HtoD copy from pageable to managed memory is slower than the copy from pageable to device memory, at least for these small pages.

@GregoryKimball GregoryKimball added cuIO cuIO issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. labels Nov 6, 2024
GregoryKimball (author) commented:

@brandon-b-miller @vuule thank you for the discussion today about the pageable copy difference with a managed pool MR.

vuule commented Nov 6, 2024

Related issue #17228: use kvikIO for host reads.
Most likely not required to avoid the pageable copies in this case.

vuule commented Nov 6, 2024

Opened #17260 with a potential fix.
Haven't evaluated the performance impact.
