[FEA] Add IO task manager to assist in efficient use of KvikIO #17639

Open
GregoryKimball opened this issue Dec 19, 2024 · 0 comments
Labels
cuIO: cuIO issue · feature request: New feature or request · libcudf: Affects libcudf (C++/CUDA) code.

Comments

Contributor

GregoryKimball commented Dec 19, 2024

Is your feature request related to a problem? Please describe.
Some Parquet writers produce a page size distribution where ~30% of the pages are 64 B to 32 KiB, in contrast to the typical distribution where the median is 1-4 MiB. Although we can decompress and decode the smaller pages efficiently, we observe low IO throughput. We speculate that coalescing the <100 KiB read requests into larger requests would mitigate this throughput degradation.

Describe the solution you'd like
Add an IO manager layer to the datasources that use KvikIO. If there are several small byte ranges in sequence, we could coalesce them into a single KvikIO request. I suspect we would need to break them up again before beginning downstream processing. Perhaps we should restrict the coalescing only to byte ranges that are contiguous, so that we can split the buffers again without copying the data.
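A minimal sketch of the coalesce-and-split idea, to make the proposal concrete. It makes no KvikIO or libcudf calls: `byte_range`, `coalesced_read`, `buffer_view`, `coalesce_contiguous`, `split_views`, and the `small_threshold` parameter are all hypothetical names introduced for illustration. It assumes ranges do not overlap and merges only small ranges that are exactly contiguous, so each original range can be recovered as a pointer offset into the larger buffer without copying.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types for illustration; not part of libcudf or KvikIO.
struct byte_range {
  std::size_t offset;  // byte offset in the file
  std::size_t size;    // number of bytes to read
};

struct coalesced_read {
  std::size_t offset;                // start of the merged request
  std::size_t size;                  // total bytes in the merged request
  std::vector<std::size_t> members;  // indices of the original ranges, in file order
};

struct buffer_view {
  unsigned char const* data;  // non-owning pointer into the coalesced buffer
  std::size_t size;
};

// Merge runs of small, exactly contiguous ranges into single requests.
// Ranges at or above `small_threshold` are passed through unmerged.
std::vector<coalesced_read> coalesce_contiguous(std::vector<byte_range> const& ranges,
                                                std::size_t small_threshold)
{
  // Visit ranges in file order without reordering the caller's vector.
  std::vector<std::size_t> order(ranges.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return ranges[a].offset < ranges[b].offset;
  });

  std::vector<coalesced_read> reads;
  bool last_group_is_small = false;  // true if the last group holds only small ranges
  for (auto idx : order) {
    auto const& r         = ranges[idx];
    bool const is_small   = r.size < small_threshold;
    bool const contiguous = !reads.empty() && reads.back().offset + reads.back().size == r.offset;
    if (is_small && last_group_is_small && contiguous) {
      reads.back().size += r.size;
      reads.back().members.push_back(idx);
    } else {
      reads.push_back(coalesced_read{r.offset, r.size, {idx}});
      last_group_is_small = is_small;
    }
  }
  return reads;
}

// Recover one view per original range from the buffer filled by a coalesced read.
// No payload bytes are copied; only pointers into `base` are computed.
std::vector<buffer_view> split_views(coalesced_read const& read,
                                     std::vector<byte_range> const& ranges,
                                     unsigned char const* base)
{
  std::vector<buffer_view> views;
  views.reserve(read.members.size());
  for (auto idx : read.members) {
    auto const& r = ranges[idx];
    assert(r.offset >= read.offset && r.offset + r.size <= read.offset + read.size);
    views.push_back(buffer_view{base + (r.offset - read.offset), r.size});
  }
  return views;
}
```

In this sketch each `coalesced_read` would map to one read request, and `split_views` would hand the original per-page ranges to downstream processing. The list of views is small host-side bookkeeping, while the page data itself stays in the single coalesced buffer.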

Additional context and alternatives
Previously we disabled KvikIO for small copies, but the IO throughput was especially poor when using UVM due to a large number of prefetches (also see #17260). We might also be able to do the task coalescing within KvikIO itself.

The approach for managing the coalesce and split will hopefully not trigger allocations, because we would prefer to avoid triggering a "prefetch-on-alloc" for each read of a few hundred bytes or a few KiB.

GregoryKimball added the cuIO, feature request, and libcudf labels Dec 19, 2024