Is your feature request related to a problem? Please describe.
The JSON reader in libcudf supports multi-source reading of GZIP-compressed JSONL files, using host-side decompression algorithms.
However, throughput is limited to about 100 MB/s because a single host thread decompresses the sources one after another (see discussion in 17219).
Describe the solution you'd like
We should add a multi-threaded implementation of GZIP decompression that uses one host thread per source. Each source is a single compression block.
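A minimal sketch of the proposed threading pattern, assuming each source arrives as one complete GZIP member in a host buffer. It launches one `std::async` task per source and collects the results in source order. The names `buffer`, `decompress_fn`, and `decompress_sources_parallel` are hypothetical and not part of the libcudf API; the single-source decompressor is passed in as a callback (for example, a wrapper around zlib's `inflate`), so only the per-source parallelism is shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

// Hypothetical host buffer type for compressed or decompressed bytes.
using buffer = std::vector<std::uint8_t>;

// Hypothetical callback: decompresses one whole GZIP member (one source).
using decompress_fn = std::function<buffer(buffer const&)>;

// Decompresses every source on its own host thread.
// Output index i holds the decompressed bytes of sources[i].
std::vector<buffer> decompress_sources_parallel(std::vector<buffer> const& sources,
                                                decompress_fn const& decompress)
{
  std::vector<std::future<buffer>> pending;
  pending.reserve(sources.size());
  for (auto const& src : sources) {
    // std::launch::async requests a separate thread for each source.
    pending.push_back(
      std::async(std::launch::async, [&decompress, &src] { return decompress(src); }));
  }

  std::vector<buffer> out;
  out.reserve(sources.size());
  for (auto& task : pending) {
    // get() waits for the worker and rethrows any exception it raised.
    out.push_back(task.get());
  }
  assert(out.size() == sources.size());
  return out;
}
```

Launching one thread per source matches the proposal directly; with a very large number of small sources, a bounded thread pool would keep the thread count in check while preserving the same ordered-results contract.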