Replies: 2 comments 2 replies
-
|
Hey @rjzamora! My guess here is that the root cause of this is: Here's a fix I just put together: #2780 Feel free to try this instead, I'd be curious to see if this performs better for you and this is essentially the approach that the fix I proposed above takes: df = daft.read_parquet("s3://...")
arrow_rb_iters = df.to_arrow_iter()
table = pa.Table.from_batches(arrow_rb_iters) |
Beta Was this translation helpful? Give feedback.
-
|
I also ran this through the rest of the team, and we think that it also might be the case that Some other notes for benchmarking that you might find helpful:
Generally speaking, if the goal here is to retrieve data and pipe it into CuDF, I'm guessing that |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hello Daft experts,
Congrats on all the great work you are doing with Daft - Cool stuff!
I have a pretty basic question about the expected performance of converting a Daft DataFrame to an Arrow Table. Hopefully this is the right place to ask.
Background: I was curious if I could use Daft to reduce the IO bottleneck of a cuDF workflow that needs to read multiple parquet files from S3 into GPU memory. Since both Daft and cuDF offer simple APIs to convert data to/from Arrow, I started by benchmarking the performance of
daft.read_parquet(paths).to_arrow(). While running on a simple p3.2xlarge instance, I found the performance ofdf.collect()to be at least 3x faster thandf.to_arrow()(~50MBps vs ~150MBps). When I scale out using multiple processes, the performance delta was even larger (1-2Gbps vs 10+Gbps).Question: Is
df.to_arrow()expected to have a large overhead compared todf.collect()? What makes thecollectoperation so much faster (sometimes faster than the advertised network bandwidth)?I realize Daft is not intended to be used it the way I am using it, so it's perfectly fine if your answer reflects that reality :)
Beta Was this translation helpful? Give feedback.
All reactions