Comparing read times between dense and sparse parquet files #4258
mikeprince4 started this conversation in General
Replies: 1 comment 7 replies
I would expect reading data from a sparse table while filtering out NULL values to be much faster due to predicate pushdown. Hopefully, the Daft team can provide some suggestions.
I wrote a test to compare the time to read a dense vs. a sparse Parquet file. I expected the times to be very similar, so I was surprised when reading the sparse columns took much longer, despite filtering out the nulls. This is not what I expected, as I had been told that Daft could perform this kind of operation efficiently. Is this behavior in fact expected, or am I doing something wrong?
The code and results table are below. Thanks in advance!
Here are the results of the experiment with `iterations=100`.