Replies: 1 comment
One idea that could work: essentially, we concat the values into a list and then use list expressions to get the last element. Also, apologies for the late reply; we missed this.
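To make the idea concrete, here is a minimal plain-Python sketch of "collect each bin's values into a list, then take the last element". It is not Daft code: the function name `last_per_bin` and the `(time_bin, timestamp, value)` row layout are hypothetical, chosen only for illustration. In Daft, the likely counterparts would be a list aggregation such as `agg_list()` followed by a list expression such as `list.get(...)`, but treat that mapping as an assumption and check the current docs.

```python
from collections import defaultdict


def last_per_bin(rows):
    """Return the last value (by timestamp) seen in each time bin.

    rows: iterable of (time_bin, timestamp, value) tuples (hypothetical layout).
    """
    # Step 1: gather every (timestamp, value) pair of a bin into one list.
    grouped = defaultdict(list)
    for time_bin, ts, value in rows:
        grouped[time_bin].append((ts, value))

    # Step 2: order each list by timestamp and keep the final element's value.
    result = {}
    for time_bin, items in grouped.items():
        items.sort(key=lambda pair: pair[0])
        result[time_bin] = items[-1][1]
    return result


# Rows arrive out of order on purpose, to show why the sort matters.
rows = [(0, 10, "a"), (0, 250, "b"), (300, 310, "c"), (0, 120, "d")]
snapshot = last_per_bin(rows)
```

The explicit sort is a design choice of this sketch: "last" is only well defined once the order inside each list is fixed, which may not be guaranteed when rows are spread across partitions.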
Hey everyone,
I'm working with a 100GB+ timeseries dataset where records come in every few milliseconds, and I'm trying to downsample it into 5-minute bins to reduce the data size. My approach was to create a new column like this:
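As an illustration of the binning step (not the poster's original code), here is a minimal plain-Python sketch that floors a timestamp to the start of its 5-minute bin. It assumes timestamps are integer epoch milliseconds; the names `BIN_MS` and `time_bin` are hypothetical.

```python
# Width of one bin: 5 minutes expressed in milliseconds.
BIN_MS = 5 * 60 * 1000


def time_bin(ts_ms: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its 5-minute bin."""
    return ts_ms - (ts_ms % BIN_MS)


# Timestamps inside the same 5-minute window map to the same bin start.
bins = [time_bin(t) for t in (0, 299_999, 300_000, 1_234_567)]
```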
So far, so good. The next logical step for me was to group by `time_bin` and get the last value of a column within each 5-minute bin, something like:
However, I couldn't find any built-in `last()`, `first()`, or even `arg_max("other_column")` aggregation methods. Is there an existing way to achieve this in Daft?
I’d rather not use `.mean()`, since I just need a snapshot of the last recorded value in each bin. Did I miss something obvious?
I'm new to Daft, but it has given me a great first impression! I'd love to understand if there’s a recommended approach for this, or if there’s a reason these aggregation methods aren’t implemented (maybe due to distributed processing constraints?).
Thanks in advance! 🙏