Dropping duplicates based on a single column #4263
AFAIK your options are to use GROUP BY or a window function. Think of each set of rows that share the same "time" value as a group; you then need to select one row from each group. In your case, you used a window function ordered by time and took the first row, which is how the engine chose a row from each group.

If you use GROUP BY, you will not have the remaining column values the way you do with the window function. You must tell the engine which row within the group to return. You could do this with aggregation functions like min/max, although it's a bit strange:

`SELECT time, agg(col1), agg(col2), ... FROM T GROUP BY time`

This only works if aggregation functions make sense for your additional columns; otherwise, a window function is the way to do this.
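To make the difference concrete, here is a minimal sketch of both approaches. It uses Python's built-in `sqlite3` module rather than Daft, purely so it runs anywhere (it assumes a SQLite build with window function support, 3.25 or later). The table `T`, the columns `col1` and `col2`, and the sample rows are made up for illustration, and ordering by `col2` inside the window is just one possible way to tell the engine which row to keep.

```python
import sqlite3

# Hypothetical sample data: "time" has duplicates for 1 and 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (time INTEGER, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "a", 10), (1, "b", 5), (2, "c", 7), (3, "d", 1), (3, "e", 9)],
)

# GROUP BY: one row per time, but every column is aggregated independently,
# so the result may combine values from different source rows.
grouped = conn.execute(
    "SELECT time, MIN(col1), MIN(col2) FROM T GROUP BY time ORDER BY time"
).fetchall()

# Window function: number the rows within each time group, keep the first.
# The ORDER BY inside OVER decides which row of the group survives.
windowed = conn.execute(
    """
    SELECT time, col1, col2 FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY time ORDER BY col2) AS rn
        FROM T
    )
    WHERE rn = 1
    ORDER BY time
    """
).fetchall()

print(grouped)
print(windowed)
```

For `time = 1`, the GROUP BY query returns `(1, 'a', 5)`, a combination that does not exist as a row in the input, while the window query returns the intact row `(1, 'b', 5)`. That is the sense in which aggregation is "a bit strange" for deduplication.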
Hi,
I am trying to drop duplicates based only on the "time" column, not on the entire row matching. I was able to get this to work with daft 4.11 by using a window function.
Is there a way to get around having to `order_by("time")`? It's unnecessary for our purpose, but the window function won't work otherwise. Or perhaps there's some entirely different, better way to do this?
Thanks!