-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Simplify wait_complete function #19937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Simplify wait_complete function #19937
Conversation
| /// # Note | ||
| /// | ||
| /// This method should only be called on filters that have consumers. If you don't | ||
| /// know whether the filter is being used, call [`Self::is_used`] first to avoid | ||
| /// waiting indefinitely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bit of nuance here. This is not because of the DynamicFilterPhysicalExpr API itself, it's only because of how HashJoinExec is implemented.
Under normal operation it would be a consumer calling wait_complete and hence it knows that a consumer exists because it is a consumer. In other words, under normal operation wait_completed is only called by consumers and thus is_used would always be true.
Or put another way, the only way this could go wrong is e.g. in a test or if HashJoinExec itself called wait_complete. By definition if more than 1 thing has a reference to the dynamic filter, there is a consumer. If there is only 1 reference, it must be the one HashJoinExec has (outside of tests). So it would have to be HashJoinExec that is calling wait_complete() right?
So the scenario described here seems more like a programming error or misuse of the APIs, not something that could happen under normal operation of a bug free usage of these APIs, right? In other words: if I was implementing this I could probably put the is_used() check behind #[cfg(debug_assertions)] or something to catch a programming error on my end, but it wouldn't really make sense to have that check at runtime in production, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under normal operation it would be a consumer calling wait_complete and hence it knows that a consumer exists because it is a consumer. In other words, under normal operation wait_completed is only called by consumers and thus is_used would always be true.
🤔 I agree -- when I implemented this API, I had in mind that it would be used mainly in custom probe nodes of a HashJoinExec, so yes, ideally wait_complete is always called by consumers (in those cases it's impossible for wait_complete to wait indefinitely).
However, I remember when I added is_used in HashJoinExec::execute(), I saw in the tests we were waiting indefinitely (but this was mainly because before, is_used only checked the inner struct, which is not the case anymore), which is why I then added is_used inside wait_complete.
I included this note mainly thinking about a scenario where some third-party node has a reference to the DynamicFilter of the HashJoin but doesn't know if it has a consumer or not. However, in that case, the third-party node would hold a reference and is_used would return true, then filters would be computed and wait_complete would return successfully.
So yes, if this happens, it would be a programming error. I will remove the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just make it clear that this scenario would only result from a programming error 😄
| /// In the unlikely scenario where this method waits indefinitely, it indicates | ||
| /// a programming error where `wait_update()` is being called without any consumers | ||
| /// holding a reference to the filter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say something like:
Producers (e.g.) HashJoinExec will never update the expression or mark it as completed if there are no consumers as an optimization to avoid extra work. If you call this method on a dynamic filter created by such a producer and there are no consumers registered this method would wait indefinitely. This should not happen under normal operation because if you have a reference to this structure that means you are a consumer and hence the producer will update the filter. If you do run into this scenario it would indicate a programming error either in your producer or in DataFusion if the producer is a built in node; please report the bug or open an issue explaining your use case.
e2d9bfb to
abc077a
Compare
Which issue does this PR close?
Rationale for this change
The current v52 signature
pub async fn wait_complete(self: &Arc<Self>)(introduced in #19546) is a bit unergonomic. The method requires&Arc<DynamicFilterPhysicalExpr>, but when working withArc<dyn PhysicalExpr>, downcasting only gives you&DynamicFilterPhysicalExpr. Since you can't convert&DynamicFilterPhysicalExprtoArc<DynamicFilterPhysicalExpr>, the method becomes impossible to call.The
&Arc<Self>param was used to checkis_used()via Arc strong count, but this was overly defensive.What changes are included in this PR?
Changed
DynamicFilterPhysicalExpr::wait_completesignature frompub async fn wait_complete(self: &Arc<Self>)topub async fn wait_complete(&self).Removed the
is_used()check fromwait_complete()- this method, likewait_update(), should only be called on filters that have consumers. If the caller doesn't know whether the filter has consumers, they should callis_used()first to avoid waiting indefinitely. This approach avoids complex signatures and dependencies between the APIs methods.Are these changes tested?
Yes, existing tests cover this functionality, I removed the "mock" consumer from
test_hash_join_marks_filter_complete_empty_build_sideandtest_hash_join_marks_filter_completesince the fix in #19734 makes is_used check the outer structstrong_countas well.Are there any user-facing changes?
The signature of
wait_completechanged.