Contributor

@LiaCastaneda LiaCastaneda commented Jan 22, 2026

Which issue does this PR close?

Rationale for this change

The current v52 signature pub async fn wait_complete(self: &Arc<Self>) (introduced in #19546) is a bit unergonomic. The method requires &Arc<DynamicFilterPhysicalExpr>, but when working with Arc<dyn PhysicalExpr>, downcasting only gives you &DynamicFilterPhysicalExpr. Since you can't get from &DynamicFilterPhysicalExpr back to an &Arc<DynamicFilterPhysicalExpr>, the method becomes impossible to call in that situation.
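The downcasting problem described above can be reproduced without DataFusion. The following is a minimal sketch using only the standard library; the `PhysicalExpr` trait, the `DynamicFilter` struct, and the two `is_complete_*` methods are simplified, hypothetical stand-ins and not the real DataFusion definitions.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `PhysicalExpr` trait.
trait PhysicalExpr: Any + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    complete: bool,
}

impl PhysicalExpr for DynamicFilter {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl DynamicFilter {
    // v52-style receiver: only callable through `&Arc<DynamicFilter>`.
    fn is_complete_arc(self: &Arc<Self>) -> bool {
        self.complete
    }

    // Receiver style proposed in this PR: callable through any `&DynamicFilter`.
    fn is_complete_ref(&self) -> bool {
        self.complete
    }
}

fn check(expr: &Arc<dyn PhysicalExpr>) -> Option<bool> {
    // Downcasting via `as_any` yields `&DynamicFilter`, never `&Arc<DynamicFilter>`.
    let filter: &DynamicFilter = expr.as_any().downcast_ref::<DynamicFilter>()?;
    // filter.is_complete_arc(); // would not compile: the receiver must be `&Arc<Self>`
    Some(filter.is_complete_ref())
}
```

The `&Arc<Self>` method stays usable only for callers that still hold the concrete `Arc<DynamicFilter>`, which is exactly what a caller holding `Arc<dyn PhysicalExpr>` does not have.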

The &Arc<Self> param was used to check is_used() via Arc strong count, but this was overly defensive.

What changes are included in this PR?

  • Changed DynamicFilterPhysicalExpr::wait_complete signature from pub async fn wait_complete(self: &Arc<Self>) to pub async fn wait_complete(&self).

  • Removed the is_used() check from wait_complete(). This method, like wait_update(), should only be called on filters that have consumers. If the caller doesn't know whether the filter has consumers, they should call is_used() first to avoid waiting indefinitely. This approach avoids complex signatures and dependencies between the API's methods.

Are these changes tested?

Yes, existing tests cover this functionality. I removed the "mock" consumer from test_hash_join_marks_filter_complete_empty_build_side and test_hash_join_marks_filter_complete, since the fix in #19734 makes is_used check the outer struct's strong_count as well.
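As a rough illustration of what "checking the outer struct's strong_count as well" could look like, here is a sketch that assumes the filter is an `Arc` wrapping a shared inner `Arc`. The `Inner` and `DynamicFilter` types and the free function `is_used` are hypothetical simplifications, not the actual DataFusion code.

```rust
use std::sync::Arc;

// Hypothetical shared state; in this sketch it carries no data.
struct Inner;

// Hypothetical stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    inner: Arc<Inner>,
}

// Assumed rule: the filter counts as "used" when anyone besides the producer
// holds a reference to either the outer struct or the shared inner state.
fn is_used(filter: &Arc<DynamicFilter>) -> bool {
    Arc::strong_count(filter) > 1 || Arc::strong_count(&filter.inner) > 1
}
```

Under this assumption a test no longer needs a separate mock consumer: cloning the outer `Arc` is enough for `is_used` to report `true`.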

Are there any user-facing changes?

The signature of wait_complete changed.

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate labels Jan 22, 2026
Comment on lines 302 to 306
/// # Note
///
/// This method should only be called on filters that have consumers. If you don't
/// know whether the filter is being used, call [`Self::is_used`] first to avoid
/// waiting indefinitely.
Contributor

I think there's a bit of nuance here. This is not because of the DynamicFilterPhysicalExpr API itself, it's only because of how HashJoinExec is implemented.

Under normal operation it would be a consumer calling wait_complete and hence it knows that a consumer exists because it is a consumer. In other words, under normal operation wait_complete is only called by consumers and thus is_used would always be true.

Or put another way, the only way this could go wrong is e.g. in a test or if HashJoinExec itself called wait_complete. By definition if more than 1 thing has a reference to the dynamic filter, there is a consumer. If there is only 1 reference, it must be the one HashJoinExec has (outside of tests). So it would have to be HashJoinExec that is calling wait_complete() right?

So the scenario described here seems more like a programming error or misuse of the APIs, not something that could happen under normal operation of a bug-free usage of these APIs, right? In other words: if I were implementing this I could probably put the is_used() check behind #[cfg(debug_assertions)] or something to catch a programming error on my end, but it wouldn't really make sense to have that check at runtime in production, right?
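One way to express a debug-only check is `debug_assert!`, which is evaluated only when `debug_assertions` is enabled (the default for dev builds, off by default in release builds). The sketch below is an assumption about how a caller might guard its own code; `DynamicFilter` and `checked_is_complete` are hypothetical names, not DataFusion APIs.

```rust
use std::sync::Arc;

// Hypothetical stand-in for `DynamicFilterPhysicalExpr`.
struct DynamicFilter {
    complete: bool,
}

// Hypothetical caller-side helper: catches misuse in debug builds only.
fn checked_is_complete(filter: &Arc<DynamicFilter>) -> bool {
    // With `debug_assertions` disabled, the condition is not evaluated at runtime.
    debug_assert!(
        Arc::strong_count(filter) > 1,
        "filter has no consumers; waiting on it would never finish"
    );
    filter.complete
}
```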

Contributor Author

Under normal operation it would be a consumer calling wait_complete and hence it knows that a consumer exists because it is a consumer. In other words, under normal operation wait_complete is only called by consumers and thus is_used would always be true.

🤔 I agree -- when I implemented this API, I had in mind that it would be used mainly in custom probe nodes of a HashJoinExec, so yes, ideally wait_complete is always called by consumers (in those cases it's impossible for wait_complete to wait indefinitely).

However, I remember when I added is_used in HashJoinExec::execute(), I saw in the tests we were waiting indefinitely (but this was mainly because before, is_used only checked the inner struct, which is not the case anymore), which is why I then added is_used inside wait_complete.

I included this note mainly thinking about a scenario where some third-party node has a reference to the DynamicFilter of the HashJoin but doesn't know if it has a consumer or not. However, in that case, the third-party node would hold a reference and is_used would return true, then filters would be computed and wait_complete would return successfully.

So yes, if this happens, it would be a programming error. I will remove the comment.

Contributor

Or just make it clear that this scenario would only result from a programming error 😄

Comment on lines 282 to 284
/// In the unlikely scenario where this method waits indefinitely, it indicates
/// a programming error where `wait_update()` is being called without any consumers
/// holding a reference to the filter.
Contributor

@adriangb adriangb Jan 22, 2026

I'd say something like:

Producers (e.g. HashJoinExec) will never update the expression or mark it as completed if there are no consumers, as an optimization to avoid extra work. If you call this method on a dynamic filter created by such a producer and there are no consumers registered, this method would wait indefinitely. This should not happen under normal operation, because if you have a reference to this structure that means you are a consumer and hence the producer will update the filter. If you do run into this scenario, it would indicate a programming error either in your producer or in DataFusion if the producer is a built-in node; please report the bug or open an issue explaining your use case.
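The producer/consumer interaction described in this suggestion can be sketched with a blocking, standard-library analogue of the async API. This is an illustration under stated assumptions only: `Filter`, `produce`, and `wait_complete_timeout` are hypothetical, and a bounded wait replaces the real unbounded `wait_complete` so the "would wait indefinitely" case can be observed safely.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

// Hypothetical, synchronous stand-in for a dynamic filter.
struct Filter {
    complete: Mutex<bool>,
    changed: Condvar,
}

impl Filter {
    fn new() -> Arc<Self> {
        Arc::new(Self { complete: Mutex::new(false), changed: Condvar::new() })
    }

    // Assumed rule: more than one strong reference means a consumer exists.
    fn is_used(self: &Arc<Self>) -> bool {
        Arc::strong_count(self) > 1
    }

    fn mark_complete(&self) {
        *self.complete.lock().unwrap() = true;
        self.changed.notify_all();
    }

    // Bounded wait; returns whether completion was observed before the timeout.
    fn wait_complete_timeout(&self, timeout: Duration) -> bool {
        let guard = self.complete.lock().unwrap();
        let (guard, _) = self
            .changed
            .wait_timeout_while(guard, timeout, |done| !*done)
            .unwrap();
        *guard
    }
}

// Hypothetical producer: skips the work entirely when nobody is listening.
fn produce(filter: &Arc<Filter>) {
    if filter.is_used() {
        filter.mark_complete();
    }
}
```

With only the producer's reference alive, `produce` does nothing and a waiter would never be woken; once a second reference exists, the producer marks the filter complete and the wait returns.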

@LiaCastaneda LiaCastaneda force-pushed the lia/return-wait-complete-to-old-signature branch from e2d9bfb to abc077a Compare January 22, 2026 17:26