Labels
enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
I think two metrics would be useful for spotting anti-patterns when processing data with hash joins:
- probe_hit_rate = fraction of probe rows that found ≥1 match
- avg_fanout = average number of build matches per matched probe row
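As a minimal sketch of these two definitions (this is not DataFusion code; the function name and the "matches per probe row" input are hypothetical and only for illustration), both metrics can be derived from the number of build-side matches found for each probe row:

```python
from typing import Optional, Sequence, Tuple


def join_probe_metrics(
    matches_per_probe_row: Sequence[int],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (probe_hit_rate, avg_fanout) for one hash join.

    matches_per_probe_row[i] is the number of build-side rows that
    probe row i matched (0 means no match). This input shape is an
    assumption made for the sketch.
    """
    probe_rows = len(matches_per_probe_row)
    # Probe rows that found at least one match in the hash table.
    matched_rows = sum(1 for m in matches_per_probe_row if m > 0)
    # Total build-side matches across all probe rows.
    total_matches = sum(matches_per_probe_row)

    # Both ratios are undefined when their denominator is zero.
    probe_hit_rate = matched_rows / probe_rows if probe_rows else None
    avg_fanout = total_matches / matched_rows if matched_rows else None
    return probe_hit_rate, avg_fanout
```

For example, `join_probe_metrics([0, 2, 0, 4])` gives a hit rate of 0.5 (two of four probe rows matched) and a fanout of 3.0 (six matches over two matched rows).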
Example
In datafusion-cli
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.
> explain analyze select *
from generate_series(10) as t1(a)
join generate_series(20) as t2(b)
on t1.a=t2.b;
Assuming t1 is the build side and t2 is the probe side, around half of the probe-side rows will have a match in the hash table built from t1, and each match will produce exactly one output row. The metrics would then be reported as HashJoinExec ...metrics=[...probe_hit_rate=50%, avg_fanout=1...
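The arithmetic behind this example can be checked with a small build/probe simulation. This is only a sketch, not the HashJoinExec implementation. It assumes that generate_series(n) yields the integers 0 through n inclusive, so t1 has 11 rows and t2 has 21 rows. The variable names are illustrative.

```python
from collections import Counter

# Assumption: generate_series(n) produces 0..n inclusive.
build_side = range(0, 11)   # t1(a)
probe_side = range(0, 21)   # t2(b)

# Build phase: join key -> number of build rows with that key.
hash_table = Counter(build_side)

# Probe phase: count probe rows with at least one match, and total matches.
matched_probe_rows = 0
total_matches = 0
for key in probe_side:
    matches = hash_table.get(key, 0)
    if matches > 0:
        matched_probe_rows += 1
        total_matches += matches

probe_hit_rate = matched_probe_rows / len(probe_side)
avg_fanout = total_matches / matched_probe_rows

print(f"probe_hit_rate={probe_hit_rate:.0%}, avg_fanout={avg_fanout:g}")
```

Under that assumption, 11 of the 21 probe rows match (about 52%, in line with "around half"), and every matched probe row finds exactly one build row, so the fanout is 1.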
Describe the solution you'd like
Support probe_hit_rate and avg_fanout in the Hash Join executor.
Reference PR: #18406
Describe alternatives you've considered
No response
Additional context
No response