feat: Reduce HashTable load factor from 0.875 to 0.7 #13694

Open
wants to merge 1 commit into
base: main

arhimondr
Contributor

Summary:
A load factor of 7/8 is too high. When the table is 7/8 full, significant performance
degradation is observed. Highly selective use cases (when the probed value is not
present in the hash table) are impacted the most.

According to the simulation, when the table is 7/8 full a single probe operation has
to visit 2.88 buckets on average.

When the double hashing algorithm is used, the average number of buckets to visit is
reduced to 1.88 at the identical load.
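
The simulation itself is not included in this PR. As a rough illustration of how such a measurement could be set up, the Python sketch below models a table made of fixed-size buckets and counts how many buckets a lookup for an absent key visits before it reaches a bucket with a free slot. The 16-slot bucket size, the function names, and the odd step derived from the upper hash bits for double hashing are assumptions made for this sketch, not details taken from the Velox implementation, so the numbers it prints need not match the 2.88 and 1.88 figures quoted above.

```python
import random

SLOTS_PER_BUCKET = 16  # assumed bucket width, chosen for illustration only


def linear_step(hash_value):
    """Linear probing: always advance to the next bucket."""
    return 1


def double_hash_step(hash_value):
    """Double hashing: step comes from the upper hash bits.

    Forcing the step to be odd keeps it coprime with a power-of-two
    bucket count, so the probe sequence can reach every bucket.
    """
    return (hash_value >> 32) | 1


def build_table(num_buckets, load_factor, step_fn, rng):
    """Insert random keys until the table reaches load_factor.

    num_buckets must be a power of two and load_factor must be < 1.
    Only per-bucket occupancy is tracked; that is all a miss needs.
    """
    fill = [0] * num_buckets
    num_keys = int(num_buckets * SLOTS_PER_BUCKET * load_factor)
    for _ in range(num_keys):
        h = rng.getrandbits(64)
        bucket = h % num_buckets
        step = step_fn(h)
        while fill[bucket] == SLOTS_PER_BUCKET:  # bucket full, keep probing
            bucket = (bucket + step) % num_buckets
        fill[bucket] += 1
    return fill


def avg_buckets_for_miss(fill, step_fn, rng, num_probes):
    """Average number of buckets visited when the key is absent.

    A miss is decided at the first bucket that still has a free slot.
    """
    num_buckets = len(fill)
    total = 0
    for _ in range(num_probes):
        h = rng.getrandbits(64)
        bucket = h % num_buckets
        step = step_fn(h)
        visited = 1
        while fill[bucket] == SLOTS_PER_BUCKET:
            bucket = (bucket + step) % num_buckets
            visited += 1
        total += visited
    return total / num_probes


if __name__ == "__main__":
    rng = random.Random(42)
    num_buckets = 1 << 14
    for load_factor in (0.5, 0.7, 0.875):
        for name, step_fn in (("linear", linear_step), ("double", double_hash_step)):
            fill = build_table(num_buckets, load_factor, step_fn, rng)
            avg = avg_buckets_for_miss(fill, step_fn, rng, 100_000)
            print(f"load={load_factor:<5} {name:<6} avg buckets per miss={avg:.2f}")
```

Only misses are modelled because the summary identifies lookups for absent values as the most affected case.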

The optimal load factor seems to be around 0.7. Past that point performance starts
to deteriorate quickly, and it deteriorates much faster for the linear probing
algorithm.
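
The faster deterioration of linear probing is consistent with the classic textbook approximations for open addressing with one entry per slot (Knuth). With load factor a, an unsuccessful lookup costs about (1 + 1/(1 - a)^2) / 2 probes under linear probing and about 1 / (1 - a) probes under uniform hashing, which double hashing closely approximates. The sketch below evaluates both. The function names are hypothetical, and the formulas count individual slots rather than multi-slot buckets, so they serve only as a qualitative guide and will not reproduce the simulated bucket counts quoted above.

```python
def linear_probing_miss_cost(load_factor: float) -> float:
    """Approximate expected slot probes for a miss under linear probing."""
    return 0.5 * (1.0 + 1.0 / (1.0 - load_factor) ** 2)


def double_hashing_miss_cost(load_factor: float) -> float:
    """Approximate expected slot probes for a miss under double hashing."""
    return 1.0 / (1.0 - load_factor)


if __name__ == "__main__":
    for load_factor in (0.5, 0.7, 0.875):
        linear = linear_probing_miss_cost(load_factor)
        double = double_hashing_miss_cost(load_factor)
        print(f"load={load_factor:<5} linear={linear:6.2f} double={double:5.2f}")
```

Under these approximations the cost of a miss at a load factor of 0.875 is 32.5 slot probes for linear probing versus 8 for double hashing, while at 0.7 it is about 6.06 versus 3.33. The gap between the two schemes widens sharply as the table fills.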

Differential Revision: D76267085

netlify bot commented Jun 9, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit c674404
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/6849e12be622be0008a00f55

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 9, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D76267085

@arhimondr
Contributor Author

Discovered when running a modified TPC-H Q4 query over an sf1000 schema on a single-node cluster:

SELECT 
  o_orderpriority, 
  COUNT(*) AS order_count 
FROM 
  lineitem, 
  orders 
WHERE 
  o_orderdate >= CAST('1993-07-01' AS DATE)
  AND o_orderdate < CAST('1993-10-01' AS DATE)
  AND l_orderkey = o_orderkey 
  AND l_commitdate < l_receiptdate 
GROUP BY 
  o_orderpriority 
ORDER BY 
  o_orderpriority

Join CPU:

Load factor 7/8: 504.42s
Load factor 1/2: 183.23s
Load factor 0.7: 215.4s
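
To put the three measurements side by side, the short Python sketch below derives the relative speedups from the join CPU times reported above. The helper names are hypothetical, and the only inputs are the three figures from this comment.

```python
# Join CPU seconds as reported in the comment above, keyed by load factor.
JOIN_CPU_SECONDS = {"7/8": 504.42, "1/2": 183.23, "0.7": 215.4}


def speedup(baseline: float, candidate: float) -> float:
    """How many times less CPU the candidate uses than the baseline."""
    return baseline / candidate


def saving_captured(baseline: float, best: float, candidate: float) -> float:
    """Fraction of the baseline-to-best CPU saving that the candidate achieves."""
    return (baseline - candidate) / (baseline - best)


if __name__ == "__main__":
    base = JOIN_CPU_SECONDS["7/8"]
    half = JOIN_CPU_SECONDS["1/2"]
    chosen = JOIN_CPU_SECONDS["0.7"]
    print(f"0.7 vs 7/8 speedup: {speedup(base, chosen):.2f}x")
    print(f"1/2 vs 7/8 speedup: {speedup(base, half):.2f}x")
    print(f"share of the 1/2 saving captured at 0.7: {saving_captured(base, half, chosen):.0%}")
```

With these figures, join CPU drops by a factor of roughly 2.3 at a load factor of 0.7 and roughly 2.75 at 1/2, so 0.7 captures about 90% of the saving measured at 1/2.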

@arhimondr
Contributor Author

…or#13694)

Summary:
Pull Request resolved: facebookincubator#13694

Reviewed By: Yuhta

Differential Revision: D76267085

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported