Better description of n-gram bloom filter index tuning #3066

Open
rschu1ze opened this issue Jan 10, 2025 · 0 comments
Comments

@rschu1ze
Member

ClickHouse provides several types of skip (secondary) indexes, for example "N-gram Bloom Filter" indexes. These are documented here:

https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#table_engine-mergetree-data_skipping-indexes

Indexes of this type are highly sensitive to the choice of the index parameters (n, size_of_bloom_filter_in_bytes, number_of_hash_functions). If these parameters are poorly chosen, the index becomes ineffective.
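To illustrate the sensitivity, here is a rough Python sketch based on the standard Bloom filter formulas (false-positive rate p = (1 - e^(-k*n/m))^k for n items, m bits and k hash functions). This is not ClickHouse code: the function names are hypothetical, and I am assuming the index behaves like a textbook Bloom filter that receives all n-grams of one indexed block.

```python
import math


def false_positive_rate(num_ngrams, size_bytes, num_hashes):
    """Textbook Bloom filter false-positive estimate (assumed model)."""
    bits = size_bytes * 8
    return (1.0 - math.exp(-num_hashes * num_ngrams / bits)) ** num_hashes


def suggest_parameters(num_ngrams, target_fp_rate):
    """Return (size_bytes, num_hashes) for a target false-positive rate."""
    # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-num_ngrams * math.log(target_fp_rate) / (math.log(2) ** 2))
    size_bytes = math.ceil(bits / 8)
    # Optimal number of hash functions: k = (m / n) * ln 2, at least 1
    num_hashes = max(1, round(size_bytes * 8 / num_ngrams * math.log(2)))
    return size_bytes, num_hashes


if __name__ == "__main__":
    # 4300 is the n-gram count used as an example in the current docs.
    size_bytes, num_hashes = suggest_parameters(4300, 0.0001)
    print(size_bytes, num_hashes)
    # An undersized filter saturates and stops filtering anything.
    print(false_positive_rate(4300, 256, num_hashes))
```

With this model, a filter that is far too small for the number of n-grams ends up with a false-positive rate close to 1, which is the "index becomes ineffective" case.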

I was helping a customer tune their n-gram Bloom filter indexes today and found that the documentation of the tuning process is not "idiot-proof" enough. The current docs mention several UDFs that help calculate the parameters, but they also mention 4300 as the number of n-grams per granule without explaining how this number can be calculated. (I found this comment on GitHub which helped me with that, but it is really not obvious.)
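As an example of what such an explanation could include, one way to estimate the number of n-grams per granule is to count the distinct n-grams in a sample of rows covering one indexed block. The Python sketch below is only an assumption about how to approximate that number: it works on Python characters (ClickHouse may tokenize differently), the helper name is hypothetical, and I do not know whether the 4300 in the docs was derived this way.

```python
def distinct_ngrams(values, n):
    """Count distinct character n-grams across a sample of string values."""
    seen = set()
    for value in values:
        # A string shorter than n contributes no n-grams.
        for i in range(len(value) - n + 1):
            seen.add(value[i:i + n])
    return len(seen)


if __name__ == "__main__":
    # Hypothetical sample; in practice this would be the column values
    # of the rows that fall into one index block.
    sample = ["GET /index.html", "GET /about.html", "POST /login"]
    print(distinct_ngrams(sample, 3))
```

The resulting count would then be the input to the parameter calculation, instead of a fixed number like 4300.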

Can we please rewrite the entire tuning process in a more user-friendly manner?
