cc @git-hulk @zncleon @PragmaTwice Would you mind taking a look?
I also prefer solution 2. Since it's not so good to put BFMetadata in which column family and so many
Closing, as we chose "2" here.
In brief, we plan to support RedisBloom[1]. We plan to use BlockSplitBloomFilter as the underlying BloomFilter[2], because it is widely used by database systems such as RocksDB[3], Impala, and Kudu, and it has already been merged into kvrocks. Now we plan to support RedisBloom. The original RedisBloom has a two-level structure:
We can easily support this; however, we need to design our metadata carefully.
Draft1
Currently, the draft is:
For ChainMetadata:
- num_filters: the number of underlying BFMetadata entries. It is initialized to 1 and can grow when "scaling" is enabled.
- scaling and expansion: Adding an element to a Bloom filter never fails due to the data structure "filling up". Instead, the error rate starts to grow. To keep the error close to the one set on filter initialisation, the Bloom filter will auto-scale, meaning that when capacity is reached an additional sub-filter will be created. The size of the new sub-filter is the size of the last sub-filter multiplied by EXPANSION. The default expansion value is 2.

For BFMetadata:
- num_distinct: the number of distinct values in this Bloom filter
- size: the size of the Bloom filter in bytes
- false-positive-rate: the false-positive probability (fpp) of the filter

During reading:
ChainMetadata
BFMetadata
During writing:
ChainMetadata
BFMetadata
Cons
It needs two kinds of metadata, which might be a bit complex. During "add", both of them have to be updated.
Pros
Flexible: we can separate the BF and BF-chain metadata.
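The two-level layout of Draft 1 can be sketched roughly as below. This is only an illustration, not the kvrocks implementation: the field names follow the description above, while the types, the `capacity` field, the `add_distinct` helper, and the choice to reuse the same false-positive rate for every sub-filter are assumptions made for the example. It shows why an "add" has to consider both levels: the last BFMetadata always changes, and ChainMetadata changes whenever a new sub-filter is appended.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BFMetadata:
    """Per-sub-filter metadata (field names taken from the draft)."""
    num_distinct: int           # distinct values added to this sub-filter
    size: int                   # size of this sub-filter in bytes
    false_positive_rate: float  # target fpp of this sub-filter
    capacity: int               # ASSUMPTION: not in the draft; items before scaling


@dataclass
class ChainMetadata:
    """Top-level metadata describing the chain of sub-filters."""
    num_filters: int = 1  # starts at 1 and grows when scaling is enabled
    scaling: bool = True
    expansion: int = 2    # default expansion value


def add_distinct(chain: ChainMetadata, filters: List[BFMetadata]) -> None:
    """Record one new distinct element (hypothetical helper; bit setting omitted)."""
    last = filters[-1]
    if chain.scaling and last.num_distinct >= last.capacity:
        # Capacity reached: append a sub-filter `expansion` times larger.
        last = BFMetadata(
            num_distinct=0,
            size=last.size * chain.expansion,
            false_positive_rate=last.false_positive_rate,  # ASSUMPTION: kept equal
            capacity=last.capacity * chain.expansion,
        )
        filters.append(last)
        chain.num_filters += 1  # the chain-level metadata changes as well
    # Without scaling, the last sub-filter keeps absorbing elements and
    # its error rate grows instead.
    last.num_distinct += 1
```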
Draft2
The other arguments are the same as in the previous draft. However, we can deduce the size of each filter from base_size and (scaling, expansion). This avoids a separate BFMetadata.
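The deduction in Draft 2 amounts to a geometric progression. Here is a small sketch, assuming that base_size is the byte size of the first sub-filter and that each later sub-filter is the previous one multiplied by expansion, as described under Draft 1. The function name and signature are hypothetical.

```python
from typing import List


def sub_filter_sizes(base_size: int, expansion: int, num_filters: int) -> List[int]:
    """Return the byte size of each sub-filter in the chain.

    ASSUMPTION: sub-filter i has size base_size * expansion**i, so the
    sizes can be recomputed without storing per-filter metadata.
    """
    return [base_size * expansion ** i for i in range(num_filters)]


print(sub_filter_sizes(base_size=128, expansion=2, num_filters=4))
# [128, 256, 512, 1024]
```

Under this assumption, the chain-level metadata alone is enough to locate and size every sub-filter.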
Cons
Not so easy to evolve the underlying BF format.
Pros
Only one metadata, which is convenient.
References
[1] https://redis.io/docs/stack/bloom/
[2] https://github.com/apache/parquet-format/blob/master/BloomFilter.md
[3] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format