-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
better HLL estimator #17
Comments
@jianshu93 FYI, I implemented that estimator in my hyperloglog - it is still slow AF and only statistically better for what entails the union estimation. Maybe with better optimizers, one may get better performance, but when there is a need to compute logarithms in a core loop things can only go that fast. |
Thanks for letting me know! Did you benchmarked against the MLE and improved estimator in Ertl's paper, as implemented in SourMash (https://docs.rs/sourmash/0.15.0/sourmash/sketch/hyperloglog/estimators/index.html)? This is a direct import of the C++ version from the original paper. It would be nice to have a benchmark and I would be happy to play with your implementation. Best, Jianshu |
I am running several benchmarks, I can add also that one in. Thanks for pointing that one out. |
And also this one, hypertwobits(https://github.com/axiomhq/hypertwobits/tree/main), a very new one, paper was published just last month. I am a little bit concerned about it since the benchmarks showed very large variation even than the original hyperloglog. Let me know your thoughts on it also. Jianshu |
FYI, I have just read their joint estimation algo and it does not match the one from Otmar. I am quite sure of the correctness of my implementation since I bothered Otmar several times to check it together, so the only thing that remains to verify is which one is more accurate and their speeds. |
Yeah, those results seem sus, especially for a 2-bit register thinghy. We'll see in a bit, currently adding to lineup. |
I would also be interested in UltraLogLog (https://dl.acm.org/doi/abs/10.14778/3654621.3654632) and ExaLogLog (https://arxiv.org/abs/2402.13726). But no implementation is available in Rust. All Ertl's implementation is in the hash4j java library, which I do not like. Since you have so many benchmarks there, I see it as a paper describing and available HLL-like implementations. Jianshu |
Samesies on Java. A bit hard to do a fair comparison there and I do not have the time to re-implement it, at least, not now. |
Hello Team,
It seems that new cardinality estimator paper has two new estimators, the improved estimator and MLE estimator, better than the one originally proposed, see paper here: https://arxiv.org/pdf/1706.07290.pdf
Just wondering whether it worth trying, C++ version can be found in the paper.
Thank you,
Jianshu
The text was updated successfully, but these errors were encountered: