Hi,
First, I want to extend my appreciation for the great work on the fuzzyjoin package. Our team relies on it extensively, and it has been an invaluable tool in our workflows.
Recently, I was tasked with optimizing certain performance bottlenecks in one of our pipelines. To address this, I experimented with implementing a fuzzy join using Rust, which led to significant improvements in execution speed. I adapted this approach into a public example, available at https://github.com/JonDDowns/fozziejoin. While the benchmark is not exhaustive, I consistently observe 4–20x performance improvements across various datasets.
Given these results, I wanted to ask if there would be interest in integrating a similar approach within fuzzyjoin. Replacing stringdist with an alternative would indeed be a substantial change, but I believe it could offer considerable performance benefits.
I’d love to hear your thoughts on this and whether there might be an opportunity to collaborate on incorporating these enhancements into the package.
Best,
Jon