
Fast SimHash with feature weights (as in Charikar's original paper) #413

Open
@jianshu93

Description


Is your feature request related to a problem? Please describe.

My input vectors carry real-valued feature weights, not just binary presence/absence, and those weights are important for my application.

Describe the solution you'd like

Change fast SimHash from “±1 per feature bit” to “±weight per feature bit”. We only have to replace each unit add of the bulk mask with a weighted add, and then compare each bit's accumulated sum against half the total weight instead of half the token count.
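
To make the request concrete, here is a minimal sketch in plain Python. It is not the library's bit-sliced fast path: the names `feature_hash` and `weighted_simhash`, the choice of BLAKE2b as the per-feature hash, and the 64-bit width are all assumptions made for illustration. Each feature adds its weight to the accumulator of every bit set in its hash, and a fingerprint bit is set when its accumulator exceeds half the total weight. That test is equivalent to taking the sign of the ±weight sum, since the signed sum equals twice the accumulator minus the total weight.

```python
import hashlib
from typing import Mapping


def feature_hash(feature: str, bits: int = 64) -> int:
    """Hash a feature to a `bits`-wide integer.

    BLAKE2b is an arbitrary choice for this sketch; `bits` must be a
    multiple of 8 and at most 512 (the BLAKE2b digest limit).
    """
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def weighted_simhash(features: Mapping[str, float], bits: int = 64) -> int:
    """Weighted SimHash: add `weight` (not 1) for every set hash bit."""
    acc = [0.0] * bits   # per-bit sum of weights of features whose bit is 1
    total = 0.0          # total weight, replaces the token count
    for feature, weight in features.items():
        h = feature_hash(feature, bits)
        total += weight
        for i in range(bits):
            if (h >> i) & 1:
                acc[i] += weight  # the unit add becomes a weighted add

    half = total / 2.0   # threshold: half the total weight
    fingerprint = 0
    for i in range(bits):
        if acc[i] > half:
            fingerprint |= 1 << i
    return fingerprint
```

With all weights equal to 1.0, `total` is the token count and the function reduces to the unweighted majority vote. With weights such as `{"a": 10.0, "b": 1.0, "c": 1.0}`, feature `a` outweighs the others on every bit, so the fingerprint equals `feature_hash("a")`.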



Labels: enhancement (New feature or request)
