Improve Algorithm for Merging Similar Equivalence Classes #7

@heikomuller

The current implementation of SimilarTermIndexGenerator is rather naive. It merges all equivalence classes in a connected component, where components are formed from the similarity between pairs of equivalence classes. Because similarity is not transitive, this approach can merge dissimilar equivalence classes: A may be similar to B and B similar to C while A and C are not similar at all.
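To make the problem concrete, here is a minimal sketch of connected-component merging. It is not the actual SimilarTermIndexGenerator code: the representation of an equivalence class as a set of integers, the `jaccard` similarity, and the function name `merge_connected_components` are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (hypothetical similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_connected_components(classes, threshold, sim=jaccard):
    """Group class indices by connected component of the similarity graph.

    Two classes are linked if their pairwise similarity meets the threshold.
    This is an illustrative stand-in, not the library's implementation.
    """
    parent = list(range(len(classes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(classes)), 2):
        if sim(classes[i], classes[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(classes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# A chain: 0~1 and 1~2 are similar (1/3 each), but 0 and 2 share nothing.
classes = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}]
print(jaccard(classes[0], classes[2]))           # -> 0.0
print(merge_connected_components(classes, 0.3))  # -> [[0, 1, 2]]
```

Classes 0 and 2 have similarity 0 yet end up in the same group, because each is linked to class 1.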

One improvement could be to pick certain equivalence classes as strong seeds and merge each seed with all other equivalence classes that are similar to it. This could still merge dissimilar equivalence classes, but every class in a merged group would at least be guaranteed to satisfy the similarity threshold with the seed equivalence class.
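A possible shape for the seed-based variant is sketched below. The issue does not define what makes a seed "strong", so this sketch assumes one heuristic: prefer classes with the most neighbours above the threshold. The set-of-integers representation, the `jaccard` measure, and the name `merge_around_seeds` are likewise hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (hypothetical similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_around_seeds(classes, threshold, sim=jaccard):
    """Greedy seed-based grouping; returns a list of (seed, members) pairs.

    Assumption: a "strong" seed is a class with many similar neighbours.
    Every member of a group meets the threshold with the group's seed.
    """
    n = len(classes)
    neighbors = {
        i: [j for j in range(n)
            if j != i and sim(classes[i], classes[j]) >= threshold]
        for i in range(n)
    }
    # Visit candidate seeds by descending neighbour count, ties by index.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    result = []
    for seed in order:
        if seed in assigned:
            continue
        # Only absorb neighbours that no earlier seed has claimed.
        members = [seed] + [j for j in neighbors[seed] if j not in assigned]
        assigned.update(members)
        result.append((seed, sorted(members)))
    return result


# A chain of four classes; only adjacent pairs are similar (1/3 each).
classes = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}, {7, 8, 9, 10}]
print(merge_around_seeds(classes, 0.3))
# -> [(1, [0, 1, 2]), (3, [3])]
```

Connected-component merging would put all four classes in one group. Here class 3 stays separate because it is not similar to seed 1, while classes 0 and 2 are still merged despite being dissimilar to each other, which matches the caveat above.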
