
Add support for must-link/cannot-link constraints in tree clustering #65

@Neplex

Description

Is your feature request related to a problem? Please describe.
In some situations, when we have prior knowledge of the data, we may want to add extra information by specifying that two group labels are equivalent and must be merged together (must-link), or that they must not be merged together (cannot-link). This would give users finer-grained control over the clustering while still allowing a fully automatic process when no constraints are provided.

Describe the solution you'd like
Extend the tree clustering algorithm to support:

  1. must-link constraints: Ensure specified groups are always clustered together.
  2. cannot-link constraints: Ensure specified groups are never clustered together.
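A minimal sketch of how both constraint types could fit into a tree (agglomerative) clustering loop, assuming the tree is built by greedily merging clusters of group labels according to a symmetric label-similarity matrix with average linkage. The project's actual algorithm may differ, and `constrained_agglomerative`, `sim` and `threshold` are hypothetical names. Must-link labels are unioned up front, and a merge is skipped whenever any label pair across the two clusters is cannot-linked. For brevity it stops at a similarity threshold and returns flat groups rather than the full tree.

```python
from itertools import combinations


def _find(parent: list[int], i: int) -> int:
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def constrained_agglomerative(
    sim: list[list[float]],
    must_link: list[tuple[int, int]],
    cannot_link: list[tuple[int, int]],
    threshold: float,
) -> list[set[int]]:
    """Hypothetical helper: average-linkage merging of labels 0..n-1 under constraints."""
    n = len(sim)
    parent = list(range(n))
    # Must-link: union the constrained labels before any similarity-based merge.
    for a, b in must_link:
        parent[_find(parent, a)] = _find(parent, b)
    by_root: dict[int, set[int]] = {}
    for i in range(n):
        by_root.setdefault(_find(parent, i), set()).add(i)
    groups = list(by_root.values())

    forbidden = {frozenset(pair) for pair in cannot_link}

    def crosses_forbidden(c1: set[int], c2: set[int]) -> bool:
        # Cannot-link: true if any label pair across the two clusters is forbidden.
        return any(frozenset((a, b)) in forbidden for a in c1 for b in c2)

    # A must-link group that already contains a cannot-link pair is a conflict.
    for g in groups:
        if any(frozenset((a, b)) in forbidden for a in g for b in g if a < b):
            raise ValueError(f"conflicting constraints inside group {sorted(g)}")

    def average_linkage(c1: set[int], c2: set[int]) -> float:
        return sum(sim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while True:
        best = None  # (score, i, j) of the most similar allowed pair
        for i, j in combinations(range(len(groups)), 2):
            if crosses_forbidden(groups[i], groups[j]):
                continue
            score = average_linkage(groups[i], groups[j])
            if score >= threshold and (best is None or score > best[0]):
                best = (score, i, j)
        if best is None:
            return groups  # no allowed pair is similar enough: stop merging
        _, i, j = best
        groups[i] |= groups[j]  # i < j, so deleting j keeps index i valid
        del groups[j]
```

With no constraints the function behaves like plain thresholded average linkage, which matches the requirement that constraints stay optional. For example, with labels 0 and 1 very similar and 2 and 3 very similar, a threshold of 0.5 yields `[{0, 1}, {2, 3}]`; adding `must_link=[(1, 2)]` and `cannot_link=[(2, 3)]` instead yields `[{0, 1, 2}, {3}]`.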

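These constraints should be optional and must not change the default behavior, so clustering remains fully automatic when no constraints are given. We could rely on an existing Python library, or simply set the similarity to 1 for must-link pairs and to 0 for cannot-link pairs. In either case, conflicting constraints would need to be handled: for example, if A must-link B and B must-link C, then A cannot-link C can never be satisfied, because must-link is transitive.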
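The "set the similarity to 1 or 0" option could be sketched as a preprocessing step on the label-similarity matrix, assuming it is stored as a nested list indexed by integer label ids. `apply_constraints` is a hypothetical name. Must-link is expanded transitively with a small union-find, and any cannot-link pair that falls inside one must-link component is reported as a conflict instead of being silently overwritten.

```python
def _must_link_roots(n: int, must_link: list[tuple[int, int]]) -> list[int]:
    # Union-find over labels; labels sharing a root are transitively must-linked.
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def apply_constraints(
    sim: list[list[float]],
    must_link: list[tuple[int, int]],
    cannot_link: list[tuple[int, int]],
) -> list[list[float]]:
    """Hypothetical helper: return a copy of `sim` with constraint overrides."""
    n = len(sim)
    root = _must_link_roots(n, must_link)
    # A cannot-link pair inside one must-link component can never be satisfied.
    conflicts = [(a, b) for a, b in cannot_link if root[a] == root[b]]
    if conflicts:
        raise ValueError(f"conflicting constraints: {conflicts}")
    out = [row[:] for row in sim]  # leave the caller's matrix untouched
    for i in range(n):
        for j in range(n):
            if i != j and root[i] == root[j]:
                out[i][j] = 1.0  # must-link, including transitive pairs
    for a, b in cannot_link:
        out[a][b] = out[b][a] = 0.0  # cannot-link
    return out
```

Note that overriding similarities only biases the linkage scores: a 0 becomes a hard guarantee under complete linkage with a positive merge threshold, but with average linkage it merely discourages the merge, so an explicit cannot-link check during merging may still be needed.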

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests