Description
Is your feature request related to a problem? Please describe.
In some situations, when we know the data, we may want to add extra information by specifying whether two group labels are equivalent (must-link) or should not be merged together (cannot-link). This would give the user more fine-grained control while keeping the process automatic if needed.
When we have prior knowledge of data relationships, we may want to specify that certain groups must be merged together (`must-link`) or should not be put together (`cannot-link`). This would give users more control over clustering while still allowing automatic processes when no constraints are provided.
Describe the solution you'd like
Extend the tree clustering algorithm to support:
- `must-link` constraints: Ensure specified groups are always clustered together.
- `cannot-link` constraints: Ensure specified groups are never clustered together.
These constraints should be optional and not disrupt the default behavior, allowing automatic clustering when no constraints are given. We could rely on an existing Python library or simply set the similarity to `1` for `must-link` pairs and `0` for `cannot-link` pairs, but we may need to handle conflicting constraints.
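A rough sketch of the "set the similarity" option, to make the conflict case concrete. It assumes the clustering works from a square, symmetric similarity matrix indexed by group position. The function name `apply_constraints` and its parameters are hypothetical and not part of any existing API. Must-link pairs are treated as transitive, and a cannot-link pair that ends up inside the same must-link component is reported as a conflict.

```python
from itertools import combinations


def _find(parent, x):
    """Return the representative of x's must-link component (union-find)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def apply_constraints(similarity, must_link=(), cannot_link=()):
    """Hypothetical helper: return a copy of `similarity` with constraints applied.

    similarity  -- square symmetric matrix (list of lists) of values in [0, 1]
    must_link   -- iterable of (i, j) index pairs that must end up together
    cannot_link -- iterable of (i, j) index pairs that must stay apart
    """
    n = len(similarity)
    parent = list(range(n))

    # Merge must-link pairs into components so the constraint is transitive.
    for a, b in must_link:
        parent[_find(parent, a)] = _find(parent, b)

    # A cannot-link pair inside one must-link component cannot be satisfied.
    for a, b in cannot_link:
        if _find(parent, a) == _find(parent, b):
            raise ValueError(f"conflicting constraints for groups {a} and {b}")

    roots = [_find(parent, i) for i in range(n)]
    result = [list(row) for row in similarity]  # leave the input untouched

    # Every pair inside the same must-link component gets similarity 1.
    for i, j in combinations(range(n), 2):
        if roots[i] == roots[j]:
            result[i][j] = result[j][i] = 1.0

    # Cannot-link applies to every pair across the two components involved.
    for a, b in cannot_link:
        for i in range(n):
            for j in range(n):
                if roots[i] == roots[a] and roots[j] == roots[b]:
                    result[i][j] = result[j][i] = 0.0

    return result
```

With no constraints the matrix is returned unchanged, so the default behavior is preserved. One caveat: depending on the linkage criterion, a similarity of `0` between two groups may not by itself prevent the clusters containing them from being merged later, so a strict `cannot-link` guarantee may also need a check at merge time.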