Currently the output of pangraph is deterministic given the same input file.
However, for two input files that contain the same sequences in a different order, the output can still vary slightly.
This is most likely because the order of mergers is determined by the guide tree, which is a balanced version of the neighbour-joining tree. Differences in branch order can change which pairs are merged.
If we want to make the output deterministic irrespective of the order of sequences, we could:
- Remove tree balancing. This was done to improve parallelism in tree traversal, but it's not really used now.
- For each internal node, deterministically decide which child is the left one and which is the right one, depending on some property of the sequences (e.g. a hash).
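
To illustrate the second option, here is a minimal sketch, not pangraph's actual code. The `Node` type and the `canonicalize` and `leaf_order` helpers are hypothetical names made up for this example, and SHA-256 over the raw sequence is just one possible choice of "property". It assumes the tree topology is already the same for both inputs and only fixes the left/right choice at each internal node:

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical guide-tree node (not pangraph's real data structure).
    # Leaf: `seq` is set and both children are None.
    # Internal node: `seq` is None and both children are set.
    seq: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def canonicalize(node: Node) -> str:
    """Reorder children in place by content hash and return this node's hash."""
    if node.seq is not None:
        return hashlib.sha256(node.seq.encode()).hexdigest()
    a = canonicalize(node.left)
    b = canonicalize(node.right)
    if b < a:
        # The child with the smaller hash always becomes the left child.
        node.left, node.right = node.right, node.left
        a, b = b, a
    # The hash of an internal node depends only on its (ordered) child hashes.
    return hashlib.sha256((a + b).encode()).hexdigest()


def leaf_order(node: Node) -> List[str]:
    """Return the leaf sequences from left to right."""
    if node.seq is not None:
        return [node.seq]
    return leaf_order(node.left) + leaf_order(node.right)


# Same topology, children listed in a different order.
t1 = Node(left=Node(left=Node("ACGT"), right=Node("TTGA")), right=Node("GGCC"))
t2 = Node(left=Node("GGCC"), right=Node(left=Node("TTGA"), right=Node("ACGT")))

canonicalize(t1)
canonicalize(t2)
print(leaf_order(t1) == leaf_order(t2))
```

Identical sequences would hash to the same value, so a real implementation would probably need a tie-breaker (for example the sequence name).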
(Thanks @mjohnpayne for pointing this out!)