Fix identity calculation by reading X records from .1aln files
Problem:
- Identity values from .1aln files didn't match ALNtoPAF output
- Was using D record (trace-point metadata) instead of actual edit distances
- Caused filtering inconsistencies between .1aln and PAF workflows
Solution:
- Read X records (INT_LIST of per-tracepoint edit distances)
- Sum X values to get total edit distance
- Apply correct ALNtoPAF formula: divergence = (sum(X) - del) / query_span / 2.0
- Calculate matches from corrected identity values
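A minimal Python sketch of the calculation above, under stated assumptions. It takes identity = 1 - divergence (an assumption). It also treats the `del` term as a given input, since how it is derived from the .1aln record is not spelled out here. The `estimated_matches` helper is hypothetical: it counts matches as identity × query span, while the commit only says that matches are derived from the corrected identity.

```python
def identity_from_x_record(x_values, deletions, query_span):
    """Return (divergence, identity) using the ALNtoPAF-style formula.

    x_values:   per-tracepoint edit distances from the X record (INT_LIST)
    deletions:  the `del` term of the formula (passed in; derivation assumed)
    query_span: aligned length on the query side (assumed > 0)
    """
    if query_span <= 0:
        raise ValueError("query_span must be positive")
    total_edits = sum(x_values)  # sum(X): total edit distance
    # Halve because sum(X) counts divergence "symmetrically"
    divergence = (total_edits - deletions) / query_span / 2.0
    identity = 1.0 - divergence  # assumption: identity = 1 - divergence
    return divergence, identity


def estimated_matches(identity, query_span):
    # Hypothetical: matches approximated as identity * query span
    return round(identity * query_span)
```

For example, X = [3, 5, 2, 6], del = 4 and a 1000 bp query span give a divergence of 0.006 and an identity of 0.994.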
The key insight is that sum(X) represents "symmetric" divergence and must
be divided by 2 to match ALNtoPAF's calculation.
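Here is the same formula as an equation. The numbers are illustrative and do not come from the test data; they show what the factor of 2 does. L_q is the query span, and identity = 1 - d is an assumption:

```latex
d = \frac{\sum_i X_i - \mathrm{del}}{2\,L_q}, \qquad \mathrm{identity} = 1 - d

% Example: \sum_i X_i = 16,\ \mathrm{del} = 4,\ L_q = 1000
d = \frac{16 - 4}{2 \cdot 1000} = 0.006 \;\Rightarrow\; \mathrm{identity} = 0.994

% Without the division by 2 the divergence doubles:
d' = \frac{16 - 4}{1000} = 0.012 \;\Rightarrow\; \mathrm{identity}' = 0.988
```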
Validation:
- 100% match rate on 100 test records (max diff: 0.000098)
- Format-preserving .1aln filtering now works correctly
- See IDENTITY_CALCULATION_SOLUTION.md in the sweepga repo
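The following is a hypothetical Python helper showing one way to compute the match rate and max diff reported above. The 1e-4 tolerance is an assumption; the commit reports only the observed max diff. The two lists stand in for per-record identities computed from .1aln X records and identities taken from ALNtoPAF's PAF output.

```python
def compare_identities(aln_values, paf_values, tolerance=1e-4):
    """Return (match_rate, max_diff) for paired per-record identities.

    tolerance is an assumed threshold, not a value taken from the commit.
    """
    if len(aln_values) != len(paf_values):
        raise ValueError("record counts differ")
    diffs = [abs(a - p) for a, p in zip(aln_values, paf_values)]
    matched = sum(1 for d in diffs if d <= tolerance)
    match_rate = matched / len(diffs) if diffs else 1.0
    max_diff = max(diffs, default=0.0)
    return match_rate, max_diff
```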
This enables format-agnostic filtering: filtering .1aln → .1aln keeps the
same records as converting .1aln → PAF, filtering, and writing PAF.
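Below is a sketch of what format-agnostic filtering means in practice. It assumes a simple minimum-identity threshold; the actual sweepga filters are not described here. `AlnRecord` and `kept_ids` are hypothetical names. The point is that one predicate, applied to identities from either path, should keep the same records.

```python
from typing import Iterable, NamedTuple, Set


class AlnRecord(NamedTuple):
    # Hypothetical minimal record: an ID plus its computed identity
    record_id: str
    identity: float


def kept_ids(records: Iterable[AlnRecord], min_identity: float) -> Set[str]:
    # Same predicate whether identities came from .1aln X records
    # or from PAF lines produced by ALNtoPAF
    return {r.record_id for r in records if r.identity >= min_identity}
```

Identities that agree to within about 1e-4 give the same kept set, unless a record sits right at the threshold.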