ekg commented Oct 20, 2025

Summary

Switch the default aligner from AllWave to SweepGA for better performance on typical pangenome graph construction workflows.

Rationale

SweepGA is faster for all-vs-all alignment of sequences ≥100bp:

  • Uses FastGA (a fast genome aligner)
  • Applies plane sweep filtering for clean 1:1 mappings
  • Better suited for typical genomic sequences

AllWave remains available via --aligner allwave for:

  • Shorter sequences (<100bp)
  • Cases requiring wavefront-specific features

Changes

Code

  • Changed default in Args: default_value = "sweepga" (was "allwave")
  • Updated help text to reflect new order
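
As a rough illustration of what the default change amounts to, here is a minimal std-only sketch. It does not reproduce the project's actual `Args` definition: the `Aligner` enum and the `resolve_aligner` helper are hypothetical names introduced for this example, and only the two values `sweepga` and `allwave` come from this PR.

```rust
use std::str::FromStr;

/// Aligner backends named in this PR (the enum itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Aligner {
    SweepGa,
    AllWave,
}

impl Default for Aligner {
    // Mirrors `default_value = "sweepga"` (previously "allwave").
    fn default() -> Self {
        Aligner::SweepGa
    }
}

impl FromStr for Aligner {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "sweepga" => Ok(Aligner::SweepGa),
            "allwave" => Ok(Aligner::AllWave),
            other => Err(format!(
                "unknown aligner '{other}' (expected 'sweepga' or 'allwave')"
            )),
        }
    }
}

/// Hypothetical helper: resolve the optional `--aligner` value,
/// falling back to the default when the flag is absent.
pub fn resolve_aligner(flag: Option<&str>) -> Result<Aligner, String> {
    match flag {
        Some(value) => value.parse(),
        None => Ok(Aligner::default()),
    }
}
```

With this shape, `resolve_aligner(None)` yields the SweepGA variant, while `resolve_aligner(Some("allwave"))` opts back into AllWave.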

Documentation (README.md)

  • Updated Quick Start: --features use-sweepga
  • Updated Basic Usage examples
  • Fixed citation (removed incorrect co-author - sorry Erik!)
  • Added SweepGA to acknowledgments

Technical Note

SweepGA uses PAF output from FastGA, so orientation detection happens natively in the alignment phase (strand column in PAF). We parse the PAF into AlignmentRecord objects rather than using raw alignment objects.
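
To make the strand handling concrete, here is a minimal sketch of reading one PAF line into a record. It assumes the standard 12 mandatory PAF columns; the `AlignmentRecord` fields and the `parse_paf_line` function shown here are hypothetical and not necessarily what the project's own `AlignmentRecord` contains.

```rust
/// Hypothetical, trimmed-down record; the project's real
/// `AlignmentRecord` may carry different fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AlignmentRecord {
    pub query_name: String,
    pub query_start: usize,
    pub query_end: usize,
    /// True when the PAF strand column is '-'.
    pub reverse: bool,
    pub target_name: String,
    pub target_start: usize,
    pub target_end: usize,
}

/// Parse one tab-separated PAF line (12 mandatory columns).
pub fn parse_paf_line(line: &str) -> Result<AlignmentRecord, String> {
    let cols: Vec<&str> = line.trim_end().split('\t').collect();
    if cols.len() < 12 {
        return Err(format!(
            "expected at least 12 PAF columns, found {}",
            cols.len()
        ));
    }

    // Parse a numeric column, reporting its 1-based index on failure.
    let num = |i: usize| -> Result<usize, String> {
        cols[i]
            .parse::<usize>()
            .map_err(|e| format!("column {}: {}", i + 1, e))
    };

    // Column 5 (index 4) is the relative strand: '+' or '-'.
    let reverse = match cols[4] {
        "+" => false,
        "-" => true,
        other => return Err(format!("invalid strand '{other}'")),
    };

    Ok(AlignmentRecord {
        query_name: cols[0].to_string(),
        query_start: num(2)?,
        query_end: num(3)?,
        reverse,
        target_name: cols[5].to_string(),
        target_start: num(7)?,
        target_end: num(8)?,
    })
}
```

Because the orientation comes straight from the strand column, no separate orientation-detection pass is needed in this sketch.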

Testing

✅ Build successful with --features use-sweepga
✅ All tests pass
✅ Verified SweepGA is used by default (FastGA output in logs)

Impact

Users will get faster alignments by default. Those needing AllWave can explicitly specify --aligner allwave.

ekg added 2 commits October 20, 2025 16:46
These seqwish-* directories are temporary working files created by seqwish
during graph construction runs. They were accidentally committed and are
bloating the repository.

Removed:
- 36 files across 12 seqwish-* directories (~10MB total)
- Files: .sqa, .sqi, .sqq (seqwish's intermediate sequence alignment data)

Added to .gitignore:
- seqwish-*/ pattern to prevent future commits
- wfmash-*/ pattern (similar temp directories)

These temporary directories are automatically created by seqwish and should
never be tracked in version control.

SweepGA (FastGA + plane sweep filtering) is significantly faster for
all-vs-all alignment of sequences ≥100bp, making it better suited as
the default aligner for typical pangenome graph construction workflows.

Changes:
- Set default aligner to 'sweepga' (was 'allwave')
- Updated README to reflect SweepGA as default
- Updated build instructions to use --features use-sweepga
- Fixed README citation (removed incorrect co-author)
- Added SweepGA to acknowledgments

AllWave remains available via --aligner allwave for shorter sequences
or when wavefront alignment is specifically needed.

Note: SweepGA uses PAF output from FastGA, so orientation detection
happens natively in the FastGA alignment phase (strand column in PAF).
ekg force-pushed the test-sweepga-default branch from b8b0eed to 114e732 October 20, 2025 22:03
ekg merged commit 19cd53e into main Oct 20, 2025
2 checks passed
ekg deleted the test-sweepga-default branch October 20, 2025 22:07
ekg added a commit that referenced this pull request Oct 20, 2025
SweepGA alignments were not producing good results for graph construction.
Reverting back to AllWave as the default aligner.

This reverts:
- Switch default aligner to SweepGA (#8)
- Fix SweepGA filtering: remove min_block_length for graph construction (#9)

SweepGA work is preserved on the 'sweepga-default-experiment' branch for
future investigation.