SeqRush

SeqRush is a fast pangenome graph construction tool that builds bidirected sequence graphs with integrated layout optimization.

Overview

SeqRush constructs pangenome variation graphs by:

Performing all-vs-all pairwise sequence alignments using AllWave or SweepGA aligners
Using a lock-free bidirected union-find structure to merge matching positions
Building a bidirected graph with proper handling of forward and reverse complement alignments
Applying the Ygs pipeline (SGD layout + grooming + topological sort) for optimized graph layout
Outputting GFA format compatible with tools like ODGI and VG

Features

Bidirected Graph Support: Properly handles forward and reverse complement alignments
Multiple Alignment Backends: AllWave (wavefront-based) and SweepGA (FastGA + plane sweep filtering)
Sparsification Strategies: TreeSampling for intelligent pair selection (k-nearest + k-farthest + random)
Iterative Alignment: Optional two-phase alignment with early stopping for large datasets
Graph Layout Optimization: Integrated Ygs pipeline (path-guided SGD + grooming + topological sort)
Node Compaction: Linear chain compaction to reduce graph complexity
Standard GFA Output: Compatible with downstream pangenome tools

Quick Start

# Install Rust if needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone --recursive https://github.com/pangenome/seqrush.git
cd seqrush
cargo build --release --features use-allwave

# Build a graph from sequences
./target/release/seqrush -s input.fa -o output.gfa -k 0

# View statistics
odgi stats -i output.gfa -S

Installation

Prerequisites

Rust 1.70+ (automatically managed via rust-toolchain.toml)
Git with submodule support

Build from Source

git clone --recursive https://github.com/pangenome/seqrush.git
cd seqrush
cargo build --release --features use-allwave

The binary will be available at target/release/seqrush.

Usage

Basic Usage

# Build graph with default AllWave aligner
./target/release/seqrush -s sequences.fa -o graph.gfa -k 0

# Use SweepGA aligner instead
./target/release/seqrush -s sequences.fa -o graph.gfa -k 0 --aligner sweepga

# Enable iterative alignment for large datasets
./target/release/seqrush -s sequences.fa -o graph.gfa -k 0 --iterative

Common Options

seqrush \
  -s sequences.fa \          # Input FASTA file
  -o graph.gfa \             # Output GFA file
  -k 0 \                     # Minimum match length (0 = unite all exact matches)
  -t 4 \                     # Number of threads
  -v \                       # Verbose output
  --iterative \              # Use iterative alignment with early stopping
  --no-compact \             # Skip linear chain compaction
  --skip-sgd \               # Skip SGD layout step
  --skip-groom \             # Skip grooming step
  --skip-topo                # Skip topological sort step

Alignment Scoring

# Default two-piece affine gap scoring
-S "0,5,8,2,24,1"  # match, mismatch, gap1_open, gap1_extend, gap2_open, gap2_extend

# Orientation refinement scoring
--orientation-scores "0,1,1,1"  # match, mismatch, gap_open, gap_extend

Sparsification

Control which sequence pairs to align:

# Align all pairs (default)
--sparsification "1.0"

# TreeSampling: k-nearest + k-farthest + random fraction
# Format: "tree:k_nearest,k_farthest,random_fraction"
--sparsification "tree:3,3,0.1"

Iterative Alignment

For large datasets, use iterative mode to reduce computational work:

./target/release/seqrush -s input.fa -o output.gfa -k 0 --iterative

Iterative mode processes alignments in two phases:

Phase 1: All tree pairs (k-nearest + k-farthest) - guarantees connectivity
Phase 2: Random pairs with early stopping when graph stabilizes

Algorithm Details

Graph Construction Pipeline

Load FASTA sequences
  ↓
Align sequence pairs (AllWave/SweepGA)
  ↓
Unite matching positions (BidirectedUnionFind)
  ↓
Build bidirected graph from union components
  ↓
Compact linear chains (optional)
  ↓
Apply Ygs sorting pipeline:
  - Y: Path-guided SGD layout
  - g: Grooming (orientation consistency)
  - s: Topological sort
  ↓
Write GFA

Key Implementation Details

Bidirected Graphs: Sequences can align in forward or reverse complement orientation. Each node has two orientations (forward + and reverse -), and edges connect oriented nodes: 5+ → 6- means "forward strand of node 5 connects to reverse strand of node 6".

Union-Find: Lock-free bidirected union-find efficiently merges matching positions while tracking orientation. Positions are encoded as Pos type with orientation bit.

Alignment Processing: Only exact matches from CIGAR strings are united. Self-alignments are included to establish the sequence backbone.

Graph Layout: The Ygs pipeline optimizes node ordering:

Y (SGD): Path-guided stochastic gradient descent for 1D layout
g (grooming): BFS-based orientation flipping for consistency
s (topological sort): Final ordering based on path structure

Performance Characteristics

Time Complexity: O(n²×L×A) for n sequences of length L with alignment cost A
Space Complexity: O(N) where N is total sequence length
Parallel Scaling: Near-linear for alignment phase

Output Format

SeqRush generates GFA 1.0 format:

H	VN:Z:1.0
S	1	ACGT         # Segment (node) with sequence
S	2	GC
L	1	+	2	+	0M   # Link between oriented nodes
P	seq1	1+,2+	*    # Path representing input sequence

Development

Running Tests

# Run all tests
cargo test

# Run specific test suite
cargo test bidirected

# Run with output
cargo test -- --nocapture

Project Structure

seqrush/
├── src/
│   ├── lib.rs                      # Library interface
│   ├── main.rs                     # CLI entry point
│   ├── seqrush.rs                  # Main graph construction logic
│   ├── bidirected_union_find.rs   # Lock-free oriented union-find
│   ├── bidirected_builder.rs      # Graph builder from union-find
│   ├── path_sgd.rs                 # Path-guided SGD layout
│   ├── grooming.rs                 # Orientation consistency
│   ├── topological_sort.rs         # Path-based topological ordering
│   └── aligner/                    # Alignment backends
│       ├── allwave_impl.rs
│       └── sweepga_impl.rs
├── tests/                          # Integration tests
├── allwave/                        # AllWave aligner (submodule)
├── WFA2-lib/                       # WFA2 library (submodule)
└── CLAUDE.md                       # Development guide

Validation

SeqRush has been validated against ODGI on the HLA-Zoo dataset (28 HLA graphs):

Structural correctness: 100% (all graphs pass ODGI validation with zero changes)
Layout quality: RMSE currently 3.2x worse than ODGI (under investigation)

Limitations

Input sequences must fit in memory
SGD layout quality needs improvement (see CLAUDE.md for details)
Limited to DNA sequences (ACGT alphabet)

Citation

If you use SeqRush in your research, please cite:

SeqRush: Fast bidirected pangenome graph construction
Erik Garrison, Kristopher Kubicki, 2025
https://github.com/pangenome/seqrush

License

MIT License - see LICENSE for details.

Acknowledgments

SeqRush is inspired by and builds upon:

seqwish - Original pangenome graph construction algorithm
ODGI - Optimized dynamic genome/graph implementation
AllWave - Wavefront-based sequence aligner
WFA2-lib - Wavefront alignment algorithm library

Special thanks to Erik Garrison for the seqwish algorithm and graph construction concepts.

Name		Name	Last commit message	Last commit date
Latest commit History 219 Commits
.cargo_home		.cargo_home
.codex		.codex
.github/workflows		.github/workflows
HLA-zoo @ 20ab283		HLA-zoo @ 20ab283
HLA_graphs_sorted		HLA_graphs_sorted
WFA2-lib @ 3c1734e		WFA2-lib @ 3c1734e
allwave @ cfdb9bb		allwave @ cfdb9bb
doc		doc
docs		docs
seqwish-local @ 8063688		seqwish-local @ 8063688
src		src
tests		tests
.gitignore		.gitignore
.gitmodules		.gitmodules
CLAUDE.md		CLAUDE.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SeqRush

Overview

Features

Quick Start

Installation

Prerequisites

Build from Source

Usage

Basic Usage

Common Options

Alignment Scoring

Sparsification

Iterative Alignment

Algorithm Details

Graph Construction Pipeline

Key Implementation Details

Performance Characteristics

Output Format

Development

Running Tests

Project Structure

Validation

Limitations

Citation

License

Acknowledgments

About

Uh oh!

Releases

Packages

Contributors 2

Languages

License

pangenome/seqrush

Folders and files

Latest commit

History

Repository files navigation

SeqRush

Overview

Features

Quick Start

Installation

Prerequisites

Build from Source

Usage

Basic Usage

Common Options

Alignment Scoring

Sparsification

Iterative Alignment

Algorithm Details

Graph Construction Pipeline

Key Implementation Details

Performance Characteristics

Output Format

Development

Running Tests

Project Structure

Validation

Limitations

Citation

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages