🔬 Advanced PAF alignment analysis tool
PafLook is a command-line tool for analyzing and validating PAF (Pairwise mApping Format) files, which are commonly used in bioinformatics for representing alignments between sequences.
- Comprehensive Analysis: Calculate identity metrics, coverage, and alignment statistics
- Multiple Output Formats: Pretty-printed reports, JSON, and various TSV formats
- Filtering Options: Filter alignments by length, mapping quality, or identity
- Parallel Processing: Multi-threaded analysis for large datasets
- Validation: Verify PAF file format correctness
# Clone the repository
git clone https://github.com/yourusername/paflook.git
cd paflook
# Build the project
cargo build --release
# Optional: Install the binary to your path
cargo install --path .

# Basic analysis with pretty-printed output
paflook analyze input.paf
# Specify output format
paflook analyze input.paf --format json
paflook analyze input.paf --format tsv-summary
paflook analyze input.paf --format tsv-per-sequence
paflook analyze input.paf --format tsv-detailed
# Filter alignments
paflook analyze input.paf --min-length 1000 --min-mapq 30 --min-identity 0.9
# Show per-sequence metrics
paflook analyze input.paf --per-sequence
# Include detailed alignment records in output
paflook analyze input.paf --detailed 100
# Write output to file
paflook analyze input.paf --output results.txt
# Use multiple threads
paflook analyze input.paf --threads 8
# Show progress bar
paflook analyze input.paf --progress
# Read from stdin
minimap2 target.fa query.fa | paflook analyze -

# Validate a PAF file
paflook validate input.paf
# Control number of error examples shown
paflook validate input.paf --error-examples 10

PafLook calculates several important metrics:
- Genome Jaccard Index: Similarity measure between query and target sequences
- Block Identity: Proportion of matching bases in alignment blocks
- Gap-Compressed Identity: Identity measure that counts each gap as a single event, regardless of its length
- CIGAR Event Totals: Counts of matches, mismatches, insertions, and deletions
- Per-Sequence Metrics: Coverage, identity, and mapping quality for each sequence
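
To make the difference between block identity and gap-compressed identity concrete, here is a minimal Rust sketch. It is not PafLook's actual code: the `CigarTotals` type and the three functions are hypothetical names, and the formulas are the commonly used definitions, which may differ in detail from what PafLook computes. It assumes an extended CIGAR string (as carried in a PAF `cg:Z:` tag) that uses `=` and `X` rather than the ambiguous `M`.

```rust
/// Event totals from an extended CIGAR string (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
struct CigarTotals {
    matches: u64,    // bases under `=`
    mismatches: u64, // bases under `X`
    ins_bases: u64,  // bases under `I`
    del_bases: u64,  // bases under `D`
    ins_events: u64, // number of `I` operations
    del_events: u64, // number of `D` operations
}

/// Parse a CIGAR such as "90=5X3I2D". Returns None for malformed input or
/// for operators other than `=`, `X`, `I`, `D` (assumption: `M` is rejected
/// because it does not distinguish matches from mismatches).
fn parse_cigar(cigar: &str) -> Option<CigarTotals> {
    let mut totals = CigarTotals::default();
    let mut len: u64 = 0;
    let mut have_digits = false;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            // Accumulate the run length, failing cleanly on overflow.
            len = len.checked_mul(10)?.checked_add(u64::from(d))?;
            have_digits = true;
            continue;
        }
        if !have_digits {
            return None; // operator without a preceding length
        }
        match c {
            '=' => totals.matches += len,
            'X' => totals.mismatches += len,
            'I' => {
                totals.ins_bases += len;
                totals.ins_events += 1;
            }
            'D' => {
                totals.del_bases += len;
                totals.del_events += 1;
            }
            _ => return None,
        }
        len = 0;
        have_digits = false;
    }
    // A trailing length with no operator is malformed.
    if have_digits { None } else { Some(totals) }
}

/// Block identity: matches divided by all alignment columns,
/// with every gap base counted individually.
fn block_identity(t: &CigarTotals) -> Option<f64> {
    let denom = t.matches + t.mismatches + t.ins_bases + t.del_bases;
    if denom == 0 { None } else { Some(t.matches as f64 / denom as f64) }
}

/// Gap-compressed identity: each gap counts once, whatever its length.
fn gap_compressed_identity(t: &CigarTotals) -> Option<f64> {
    let denom = t.matches + t.mismatches + t.ins_events + t.del_events;
    if denom == 0 { None } else { Some(t.matches as f64 / denom as f64) }
}
```

For the CIGAR `90=5X3I2D`, this sketch gives a block identity of 90/100 = 0.9 and a gap-compressed identity of 90/97 ≈ 0.928. The second value is higher because the 3-base insertion and the 2-base deletion each count as one event, so long indels penalize gap-compressed identity less than they penalize block identity.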
This project is licensed under either the MIT License or the Apache License 2.0 (SPDX: MIT OR Apache-2.0).
Contributions are welcome! Please feel free to submit a Pull Request.