
waveygang/paflook


PafLook

🔬 Advanced PAF alignment analysis tool

Overview

PafLook is a command-line tool for analyzing and validating PAF (Pairwise mApping Format) files. PAF is a tab-delimited text format, produced by aligners such as minimap2, that is widely used in bioinformatics to represent pairwise alignments between sequences.
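To make the format concrete, here is a minimal Rust sketch of one PAF record. It assumes only the 12 mandatory tab-separated PAF columns (query name, length, start, end, strand, target name, length, start, end, residue matches, alignment block length, mapping quality), with any optional tags such as `cg:Z:` kept as raw strings. The `PafRecord` type and its field names are hypothetical and are not PafLook's internal API.

```rust
use std::str::FromStr;

/// Hypothetical record holding the 12 mandatory PAF columns plus raw optional tags.
#[derive(Debug, Clone, PartialEq)]
pub struct PafRecord {
    pub query_name: String,
    pub query_len: u64,
    pub query_start: u64, // 0-based, inclusive
    pub query_end: u64,   // 0-based, exclusive
    pub strand: char,     // '+' or '-'
    pub target_name: String,
    pub target_len: u64,
    pub target_start: u64,
    pub target_end: u64,
    pub matches: u64,   // column 10: number of residue matches
    pub block_len: u64, // column 11: alignment block length
    pub mapq: u8,       // column 12: 0-255, where 255 means "missing"
    pub tags: Vec<String>, // optional SAM-style tags, e.g. "cg:Z:980M"
}

/// Parse one numeric column, reporting which column failed.
fn parse_num<T: FromStr>(field: &str, name: &str) -> Result<T, String> {
    field.parse::<T>().map_err(|_| format!("invalid {name}: {field:?}"))
}

impl FromStr for PafRecord {
    type Err = String;

    fn from_str(line: &str) -> Result<Self, Self::Err> {
        let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
        let f: Vec<&str> = line.split('\t').collect();
        if f.len() < 12 {
            return Err(format!("expected at least 12 tab-separated fields, got {}", f.len()));
        }
        let strand = match f[4] {
            "+" => '+',
            "-" => '-',
            other => return Err(format!("invalid strand: {other:?}")),
        };
        Ok(PafRecord {
            query_name: f[0].to_string(),
            query_len: parse_num(f[1], "query length")?,
            query_start: parse_num(f[2], "query start")?,
            query_end: parse_num(f[3], "query end")?,
            strand,
            target_name: f[5].to_string(),
            target_len: parse_num(f[6], "target length")?,
            target_start: parse_num(f[7], "target start")?,
            target_end: parse_num(f[8], "target end")?,
            matches: parse_num(f[9], "residue matches")?,
            block_len: parse_num(f[10], "block length")?,
            mapq: parse_num(f[11], "mapping quality")?,
            tags: f[12..].iter().map(|s| s.to_string()).collect(),
        })
    }
}

fn main() {
    let line = "q1\t1000\t10\t990\t+\tt1\t5000\t100\t1080\t950\t980\t60\tcg:Z:980M";
    match line.parse::<PafRecord>() {
        Ok(rec) => println!("{rec:?}"),
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```

Implementing `FromStr` lets callers write `line.parse::<PafRecord>()`, and storing mapping quality as `u8` rejects values above 255 at parse time.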

Features

  • Comprehensive Analysis: Calculate identity metrics, coverage, and alignment statistics
  • Multiple Output Formats: Pretty-printed reports, JSON, and various TSV formats
  • Filtering Options: Filter alignments by length, mapping quality, or identity
  • Parallel Processing: Multi-threaded analysis for large datasets
  • Validation: Verify PAF file format correctness
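Coverage is typically computed by merging overlapping alignment intervals so that bases hit by several alignments are counted once. The sketch below assumes coverage means the fraction of a sequence's bases covered by at least one half-open `[start, end)` interval; PafLook's exact definition may differ, and `covered_bases` and `coverage` are hypothetical helper names.

```rust
/// Count bases covered by at least one half-open interval [start, end).
/// Intervals are sorted, then overlapping or touching ones are merged.
fn covered_bases(mut intervals: Vec<(u64, u64)>) -> u64 {
    intervals.sort_unstable();
    let mut total = 0;
    let mut current: Option<(u64, u64)> = None;
    for (s, e) in intervals {
        match current {
            // Overlaps (or touches) the running interval: extend it.
            Some((cs, ce)) if s <= ce => current = Some((cs, ce.max(e))),
            // Disjoint: close the running interval and start a new one.
            Some((cs, ce)) => {
                total += ce - cs;
                current = Some((s, e));
            }
            None => current = Some((s, e)),
        }
    }
    if let Some((cs, ce)) = current {
        total += ce - cs;
    }
    total
}

/// Fraction of a sequence of length `seq_len` covered by the intervals.
fn coverage(intervals: Vec<(u64, u64)>, seq_len: u64) -> f64 {
    if seq_len == 0 {
        return 0.0;
    }
    covered_bases(intervals) as f64 / seq_len as f64
}

fn main() {
    // Query-side intervals (columns 3 and 4) of three alignments on one sequence.
    let intervals = vec![(0, 100), (50, 150), (200, 250)];
    println!("covered bases: {}", covered_bases(intervals.clone()));
    println!("coverage: {:.3}", coverage(intervals, 1000));
}
```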

Installation

# Clone the repository
git clone https://github.com/waveygang/paflook.git
cd paflook

# Build the project
cargo build --release

# Optional: install the binary onto your PATH (into ~/.cargo/bin by default)
cargo install --path .

Usage

Analyze PAF Files

# Basic analysis with pretty-printed output
paflook analyze input.paf

# Specify output format
paflook analyze input.paf --format json
paflook analyze input.paf --format tsv-summary
paflook analyze input.paf --format tsv-per-sequence
paflook analyze input.paf --format tsv-detailed

# Filter alignments
paflook analyze input.paf --min-length 1000 --min-mapq 30 --min-identity 0.9

# Show per-sequence metrics
paflook analyze input.paf --per-sequence

# Include detailed alignment records in output
paflook analyze input.paf --detailed 100

# Write output to file
paflook analyze input.paf --output results.txt

# Use multiple threads
paflook analyze input.paf --threads 8

# Show progress bar
paflook analyze input.paf --progress

# Read from stdin
minimap2 query.fa target.fa | paflook analyze -
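The `--min-length`, `--min-mapq`, and `--min-identity` options can be pictured as a single predicate applied to each record. This sketch assumes that "length" is the alignment block length (PAF column 11) and that identity is residue matches divided by block length (columns 10 and 11); PafLook may define either differently. The `Filter` and `Aln` types are hypothetical.

```rust
/// Hypothetical thresholds mirroring --min-length, --min-mapq and --min-identity.
struct Filter {
    min_length: u64,
    min_mapq: u8,
    min_identity: f64,
}

/// The PAF columns this filter needs (assumed: 10 = matches, 11 = block length, 12 = mapq).
struct Aln {
    matches: u64,
    block_len: u64,
    mapq: u8,
}

impl Filter {
    /// Keep an alignment only if it passes every threshold.
    fn keep(&self, a: &Aln) -> bool {
        let identity = if a.block_len == 0 {
            0.0
        } else {
            a.matches as f64 / a.block_len as f64
        };
        a.block_len >= self.min_length && a.mapq >= self.min_mapq && identity >= self.min_identity
    }
}

fn main() {
    let filter = Filter { min_length: 1000, min_mapq: 30, min_identity: 0.9 };
    let alns = [
        Aln { matches: 1900, block_len: 2000, mapq: 60 }, // passes
        Aln { matches: 500, block_len: 500, mapq: 60 },   // too short
        Aln { matches: 1700, block_len: 2000, mapq: 60 }, // identity 0.85
    ];
    let kept = alns.iter().filter(|a| filter.keep(a)).count();
    println!("kept {kept} of {} alignments", alns.len());
}
```

PAF uses a mapping quality of 255 to mean "missing", so a plain `>=` comparison like this one lets such records through any `--min-mapq` threshold; a stricter tool might treat 255 specially.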

Validate PAF Files

# Validate a PAF file
paflook validate input.paf

# Control the number of error examples shown
paflook validate input.paf --error-examples 10
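A format check of this kind can be sketched as per-line rules plus an error counter that keeps only the first few messages, similar in spirit to `--error-examples`. The rules below (at least 12 fields, integer numeric columns, `+`/`-` strand, start <= end <= length, matches <= block length, mapping quality <= 255) are assumptions about what a reasonable validator checks, not PafLook's actual rule set; `validate_line` and `validate` are hypothetical names.

```rust
/// Check one PAF line against a few structural rules; return the first problem found.
fn validate_line(line: &str) -> Result<(), String> {
    let f: Vec<&str> = line.split('\t').collect();
    if f.len() < 12 {
        return Err(format!("expected >= 12 fields, found {}", f.len()));
    }
    // Parse column `i` (0-based) as a non-negative integer.
    let num = |i: usize| -> Result<u64, String> {
        f[i].parse::<u64>()
            .map_err(|_| format!("column {} is not a non-negative integer: {:?}", i + 1, f[i]))
    };
    let (qlen, qs, qe) = (num(1)?, num(2)?, num(3)?);
    if f[4] != "+" && f[4] != "-" {
        return Err(format!("column 5 must be '+' or '-', found {:?}", f[4]));
    }
    let (tlen, ts, te) = (num(6)?, num(7)?, num(8)?);
    let (matches, block_len, mapq) = (num(9)?, num(10)?, num(11)?);
    if qs > qe || qe > qlen {
        return Err(format!("query interval {qs}-{qe} invalid for length {qlen}"));
    }
    if ts > te || te > tlen {
        return Err(format!("target interval {ts}-{te} invalid for length {tlen}"));
    }
    if matches > block_len {
        return Err(format!("matches ({matches}) exceed block length ({block_len})"));
    }
    if mapq > 255 {
        return Err(format!("mapping quality {mapq} exceeds 255"));
    }
    Ok(())
}

/// Validate every non-empty line; return the error count and at most `max_examples` messages.
fn validate(text: &str, max_examples: usize) -> (usize, Vec<String>) {
    let mut n_errors = 0;
    let mut examples = Vec::new();
    for (i, line) in text.lines().enumerate() {
        if line.is_empty() {
            continue;
        }
        if let Err(e) = validate_line(line) {
            n_errors += 1;
            if examples.len() < max_examples {
                examples.push(format!("line {}: {}", i + 1, e));
            }
        }
    }
    (n_errors, examples)
}

fn main() {
    let text = "q\t100\t0\t100\t+\tt\t200\t50\t150\t95\t100\t60\n\
                q\t100\t0\t150\t+\tt\t200\t50\t150\t95\t100\t60\n";
    let (n, examples) = validate(text, 10);
    println!("{n} invalid line(s)");
    for e in examples {
        println!("  {e}");
    }
}
```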

Output Metrics

PafLook reports the following metrics:

  • Genome Jaccard Index: Similarity measure between query and target sequences
  • Block Identity: Proportion of matching bases in alignment blocks
  • Gap-Compressed Identity: Identity measure that counts gap events rather than gap lengths
  • CIGAR Event Totals: Counts of matches, mismatches, insertions, and deletions
  • Per-Sequence Metrics: Coverage, identity, and mapping quality for each sequence
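The CIGAR-based metrics can be illustrated with a small parser. This sketch assumes an extended CIGAR using `=` (match) and `X` (mismatch) operations, as minimap2 emits in the `cg:Z:` tag when run with `-c --eqx`; a plain `M` operation cannot distinguish matches from mismatches, so it is rejected here. It also assumes the common definitions: block identity = matches / (matches + mismatches + inserted bases + deleted bases), and gap-compressed identity = matches / (matches + mismatches + insertion events + deletion events). PafLook's exact formulas may differ, and `CigarTotals` and `cigar_totals` are hypothetical names.

```rust
/// Per-alignment CIGAR event totals (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct CigarTotals {
    matches: u64,
    mismatches: u64,
    ins_bases: u64,
    del_bases: u64,
    ins_events: u64, // number of I operations, regardless of length
    del_events: u64, // number of D operations, regardless of length
}

/// Tally an extended CIGAR string such as "90=2X3I5=4D".
fn cigar_totals(cigar: &str) -> Result<CigarTotals, String> {
    let mut t = CigarTotals::default();
    let mut len: u64 = 0;
    let mut have_digits = false;
    for c in cigar.chars() {
        if let Some(d) = c.to_digit(10) {
            len = len * 10 + d as u64;
            have_digits = true;
            continue;
        }
        if !have_digits {
            return Err(format!("operation {c:?} has no length"));
        }
        match c {
            '=' => t.matches += len,
            'X' => t.mismatches += len,
            'I' => { t.ins_bases += len; t.ins_events += 1; }
            'D' => { t.del_bases += len; t.del_events += 1; }
            'M' => return Err("'M' is ambiguous; an =/X CIGAR is required".into()),
            _ => return Err(format!("unsupported CIGAR operation {c:?}")),
        }
        len = 0;
        have_digits = false;
    }
    if have_digits {
        return Err("trailing length without an operation".into());
    }
    Ok(t)
}

impl CigarTotals {
    /// Block identity: matches over all alignment columns (gaps counted per base).
    fn block_identity(&self) -> f64 {
        let cols = self.matches + self.mismatches + self.ins_bases + self.del_bases;
        if cols == 0 { 0.0 } else { self.matches as f64 / cols as f64 }
    }

    /// Gap-compressed identity: each gap counts once, whatever its length.
    fn gap_compressed_identity(&self) -> f64 {
        let denom = self.matches + self.mismatches + self.ins_events + self.del_events;
        if denom == 0 { 0.0 } else { self.matches as f64 / denom as f64 }
    }
}

fn main() {
    let t = cigar_totals("90=2X3I5=4D").expect("valid CIGAR");
    println!("{t:?}");
    println!("block identity: {:.4}", t.block_identity());
    println!("gap-compressed identity: {:.4}", t.gap_compressed_identity());
}
```

In the example, the 3-base insertion and 4-base deletion add 7 columns to the block-identity denominator but only 2 events to the gap-compressed one, which is why gap-compressed identity is the higher of the two when gaps are long.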

License

This project is dual-licensed under the MIT and Apache-2.0 licenses (SPDX: MIT OR Apache-2.0).

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

PAF alignment statistics
