A lightweight Python library for performing mathematical operations on CIGAR strings and tuples commonly used in genomic alignments.
Cigar Math provides tools to handle and manipulate CIGAR strings and tuples, the compact representations of sequence alignments used in SAM/BAM files. It simplifies common operations such as:
- Handling soft/hard clipping
- Finding overlapping alignment blocks
- Detecting deletion locations
- Converting between different alignment formats
- Mapping between reference and query coordinates
pip install git+https://github.com/DamLabResources/cigarmath
import cigarmath as cm
# Convert CIGAR string to tuples
cigartuples = cm.cigarstr2tup('3H4M1D3M2I3M4H')
reference_start = 3
# Get reference coordinates
ref_block = cm.reference_block(cigartuples, reference_start)
print(ref_block) # (3, 14)
# Find mapping blocks (skipping deletions)
blocks = list(cm.reference_mapping_blocks(cigartuples, deletion_split=1))
print(blocks) # [(0, 7), (10, 14), (20, 24)]
# Convert between coordinate spaces
query_pos = list(cm.reference2query(cigartuples, reference_start=2))
print(query_pos) # [1, 2, 5, None, None, 6]
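To make the reference-coordinate arithmetic concrete, here is a minimal pure-Python sketch of how a CIGAR string can be parsed into (operation, length) tuples and how the reference span follows from them. It does not call cigarmath and is not the library's implementation. The helper names `parse_cigar` and `reference_span` are hypothetical, and the numeric operation codes are assumed to follow the SAM/pysam convention (M=0, I=1, D=2, N=3, S=4, H=5, P=6, '='=7, X=8).

```python
import re

# SAM operation letters mapped to the numeric codes used by pysam-style cigartuples.
CIGAR_OPS = "MIDNSHP=X"
CIGAR_CODES = {op: code for code, op in enumerate(CIGAR_OPS)}

# Operations that advance the position on the reference: M, D, N, =, X.
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}

_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Hypothetical helper: turn '3H4M1D' into [(5, 3), (0, 4), (2, 1)]."""
    tuples = [(CIGAR_CODES[op], int(length)) for length, op in _CIGAR_TOKEN.findall(cigar)]
    # Rebuild the string and compare, so stray characters are rejected instead of ignored.
    if "".join(f"{length}{CIGAR_OPS[code]}" for code, length in tuples) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return tuples


def reference_span(cigartuples, reference_start=0):
    """Hypothetical helper: half-open (start, end) interval covered on the reference."""
    length = sum(n for code, n in cigartuples if code in CONSUMES_REFERENCE)
    return reference_start, reference_start + length


tuples = parse_cigar("3H4M1D3M2I3M4H")
span = reference_span(tuples, reference_start=3)
print(span)  # (3, 14): 4M + 1D + 3M + 3M = 11 reference bases
```

Clipping (S, H) and insertions (I) do not advance the reference, which is why only 11 of the 20 bases described by the CIGAR string contribute to the span.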
- Map between reference and query coordinates
- Handle soft and hard clipping
- Find alignment blocks and gaps
- Parse and manipulate CIGAR strings/tuples
- Simplify extended CIGAR operations
- Collapse adjacent blocks
- Detect deletions and insertions
- Find overlapping regions
- Calculate alignment metrics
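As an illustration of what mapping between reference and query coordinates involves, the following sketch walks a list of CIGAR tuples and pairs each reference position with its query position. It is an assumption-laden example rather than cigarmath's own code: the function name `reference_to_query` is hypothetical, the operation codes follow the SAM/pysam numbering, and query positions are counted from the start of the stored read sequence, so soft-clipped bases are included and hard-clipped bases are not.

```python
# Which operations advance each coordinate system (SAM numeric codes).
CONSUMES_REFERENCE = {0, 2, 3, 7, 8}  # M, D, N, =, X
CONSUMES_QUERY = {0, 1, 4, 7, 8}      # M, I, S, =, X


def reference_to_query(cigartuples, reference_start=0):
    """Hypothetical helper: yield (reference_pos, query_pos) pairs.

    query_pos is None where the reference base is deleted or skipped in the read.
    """
    ref_pos, query_pos = reference_start, 0
    for code, length in cigartuples:
        in_ref = code in CONSUMES_REFERENCE
        in_query = code in CONSUMES_QUERY
        if in_ref:
            for offset in range(length):
                yield ref_pos + offset, (query_pos + offset if in_query else None)
            ref_pos += length
        if in_query:
            query_pos += length


# 2S3M1D2M: two soft-clipped bases, three matches, one deletion, two matches.
pairs = list(reference_to_query([(4, 2), (0, 3), (2, 1), (0, 2)], reference_start=10))
print(pairs)
# [(10, 2), (11, 3), (12, 4), (13, None), (14, 5), (15, 6)]
```

Insertions advance only the query counter, so they never appear as a reference position; deletions advance only the reference counter and therefore map to `None`.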
The notebooks/ directory contains Jupyter notebooks with detailed examples:
- block.ipynb: Working with alignment blocks
- More examples coming soon...
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this software in your research, please cite:
@software{cigar_math,
author = {Dampier, Will and Klopfenstein, DV},
title = {Cigar Math: A Python library for CIGAR string operations},
year = {2025},
publisher = {GitHub},
url = {https://github.com/DamLabResources/cigarmath}
}