alanwilter/fasta-tools

Fasta tools

# install only the Python dev packages
uv sync --link-mode=copy --compile-bytecode -U --only-dev

# deploy using maturin
uv sync --link-mode=copy --compile-bytecode -U

# deploy and install the package (includes Python library and Rust binaries)
maturin develop --release

find-pep

A CLI tool for finding peptides in UniProt FASTA files

Usage: find-pep <INPUT> <LIST>

Arguments:
  <INPUT>  Path to the input FASTA file (required)
  <LIST>   Path to the input peptide list file (required)

Options:
  -h, --help     Print help
  -V, --version  Print version

Example:

    find-pep input_fasta.fasta input_list.txt | tee output.txt
    DTLMISR:F7HL06:1
    DTLMISR:A3RFZ7-3:1
    DTLMISR:F7F0D6:1
    ISRNQVSLTCLVK:F7HL06:1
    ISRNQVSLTCLVK:B0FPE9:2
    ISRNQVSLTCLVK:F7F0D6:1
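
Each output line above appears to follow the pattern peptide:accession:count. A minimal Python sketch for loading a saved output.txt into a nested dictionary is shown here. The helper name parse_find_pep_output is hypothetical and not part of fasta-tools, and the three-field layout is assumed from the example output only.

```python
from collections import defaultdict


def parse_find_pep_output(text):
    """Parse `peptide:accession:count` lines into {peptide: {accession: count}}.

    Hypothetical helper (not part of fasta-tools); assumes the three-field,
    colon-separated layout shown in the example output.
    """
    hits = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two colons act as separators.
        peptide, accession, count = line.rsplit(":", 2)
        hits[peptide][accession] = int(count)
    return dict(hits)


sample = """\
DTLMISR:F7HL06:1
DTLMISR:A3RFZ7-3:1
ISRNQVSLTCLVK:B0FPE9:2
"""
hits = parse_find_pep_output(sample)
print(hits["ISRNQVSLTCLVK"])  # {'B0FPE9': 2}
```

Isoform accessions such as A3RFZ7-3 contain a hyphen but no colon, so splitting on the colon should be safe for the accessions shown.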

fasta-extractor

Extract FASTA headers from a file.

Usage: fasta-extractor [OPTIONS] <INPUT>

Arguments:
  <INPUT>  Path to the input FASTA file (required)

Options:
  -n, --njobs <NJOBS>  Number of parallel jobs (default: all available CPUs)
  -h, --help           Print help
  -V, --version        Print version

Example:

    fasta-extractor input_fasta.fasta | tee output.txt
    Extracting headers from FASTA file... using 10 jobs
    >tr|F7HL06|F7HL06_MACMU Ig-like domain-containing protein OS=Macaca mulatta OX=9544 PE=4 SV=3
    >tr|F7F0D6|F7F0D6_MACMU Ig-like domain-containing protein OS=Macaca mulatta OX=9544 PE=4 SV=3
    >sp|Q63ZW7-3|INADL_MOUSE Isoform 3 of InaD-like protein OS=Mus musculus OX=10090 GN=Patj
    >sp|B0FPE9|NLRP3_MACMU NACHT, LRR and PYD domains-containing protein 3 OS=Macaca mulatta OX=9544 GN=NLRP3 PE=2 SV=1
    >sp|A3RFZ7-3|FCG3A_MACMU Isoform 3 of Low affinity immunoglobulin gamma Fc region receptor III-A OS=Macaca mulatta OX=9544 GN=FCGR3A
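
The extracted headers follow the UniProt layout >db|accession|entry_name description KEY=value ... seen above. As a rough sketch, they could be split into fields in Python as follows. The function parse_header and both regular expressions are illustrative assumptions, not part of fasta-tools, and they only cover the OS, OX, GN, PE and SV keys that appear in the example output.

```python
import re

# Prefix of the form `>db|accession|entry_name rest-of-line`.
HEADER_RE = re.compile(
    r"^>(?P<db>\w+)\|(?P<accession>[^|]+)\|(?P<entry>\S+)\s+(?P<rest>.*)$"
)
# KEY=value fields; a value runs until the next known key or the end of line.
FIELD_RE = re.compile(r"\b(OS|OX|GN|PE|SV)=(.*?)(?=\s(?:OS|OX|GN|PE|SV)=|$)")


def parse_header(header):
    """Split one UniProt-style FASTA header into a dict (hypothetical helper)."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"Unrecognised header: {header!r}")
    record = match.groupdict()
    rest = record.pop("rest")
    # The description is everything before the first KEY= field.
    first = FIELD_RE.search(rest)
    record["description"] = rest[: first.start()].strip() if first else rest.strip()
    record.update(dict(FIELD_RE.findall(rest)))
    return record


record = parse_header(
    ">sp|B0FPE9|NLRP3_MACMU NACHT, LRR and PYD domains-containing protein 3 "
    "OS=Macaca mulatta OX=9544 GN=NLRP3 PE=2 SV=1"
)
print(record["accession"], record["OS"], record["GN"])
```

Isoform headers in the example omit PE and SV, so code consuming the result should use record.get("PE") rather than indexing directly.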

Python Usage Examples

Using find_peptides

from fasta_tools.rust_fasta_tools import find_peptides

with open('tests/input_fasta.fasta', 'r') as fasta_file:
    fasta_content = fasta_file.read()

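with open('tests/input_list.txt', 'r') as peptides_file:
    # Skip blank lines so that no empty peptide is passed to the search
    peptides = [line.strip() for line in peptides_file if line.strip()]

# Each result is a (peptide, accession, count) tuple
results = find_peptides(fasta_content, peptides)
for peptide, accession, count in results:
    print(f"{peptide}:{accession}:{count}")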
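
For cross-checking results on small inputs, a pure-Python reference can be handy. The sketch below is not the Rust implementation. It assumes that the accession is the second |-separated field of the header and that overlapping occurrences are counted, neither of which is confirmed by this README. The names read_fasta, count_overlapping and find_peptides_py are hypothetical.

```python
def read_fasta(text):
    """Yield (accession, sequence) pairs from UniProt-style FASTA text."""
    accession, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            # Assumed layout: `>db|ACCESSION|ENTRY ...`; otherwise use the first token.
            parts = line[1:].split("|")
            accession = parts[1] if len(parts) > 1 else (line[1:].split() or [""])[0]
            chunks = []
        elif line and accession is not None:
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


def count_overlapping(sequence, peptide):
    """Count occurrences of peptide in sequence, including overlapping ones."""
    count, start = 0, sequence.find(peptide)
    while start != -1:
        count += 1
        start = sequence.find(peptide, start + 1)
    return count


def find_peptides_py(fasta_text, peptides):
    """Return (peptide, accession, count) tuples for every peptide that matches."""
    records = list(read_fasta(fasta_text))
    results = []
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty entries from blank lines
        for accession, sequence in records:
            n = count_overlapping(sequence, peptide)
            if n:
                results.append((peptide, accession, n))
    return results


demo = ">sp|P1|A_TEST demo\nAAKAAK\nAA\n>tr|P2|B_TEST demo\nCCC\n"
print(find_peptides_py(demo, ["AAK", "CC", "ZZ"]))
```

If the counts differ from those returned by find_peptides, the overlap assumption is the first thing to check.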

Using extract_headers

from fasta_tools.rust_fasta_tools import extract_headers

headers = extract_headers('tests/input_fasta.fasta', njobs=4)
for header in headers:
    print(header)

Bumping Version

Bump the version number with commitizen (cz). By default, cz infers the increment from the commit history; to force a specific part of the version, pass --increment MAJOR, MINOR, or PATCH.

cz bump --dry-run

# if the dry run looks correct, apply the bump
cz bump

# Push the bump commit, then the new tags
git push origin
git push origin --tags
