
Building the UMGAP indexes

Pieter Verschaffelt edited this page Apr 4, 2025 · 1 revision

To generate the index input files that the UMGAP pipeline requires, use the ./scripts/generate_umgap_tables.sh script.

Usage

Usage: ./scripts/generate_umgap_tables.sh <mode> [OPTIONS]

Supported modes

  • kmer: Creates a k-mer index based on UniProt entries.
  • tryptic: Creates a tryptic peptide index based on sequence data.

Mode kmer

Required config values

  • --output-dir: Directory to save the output files.

Optional config values

  • --database-sources: Comma-separated list of database sources to include ('swissprot', 'trembl') (default: 'swissprot,trembl')
  • --temp-dir: Temporary directory for intermediate files (default: '/tmp')
  • --sort-memory: Amount of memory (e.g., '2G') for the sort utility (default: '2G')
  • --help: Prints this help message.
  • --kmer-length: Length of k-mers for the index (default: 9)

Mode tryptic

Required config values

  • --output-dir: Directory to save the output files.

Optional config values

  • --database-sources: Comma-separated list of database sources to include ('swissprot', 'trembl') (default: 'swissprot,trembl')
  • --temp-dir: Temporary directory for intermediate files (default: '/tmp')
  • --sort-memory: Amount of memory (e.g., '2G') for the sort utility (default: '2G')
  • --help: Prints this help message.
  • --min-peptide-length: Minimum length of tryptic peptides (default: 5)
  • --max-peptide-length: Maximum length of tryptic peptides (default: 50)

Examples

./scripts/generate_umgap_tables.sh kmer --database-sources swissprot,trembl --output-dir /path/to/output --kmer-length 7
./scripts/generate_umgap_tables.sh tryptic --database-sources swissprot --output-dir /path/to/output --min-peptide-length 6 --max-peptide-length 30
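When both indexes are needed, the two invocations can share their common options. The following is a minimal sketch, not part of the repository: the variables SCRIPT, OUT_DIR, TMP_DIR, SORT_MEMORY and DRY_RUN and the helper build_index are hypothetical names introduced for illustration. Only the flags documented above are passed to the script. With DRY_RUN=1 (the default in this sketch) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: build both UMGAP indexes with shared settings.
# All variable names and build_index are illustrative assumptions.
set -eu

SCRIPT="${SCRIPT:-./scripts/generate_umgap_tables.sh}"
OUT_DIR="${OUT_DIR:-/path/to/output}"
TMP_DIR="${TMP_DIR:-/tmp}"
SORT_MEMORY="${SORT_MEMORY:-2G}"
DRY_RUN="${DRY_RUN:-1}"

# Run the generator for one mode, or only print the command when DRY_RUN=1.
# Usage: build_index <mode> [mode-specific options...]
build_index() {
    mode="$1"
    shift
    # Prepend the script, the mode and the shared options to the remaining arguments.
    set -- "$SCRIPT" "$mode" \
        --output-dir "$OUT_DIR" \
        --temp-dir "$TMP_DIR" \
        --sort-memory "$SORT_MEMORY" \
        "$@"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_index kmer --kmer-length 9
build_index tryptic --min-peptide-length 5 --max-peptide-length 50
```

Setting DRY_RUN=0 runs the generator for real. Keeping the shared options in one place avoids the two runs drifting apart, for example by using different temporary directories.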

Output

After successful execution, the script generates the index file for the selected mode, which UMGAP can then use:

  • kmer.index: required if you're running UMGAP in kmer mode.
  • tryptic.index: required if you're running UMGAP in tryptic mode.
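Before starting UMGAP, it can be useful to confirm that the expected index is actually present. The sketch below assumes that the index files are written directly into the directory passed as --output-dir, which this page does not state explicitly. The helper names index_file_for_mode and check_index are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that the index for a given mode exists.
# Assumes the index sits directly inside the output directory (unverified).

# Print the index file name that belongs to a mode.
index_file_for_mode() {
    case "$1" in
        kmer) echo "kmer.index" ;;
        tryptic) echo "tryptic.index" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Succeed only if the index for <mode> exists in <dir> and is non-empty.
# Usage: check_index <mode> <dir>
check_index() {
    name="$(index_file_for_mode "$1")" || return 1
    if [ -s "$2/$name" ]; then
        echo "found $2/$name"
    else
        echo "missing or empty: $2/$name" >&2
        return 1
    fi
}
```

For example, `check_index kmer /path/to/output` returns a non-zero exit status when kmer.index is absent or empty, so it can guard a later pipeline step.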
