Building the UMGAP indexes
Pieter Verschaffelt edited this page Apr 4, 2025 · 1 revision
To generate the index input files that the UMGAP pipeline requires, use the ./scripts/generate_umgap_tables.sh script.
Usage: ./scripts/generate_umgap_tables.sh <mode> [OPTIONS]

Modes:
- kmer: Creates a k-mer index based on UniProt entries.
- tryptic: Creates a tryptic peptide index based on sequence data.

Options for kmer mode:
- --output-dir: Directory to save the output files.
- --database-sources: Comma-separated list of database sources ('swissprot', 'trembl') (default: 'swissprot,trembl').
- --temp-dir: Temporary directory for intermediate files (default: '/tmp').
- --sort-memory: Amount of memory (e.g., '2G') for the sort utility (default: '2G').
- --help: Prints this help message.
- --kmer-length: Length of k-mers for the index (optional, default: 9).

Options for tryptic mode:
- --output-dir: Directory to save the output files.
- --database-sources: Comma-separated list of database sources ('swissprot', 'trembl') (default: 'swissprot,trembl').
- --temp-dir: Temporary directory for intermediate files (default: '/tmp').
- --sort-memory: Amount of memory (e.g., '2G') for the sort utility (default: '2G').
- --help: Prints this help message.
- --min-peptide-length: Minimum length of tryptic peptides (default: 5).
- --max-peptide-length: Maximum length of tryptic peptides (default: 50).
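
The options listed above can be combined in a single call. As a minimal sketch, the helper below only prints a command line that overrides --temp-dir and --sort-memory, so it can be inspected before anything is run. The function name `build_index_cmd` and the paths `/scratch/tmp` and `/path/to/output` are hypothetical placeholders; only the mode names and flags come from the usage text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints, but does not run,
# a generate_umgap_tables.sh command for the given mode.
# Defaults for temp dir and sort memory mirror the documented defaults.
build_index_cmd() {
    local mode="$1" output_dir="$2" temp_dir="${3:-/tmp}" sort_memory="${4:-2G}"
    case "$mode" in
        kmer|tryptic) ;;  # the two documented modes
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf '%s\n' "./scripts/generate_umgap_tables.sh $mode --output-dir $output_dir --temp-dir $temp_dir --sort-memory $sort_memory"
}

# Example: a tryptic index using a scratch disk and a larger sort buffer
# (both values are placeholders).
build_index_cmd tryptic /path/to/output /scratch/tmp 8G
```

Printing the command instead of executing it keeps the sketch safe to try on a machine where the UniProt download and sort steps would be expensive.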

Examples:
./scripts/generate_umgap_tables.sh kmer --database-sources swissprot,trembl --output-dir /path/to/output --kmer-length 7
./scripts/generate_umgap_tables.sh tryptic --database-sources swissprot --output-dir /path/to/output --min-peptide-length 6 --max-peptide-length 30

After successful execution, this script generates the following files that can be used by UMGAP:
- kmer.index: required if you're running UMGAP in kmer mode.
- tryptic.index: required if you're running UMGAP in tryptic mode.
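
Before starting UMGAP, it can be useful to confirm that the index for the intended mode is actually present. The sketch below is one possible check, assuming the index files are written directly into the directory passed as --output-dir (the page does not state the exact location). The function names `expected_index` and `check_index` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a mode to the index file name listed above.
expected_index() {
    case "$1" in
        kmer)    echo "kmer.index" ;;
        tryptic) echo "tryptic.index" ;;
        *)       echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Succeeds only if the index for the given mode exists and is non-empty.
# Assumption: the file sits directly inside the output directory.
check_index() {
    local mode="$1" output_dir="$2" name
    name="$(expected_index "$mode")" || return 1
    [ -s "$output_dir/$name" ]
}

# Usage (the path is a placeholder):
# check_index kmer /path/to/output && echo "kmer index ready"
```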