PMGen (Peptide MHC Generator) is a comprehensive pipeline for predicting peptide-MHC (pMHC) complex structures and designing optimized peptide sequences.
- Fast & accurate structure prediction using AlphaFold with template engineering or initial guess mode
- Peptide sequence design with structure-aware optimization
- MHC pseudo-sequence design for customized allele engineering
- Iterative peptide optimization with binding prediction
- Mutation screening for systematic variant analysis
- Batch processing for multiple pMHC complexes
- Python 3.8+
- Conda or Mamba
- Git (optional)
- Modeller (requires a license key, which you can obtain by registering on the Modeller website)
- CUDA-enabled GPU (needed for fast AlphaFold predictions; a CPU-only installation is also supported)
```bash
git clone https://github.com/soedinglab/PMGen.git
cd PMGen
bash -l install.sh
# or, for CPU-only support, run: bash -l install.sh --cpu
conda activate PMGen
```

You will be prompted to enter your Modeller license key. The script automatically:
- Creates the PMGen Conda environment
- Installs all dependencies
- Downloads AlphaFold parameters
- Clones PANDORA and ProteinMPNN
Install NetMHCpan and NetMHCIIpan (required for binding prediction with `--binder_pred`), then set their paths in `user_setting.py`:

```python
netmhcipan_path = '/path/to/netMHCpan'
netmhciipan_path = '/path/to/netMHCIIpan'
```

Create a tab-separated file (`input.tsv`) with your pMHC data:
```
peptide	mhc_seq	mhc_type	anchors	id
GILGFVFTL	GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWE	1		sample1
KLGGALQAK	GSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRGEPRAPWVEQEGPEYWDRNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGPDGRLLRGHDQYAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAAREAEQRRAYLEGTCVEWLRRYLKNGNATLLRTDSPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWE	1		sample2
```

Columns:
- `peptide`: peptide sequence
- `mhc_seq`: MHC sequence (for MHC-II, give the alpha and beta chains separated by `/`)
- `mhc_type`: `1` for MHC-I, `2` for MHC-II
- `anchors`: anchor positions (leave empty to have them predicted)
- `id`: unique identifier
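For larger batches it can be easier to generate `input.tsv` programmatically. The sketch below uses Python's standard `csv` module; the helper `write_pmgen_tsv` is hypothetical (not part of PMGen) and only enforces the column rules listed above, such as `mhc_type` being 1 or 2 and an MHC-II `mhc_seq` containing `/`-separated alpha and beta chains.

```python
import csv

# Column order as described for input.tsv
COLUMNS = ["peptide", "mhc_seq", "mhc_type", "anchors", "id"]


def write_pmgen_tsv(rows, path):
    """Validate pMHC rows and write them as a tab-separated file.

    Hypothetical helper for illustration; PMGen itself only reads the file.
    """
    seen_ids = set()
    for row in rows:
        if row["mhc_type"] not in (1, 2):
            raise ValueError(f"{row['id']}: mhc_type must be 1 (MHC-I) or 2 (MHC-II)")
        if row["mhc_type"] == 2 and row["mhc_seq"].count("/") != 1:
            raise ValueError(f"{row['id']}: MHC-II mhc_seq must be 'alpha/beta'")
        if row["id"] in seen_ids:
            raise ValueError(f"duplicate id: {row['id']}")
        seen_ids.add(row["id"])

    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        for row in rows:
            # A missing or empty anchors value leaves the column blank,
            # which the README describes as "leave empty for prediction"
            writer.writerow({**row, "anchors": row.get("anchors", "")})


if __name__ == "__main__":
    # Truncated placeholder: paste the full MHC sequence in real use
    hla_seq = "GSHSMRYFFTSVSRPGRGEPRF"
    write_pmgen_tsv(
        [{"peptide": "GILGFVFTL", "mhc_seq": hla_seq, "mhc_type": 1, "id": "sample1"}],
        "input.tsv",
    )
```

Validating before writing means a malformed row fails early in Python rather than partway through a long structure-prediction run.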
Single-threaded mode with initial guess (fastest and most accurate):

```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess
```

This is the preferred mode for most users. It uses:
- `--mode wrapper`: works for one or more predictions per run.
- `--run single`: sequential (non-parallel) processing.
- `--initial_guess`: a faster and more accurate AlphaFold mode that skips homology modeling (recommended).
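When launching many runs from a script, the command line can be assembled in Python. This is a minimal sketch: the function `build_pmgen_cmd` is hypothetical, it uses only flags shown in this README, and it assumes the script runs from the PMGen directory inside the activated `PMGen` environment (so `sys.executable` is that environment's Python).

```python
import subprocess
import sys


def build_pmgen_cmd(df, output_dir, *extra_flags):
    """Return the argv list for a wrapper/single/initial-guess run (hypothetical helper)."""
    cmd = [
        sys.executable, "run_PMGen.py",
        "--mode", "wrapper",
        "--run", "single",
        "--df", df,
        "--output_dir", output_dir,
        "--initial_guess",
    ]
    # Extra flags and their values are appended in order, converted to strings
    cmd.extend(str(flag) for flag in extra_flags)
    return cmd


if __name__ == "__main__":
    cmd = build_pmgen_cmd(
        "input.tsv", "output/",
        "--peptide_design", "--num_sequences_peptide", 50, "--binder_pred",
    )
    print(" ".join(cmd))
    # Uncomment to launch the run; check=True raises CalledProcessError on failure
    # subprocess.run(cmd, check=True)
```

Passing an argument list (rather than a single shell string) to `subprocess.run` avoids shell-quoting problems with paths that contain spaces.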
To run specific AlphaFold models, list them after `--models`:

```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess \
    --models model_1_ptm model_2_ptm model_3_ptm
```

Generate optimized peptide variants:
```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess \
    --peptide_design \
    --num_sequences_peptide 50 \
    --binder_pred
```

Customize MHC binding groove residues:
```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess \
    --only_pseudo_sequence_design \
    --num_sequences_mhc 20
```

Optimize peptides over multiple rounds:
```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess \
    --peptide_design \
    --binder_pred \
    --iterative_peptide_gen 3 \
    --fix_anchors
```

Systematically test point mutations:
```bash
python run_PMGen.py \
    --mode wrapper \
    --run single \
    --df input.tsv \
    --output_dir output/ \
    --initial_guess \
    --mutation_screen \
    --n_mutations 1
```

| Flag | Description |
|---|---|
| `--mode wrapper` | Batch processing mode (recommended) |
| `--run single` | Sequential processing (recommended) |
| `--initial_guess` | Fast AlphaFold mode without templates (recommended) |
| `--peptide_design` | Enable peptide sequence generation |
| `--only_pseudo_sequence_design` | Design MHC binding groove only |
| `--binder_pred` | Predict binding affinity (requires NetMHCpan) |
| `--fix_anchors` | Keep anchor positions fixed during design |
| `--iterative_peptide_gen N` | Run N rounds of optimization |
| `--mutation_screen` | Systematic mutation analysis |
| `--num_templates` | Number of structural templates (default: 4) |
| `--num_recycles` | AlphaFold recycles (default: 3) |
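A mutation screen with `--n_mutations 1` tests single point mutations. To estimate the size of that space, the sketch below enumerates every single substitution of a peptide over the 20 standard amino acids. It is an illustration only, not PMGen's own code, and the set of variants PMGen actually evaluates may differ; `skip_positions` and the choice of positions 2 and 9 are hypothetical examples of holding anchors fixed.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_point_mutants(peptide, skip_positions=()):
    """Yield (position, wild_type, mutant, sequence) for each single substitution.

    Positions are 1-based; positions listed in skip_positions are left unchanged.
    """
    for i, wild_type in enumerate(peptide, start=1):
        if i in skip_positions:
            continue
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                yield i, wild_type, residue, peptide[: i - 1] + residue + peptide[i:]


if __name__ == "__main__":
    mutants = list(single_point_mutants("GILGFVFTL"))
    print(len(mutants))  # 9 positions x 19 alternatives = 171
    print(mutants[0])    # (1, 'G', 'A', 'AILGFVFTL')
    # Holding positions 2 and 9 fixed leaves 7 x 19 = 133 variants
    print(len(list(single_point_mutants("GILGFVFTL", skip_positions={2, 9}))))
```

Because every variant is a separate structure prediction, the count grows as 19 × (peptide length) for single mutations and much faster for multiple mutations, which is worth checking before starting a large screen.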
Results are written to the output directory as follows:

```
output/
├── pandora/                 # Template structures
├── alphafold/               # Predicted pMHC structures
├── proteinmpnn/             # Designed sequences
│   └── {id}/
│       ├── peptide_design/
│       └── only_pseudo_sequence_design/
└── best_structures/         # Top-ranked models (if --best_structures used)
```
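To check which design runs produced output, the layout above can be walked with `pathlib`. This minimal sketch assumes only the directory names shown in the tree (`proteinmpnn/{id}/peptide_design` and `only_pseudo_sequence_design`); it makes no assumptions about the files inside them, and the helper name `list_design_dirs` is hypothetical.

```python
from pathlib import Path

DESIGN_KINDS = ("peptide_design", "only_pseudo_sequence_design")


def list_design_dirs(output_dir):
    """Map each {id} under proteinmpnn/ to the design subdirectories present."""
    root = Path(output_dir) / "proteinmpnn"
    found = {}
    if not root.is_dir():
        return found
    for id_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep the fixed DESIGN_KINDS order so results are deterministic
        kinds = [kind for kind in DESIGN_KINDS if (id_dir / kind).is_dir()]
        if kinds:
            found[id_dir.name] = kinds
    return found


if __name__ == "__main__":
    for sample_id, kinds in list_design_dirs("output/").items():
        print(sample_id, ", ".join(kinds))
```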
If you use PMGen, please cite the underlying methods:
- PANDORA: Antunes et al., Front. Immunol. 2022
- AlphaFold: Jumper et al., Nature 2021
- AFfine: Bradley et al., PNAS 2023
- ProteinMPNN: Dauparas et al., Science 2022
For issues or questions, please open an issue on GitHub.