PMGen: From Structure Prediction to Peptide Generation

PMGen (Peptide MHC Generator) is a comprehensive pipeline for predicting peptide-MHC (pMHC) complex structures and designing optimized peptide sequences.

Key Features

  • Fast & accurate structure prediction using AlphaFold with template engineering or initial guess mode
  • Peptide sequence design with structure-aware optimization
  • MHC pseudo-sequence design for customized allele engineering
  • Iterative peptide optimization with binding prediction
  • Mutation screening for systematic variant analysis
  • Batch processing for multiple pMHC complexes

Installation

Requirements

  • Python 3.8+
  • Conda or Mamba
  • Git Optional
  • Modeller (requires a license key, available from the Modeller website)
  • CUDA-enabled GPU (optional; needed for fast AlphaFold predictions, a CPU-only install is also supported)

Setup

git clone https://github.com/soedinglab/PMGen.git
cd PMGen
bash -l install.sh
# Or, for CPU-only support, run: bash -l install.sh --cpu
conda activate PMGen

You will be prompted to enter your Modeller license key. The script automatically:

  • Creates the PMGen Conda environment
  • Installs all dependencies
  • Downloads AlphaFold parameters
  • Clones PANDORA and ProteinMPNN

Optional: Configure NetMHCpan (Recommended)

Install NetMHCpan and NetMHCIIpan, then edit user_setting.py:

netmhcipan_path = '/path/to/netMHCpan'
netmhciipan_path = '/path/to/netMHCIIpan'

Quick Start

Prepare Input File

Create a tab-separated file (input.tsv) with your pMHC data:

peptide	mhc_seq	mhc_type	anchors	id
GILGFVFTL	GSHSMRYFFTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASQRMEPRAPWIEQEGPEYWDGETRKVKAHSQTHRVDLGTLRGYYNQSEAGSHTVQRMYGCDVGSDWRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQLRAYLEGTCVEWLRRYLENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGQEQRYTCHVQHEGLPKPLTLRWE	1		sample1
KLGGALQAK	GSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRGEPRAPWVEQEGPEYWDRNTQIFKTNTQTYRENLRIALRYYNQSEAGSHIIQRMYGCDLGPDGRLLRGHDQYAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAAREAEQRRAYLEGTCVEWLRRYLKNGNATLLRTDSPKTHMTHHPISDHEATLRCWALGFYPAEITLTWQRDGEDQTQDTELVETRPAGDRTFQKWAAVVVPSGEEQRYTCHVQHEGLPKPLTLRWE	1		sample2

Columns:

  • peptide: Peptide sequence
  • mhc_seq: MHC sequence (for MHC-II: Alpha/Beta separated by /)
  • mhc_type: 1 for MHC-I, 2 for MHC-II
  • anchors: Anchor positions (leave empty for prediction)
  • id: Unique identifier
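
The column rules above can be checked before a run. The following is a minimal sketch in Python, not part of PMGen itself: the helper names (is_sequence, validate_row, write_input) are hypothetical, and the restriction to the 20 standard amino-acid letters is an assumption, since PMGen may accept other characters.

```python
import csv
import io

COLUMNS = ["peptide", "mhc_seq", "mhc_type", "anchors", "id"]
# Assumption: only the 20 standard one-letter amino-acid codes are allowed.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
# Chains expected in mhc_seq per mhc_type (MHC-II is written as Alpha/Beta).
CHAINS_PER_TYPE = {"1": 1, "2": 2}


def is_sequence(text):
    """True if text is a non-empty string of standard amino-acid letters."""
    return bool(text) and set(text) <= AMINO_ACIDS


def validate_row(row):
    """Return a list of problems found in one row (empty if it looks fine)."""
    problems = []
    if not is_sequence(row["peptide"]):
        problems.append("peptide is empty or has non-standard residues")
    chains = row["mhc_seq"].split("/")
    expected = CHAINS_PER_TYPE.get(row["mhc_type"])
    if expected is None:
        problems.append("mhc_type must be 1 or 2")
    elif len(chains) != expected:
        problems.append(f"mhc_type {row['mhc_type']} expects {expected} chain(s)")
    if not all(is_sequence(chain) for chain in chains):
        problems.append("mhc_seq is empty or has non-standard residues")
    if not row["id"]:
        problems.append("id is empty")
    return problems


def write_input(rows, handle):
    """Write rows as a tab-separated table with the five header columns."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


rows = [
    # mhc_seq is shortened here for readability; use the full sequence in practice.
    {"peptide": "GILGFVFTL", "mhc_seq": "GSHSMRYFFTSVSRPG",
     "mhc_type": "1", "anchors": "", "id": "sample1"},
]
for row in rows:
    assert not validate_row(row), validate_row(row)

buffer = io.StringIO()  # swap for open("input.tsv", "w", newline="") to write a file
write_input(rows, buffer)
print(buffer.getvalue())
```

Leaving anchors as an empty string produces the empty field that asks PMGen to predict anchor positions.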

Basic Structure Prediction (Recommended)

Single-threaded mode with initial guess (fastest & most accurate):

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess

This is the preferred mode for most users. It uses:

  • --mode wrapper: Handles one or more predictions per run.
  • --run single: Sequential (non-parallel) processing.
  • --initial_guess: Faster and more accurate AlphaFold mode that skips homology modeling (recommended).
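
When PMGen is launched from another script, the same command can be assembled programmatically. This is a rough sketch only: build_command is a hypothetical helper, it uses only flags shown in this README, and it prints the command rather than running it.

```python
import shlex


def build_command(df, output_dir, extra_flags=()):
    """Assemble the recommended run_PMGen.py call as an argument list."""
    cmd = [
        "python", "run_PMGen.py",
        "--mode", "wrapper",
        "--run", "single",
        "--df", df,
        "--output_dir", output_dir,
        "--initial_guess",
    ]
    cmd.extend(extra_flags)  # e.g. ["--peptide_design", "--binder_pred"]
    return cmd


cmd = build_command("input.tsv", "output/")
print(shlex.join(cmd))
# To launch it, run subprocess.run(cmd, check=True) from the PMGen repository
# root with the PMGen Conda environment activated.
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.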

Common Use Cases

1. Structure Prediction with Multiple Models

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess \
  --models model_1_ptm model_2_ptm model_3_ptm

2. Peptide Sequence Design

Generate optimized peptide variants:

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess \
  --peptide_design \
  --num_sequences_peptide 50 \
  --binder_pred

3. MHC Pseudo-Sequence Design

Customize MHC binding groove residues:

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess \
  --only_pseudo_sequence_design \
  --num_sequences_mhc 20

4. Iterative Peptide Optimization

Optimize peptides over multiple rounds:

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess \
  --peptide_design \
  --binder_pred \
  --iterative_peptide_gen 3 \
  --fix_anchors

5. Mutation Screening

Systematically test point mutations:

python run_PMGen.py \
  --mode wrapper \
  --run single \
  --df input.tsv \
  --output_dir output/ \
  --initial_guess \
  --mutation_screen \
  --n_mutations 1
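
To get a feel for the size of such a screen, the sketch below enumerates every substitution variant of a peptide. It is an illustration only: this README does not specify how PMGen enumerates mutants or which chain it mutates, and point_mutants is a hypothetical function that assumes the 20 standard amino acids.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_mutants(seq, n_mutations=1):
    """Yield every variant of seq with exactly n_mutations substituted positions."""
    for positions in combinations(range(len(seq)), n_mutations):
        # For each chosen position, every residue except the original one.
        alternatives = [
            [aa for aa in AMINO_ACIDS if aa != seq[pos]] for pos in positions
        ]
        for replacement in product(*alternatives):
            variant = list(seq)
            for pos, aa in zip(positions, replacement):
                variant[pos] = aa
            yield "".join(variant)


variants = list(point_mutants("GILGFVFTL", 1))
print(len(variants))  # 9 positions x 19 alternatives = 171
```

The count grows quickly with the number of simultaneous mutations: two substitutions in a 9-mer already give 36 x 19 x 19 = 12996 variants.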

Key Options

Flag                            Description
--mode wrapper                  Batch processing mode (recommended)
--run single                    Sequential processing (recommended)
--initial_guess                 Fast AF mode without templates (recommended)
--peptide_design                Enable peptide sequence generation
--only_pseudo_sequence_design   Design MHC binding groove only
--binder_pred                   Predict binding affinity (requires NetMHCpan)
--fix_anchors                   Keep anchor positions fixed during design
--iterative_peptide_gen N       Run N rounds of optimization
--mutation_screen               Systematic mutation analysis
--num_templates                 Number of structural templates (default: 4)
--num_recycles                  AlphaFold recycles (default: 3)

Output Structure

output/
├── pandora/              # Template structures
├── alphafold/            # Predicted pMHC structures
├── proteinmpnn/          # Designed sequences
│   └── {id}/
│       ├── peptide_design/
│       └── only_pseudo_sequence_design/
└── best_structures/      # Top-ranked models (if --best_structures used)
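
After a run, a quick check of which top-level folders were produced can help confirm that each stage ran. The sketch below assumes only the directory names shown in the tree above; summarize_output is a hypothetical helper, and the file names inside each folder are not assumed.

```python
from pathlib import Path

# Top-level folders taken from the output tree above.
EXPECTED_DIRS = ["pandora", "alphafold", "proteinmpnn", "best_structures"]


def summarize_output(output_dir):
    """Map each expected folder to its file count, or None if it is missing."""
    root = Path(output_dir)
    summary = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        if folder.is_dir():
            summary[name] = sum(1 for path in folder.rglob("*") if path.is_file())
        else:
            summary[name] = None
    return summary


for name, count in summarize_output("output/").items():
    status = "missing" if count is None else f"{count} file(s)"
    print(f"{name:16s} {status}")
```

A missing best_structures folder is expected unless --best_structures was used.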

Citation

If you use PMGen, please cite the underlying methods.

Support

For issues or questions, please open an issue on GitHub.
