This is the official implementation of the paper JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles, accepted at NeurIPS 2025.
Conformational ensembles of protein structures are immensely important both for understanding protein function and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles, such as molecular dynamics (MD), are computationally inefficient, while many recent machine learning methods do not generalize well outside their training data. We propose JAMUN, which performs MD in a smoothed, noised space of all-atom 3D conformations of molecules by utilizing the framework of walk-jump sampling. JAMUN enables ensemble generation for small peptides an order of magnitude faster than traditional molecular dynamics. The physical priors in JAMUN enable transferability to systems outside of its training data, even to peptides that are longer than those originally trained on.
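To make the walk-jump idea concrete, here is a minimal toy sketch in NumPy. It is not the JAMUN implementation: the "data" is a hypothetical 1D two-mode Gaussian mixture, the score of the smoothed density is computed analytically instead of being learned by a neural network, and the walk uses plain overdamped Langevin steps for simplicity, which may differ from the integrator used in this repository. The names `smoothed_score` and `walk_jump` are illustrative only.

```python
import numpy as np

# Toy 1D "data" distribution (an assumption for illustration):
# an equal mixture of N(-2, DATA_STD^2) and N(+2, DATA_STD^2).
MEANS = np.array([-2.0, 2.0])
DATA_STD = 0.5


def smoothed_score(y, sigma):
    """Analytic score d/dy log p_sigma(y) of the data convolved with N(0, sigma^2)."""
    var = DATA_STD**2 + sigma**2                  # variance of each smoothed mode
    diff = MEANS[None, :] - y[:, None]            # shape (n_chains, 2)
    logits = -0.5 * diff**2 / var
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)       # posterior mode responsibilities
    return (resp * diff).sum(axis=1) / var


def walk_jump(n_chains=2000, n_steps=500, sigma=0.5, step=0.01, seed=0):
    """Walk in the smoothed space, then jump back with a single denoising step."""
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, 3.0, size=n_chains)       # arbitrary initialization
    # Walk: overdamped Langevin dynamics targeting the smoothed density p_sigma.
    for _ in range(n_steps):
        noise = rng.normal(size=n_chains)
        y = y + step * smoothed_score(y, sigma) + np.sqrt(2.0 * step) * noise
    # Jump: Tweedie's formula gives the posterior mean E[x | y].
    x_hat = y + sigma**2 * smoothed_score(y, sigma)
    return y, x_hat


if __name__ == "__main__":
    y, x_hat = walk_jump()
    print("mean |y|     =", np.abs(y).mean())
    print("mean |x_hat| =", np.abs(x_hat).mean())
```

In this toy setting the walk explores the noisy variable `y`, and the jump maps each noisy sample to a denoised estimate `x_hat` that lies closer to the modes of the original distribution.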
An overview of the walk-jump sampling scheme, which is similar to classical molecular dynamics, but in a smoothed space:
TICA-0,1 projections on unseen 5AA peptides:
Clone the repository with HTTPS:
git clone https://github.com/prescient-design/jamun.git
or SSH:
git clone git@github.com:prescient-design/jamun.git
Navigate to the cloned repository:
cd jamun
We recommend creating a mamba or conda environment.
This is because certain dependencies are tricky to install directly.
conda create --name jamun python=3.11 -y
conda activate jamun
conda install -c conda-forge ambertools=23 openmm pdbfixer pyemma -y
conda install pulchra -c bioconda -y
The remaining dependencies can be installed via pip or uv (recommended).
uv pip install -e .[dev]
The uncapped 2AA data from Timewarp can be obtained from Hugging Face.
cd /path/to/data/root/
git lfs install
git clone https://huggingface.co/datasets/microsoft/timewarp
where /path/to/data/root/ is the path where you want to store the datasets.
This should be your directory structure:
/path/to/data/root/
└── timewarp/
    ├── 2AA-1-big/
    │   └── ...
    ├── 2AA-1-large/
    │   └── ...
Now, set the environment variable JAMUN_DATA_PATH:
export JAMUN_DATA_PATH=/path/to/data/root/
or, create a .env file in the root of the repository and set JAMUN_DATA_PATH:
JAMUN_DATA_PATH=/path/to/data/root/
Set the environment variable JAMUN_ROOT_PATH (default: current directory) to specify where outputs from training and sampling are saved:
export JAMUN_ROOT_PATH=...
or in the .env file in the root of the repository:
JAMUN_ROOT_PATH=...
Once you have downloaded the data and set the appropriate variables correctly, you can start training on Timewarp.
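Before launching a training run, it can help to check that the variables and the data layout described above are in place. The following is a hypothetical pre-flight sketch, not code from this repository: the helper names (`read_dotenv`, `resolve_paths`, `missing_subdirs`) are made up, the .env parser only handles plain KEY=VALUE lines, and letting the process environment take precedence over the .env file is an assumption.

```python
import os
from pathlib import Path

# Subdirectories taken from the directory listing above; extend as needed.
EXPECTED_SUBDIRS = ("timewarp/2AA-1-big", "timewarp/2AA-1-large")


def read_dotenv(path):
    """Parse plain KEY=VALUE lines from a .env file (no quoting or expansion)."""
    values = {}
    path = Path(path)
    if not path.is_file():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_paths(dotenv_path=".env", environ=None):
    """Return (data_path, root_path); environment variables override .env (assumption)."""
    environ = os.environ if environ is None else environ
    merged = {**read_dotenv(dotenv_path), **environ}
    data_path = merged.get("JAMUN_DATA_PATH")
    # The documented default for JAMUN_ROOT_PATH is the current directory.
    root_path = merged.get("JAMUN_ROOT_PATH", os.getcwd())
    return data_path, root_path


def missing_subdirs(data_path):
    """List the expected dataset subdirectories that do not exist under data_path."""
    return [s for s in EXPECTED_SUBDIRS if not (Path(data_path) / s).is_dir()]


if __name__ == "__main__":
    data_path, root_path = resolve_paths()
    if data_path is None:
        print("JAMUN_DATA_PATH is not set")
    else:
        print("data path:", data_path, "| missing:", missing_subdirs(data_path))
    print("root path:", root_path)
```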
We recommend first running our test config (on one GPU) to check that installation was successful:
CUDA_VISIBLE_DEVICES=0 jamun_train --config-dir=configs experiment=train_test.yaml
Then, you can train on the uncapped 2AA peptides dataset:
jamun_train --config-dir=configs experiment=train_uncapped_2AA.yaml
or the uncapped 4AA peptides dataset:
jamun_train --config-dir=configs experiment=train_uncapped_4AA.yaml
We also provide example SLURM launcher scripts for training and sampling on SLURM clusters:
sbatch scripts/slurm/train.sh
sbatch scripts/slurm/sample.sh
We provide trained models (for both sampling and restarting training) for Timewarp 2AA, Timewarp 4AA, MDGen 4AA and other datasets at Hugging Face.
Unfortunately, some of these checkpoints are from an older version of this code. If you wish to run sampling with these checkpoints, we have made an old-checkpoints branch for compatibility:
git switch old-checkpoints
Then, clone the checkpoints repository:
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ameya98/JAMUN
If you want to test out your own trained model, specify either the wandb_train_run_path (in the form entity/project/run_id, which can be obtained from the Overview tab in the Weights and Biases UI for your training run) or the checkpoint_dir of the trained model.
jamun_sample ... ++wandb_train_run_path=[WANDB_TRAIN_RUN_PATH]
jamun_sample ... ++checkpoint_dir=[CHECKPOINT_DIR]
If you want to sample conformations for a particular peptide sequence, you need to first generate a .pdb file.
We provide a script that uses AmberTools, specifically tleap. If you have a .pdb file already, then you can skip this step.
Run:
python scripts/prepare_pdb.py [SEQUENCE] --mode [MODE] --outputdir [OUTPUTDIR]
where SEQUENCE is your peptide sequence entered as a string of one-letter codes (e.g. AGPF) or a string of hyphenated three-letter codes (e.g. ALA-GLY-PRO-PHE), MODE is either capped (to add capping ACE and NME residues) or uncapped, and OUTPUTDIR is where your generated .pdb file will be saved (default is current directory).
The script will print out the path to the generated .pdb file, INIT_PDB.
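The two accepted sequence formats are equivalent. As an illustration only (this is not the parsing logic of scripts/prepare_pdb.py, and the function name `normalize_sequence` is hypothetical), a small sketch that normalizes either format into three-letter residue codes for the 20 standard amino acids could look like this:

```python
# Standard one-letter to three-letter amino acid codes.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
THREE_LETTER_CODES = set(ONE_TO_THREE.values())


def normalize_sequence(sequence):
    """Return a list of three-letter residue codes for either input format.

    A string containing a hyphen is treated as hyphenated three-letter codes;
    anything else is treated as one-letter codes. Note that a single
    unhyphenated three-letter code such as "ALA" is therefore read as the
    three residues A, L, A.
    """
    sequence = sequence.strip().upper()
    if "-" in sequence:
        residues = sequence.split("-")
        unknown = [r for r in residues if r not in THREE_LETTER_CODES]
    else:
        unknown = [c for c in sequence if c not in ONE_TO_THREE]
        residues = [ONE_TO_THREE[c] for c in sequence if c in ONE_TO_THREE]
    if unknown or not residues:
        raise ValueError(f"Unrecognized residues in {sequence!r}: {unknown}")
    return residues


if __name__ == "__main__":
    print(normalize_sequence("AGPF"))
    print(normalize_sequence("ALA-GLY-PRO-PHE"))
```

Both calls in the usage example yield the same residue list, so either form identifies the same peptide.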
Run the sampling script, starting from the provided .pdb structure:
jamun_sample --config-dir=configs experiment=sample_custom ++init_pdb=[INIT_PDB]
We also provide some configs to sample from the uncapped 2AA and 4AA peptides from the test set in Timewarp.
jamun_sample --config-dir=configs experiment=sample_uncapped_2AA.yaml checkpoint_dir=...
jamun_sample --config-dir=configs experiment=sample_uncapped_4AA.yaml checkpoint_dir=...
We provide scripts for analysing JAMUN and original MD trajectories in [https://github.com/prescient-design/jamun/tree/main/analysis].
We provide scripts for generating MD simulation data with OpenMM, including energy minimization and calibration steps with NVT and NPT ensembles.
python scripts/MD/run_simulation.py [INIT_PDB]
The defaults correspond to our setup for the capped diamines.
Please run this script with the -h flag to see all simulation parameters.
Some of the datasets require preprocessing for easier consumption, for example the MDGen data:
source .env
python scripts/process_mdgen.py \
--input-dir ${JAMUN_DATA_PATH}/mdgen \
--output-dir ${JAMUN_DATA_PATH}/mdgen/data/4AA_sims_partitioned_chunked
If you found this repository useful, please cite our preprint!
@misc{daigavane2024jamuntransferablemolecularconformational,
title={JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles},
author={Ameya Daigavane and Bodhi P. Vani and Darcy Davidson and Saeed Saremi and Joshua Rackers and Joseph Kleinhenz},
year={2024},
eprint={2410.14621},
archivePrefix={arXiv},
primaryClass={physics.bio-ph},
url={https://arxiv.org/abs/2410.14621},
}

