JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles

This is the official implementation of the paper JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles, accepted at NeurIPS 2025.

Summary

Conformational ensembles of protein structures are immensely important both for understanding protein function and for drug discovery in novel modalities such as cryptic pockets. Current techniques for sampling ensembles, such as molecular dynamics (MD), are computationally inefficient, while many recent machine learning methods do not generalize well outside their training data. We propose JAMUN, which performs MD in a smoothed, noised space of all-atom 3D conformations of molecules by utilizing the framework of walk-jump sampling. JAMUN enables ensemble generation for small peptides an order of magnitude faster than traditional molecular dynamics. The physical priors in JAMUN enable transferability to systems outside of its training data, even to peptides that are longer than those originally trained on.
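To make the walk-jump idea concrete, here is a minimal toy sketch in one dimension. It is not JAMUN's implementation: the Gaussian data distribution, the analytic score, the overdamped Langevin update, and every constant below are assumptions chosen so the example runs with the standard library alone. In a learned setting, the score would instead come from a trained denoiser.

```python
import math
import random

# Toy 1D setup (all values are illustrative assumptions, not JAMUN settings).
MU, S = 0.0, 1.0   # clean data distribution: x ~ N(MU, S^2)
SIGMA = 0.5        # noise level of the smoothed space: y = x + SIGMA * eps


def score(y):
    """Score (gradient of log-density) of the smoothed density N(MU, S^2 + SIGMA^2)."""
    return -(y - MU) / (S**2 + SIGMA**2)


def walk(y, step, n_steps, rng):
    """Walk: overdamped Langevin dynamics on the smoothed density."""
    for _ in range(n_steps):
        y = y + step * score(y) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return y


def jump(y):
    """Jump: denoise with Tweedie's formula, x_hat = y + SIGMA^2 * score(y)."""
    return y + SIGMA**2 * score(y)


def walk_jump(n_samples, steps_between=50, step=0.05, seed=0):
    """Alternate a stretch of walking in noisy space with a jump back to clean space."""
    rng = random.Random(seed)
    y = MU + math.sqrt(S**2 + SIGMA**2) * rng.gauss(0.0, 1.0)
    samples = []
    for _ in range(n_samples):
        y = walk(y, step, steps_between, rng)
        samples.append(jump(y))
    return samples


samples = walk_jump(1000)
```

The walk never leaves the smoothed space, and the jump is applied only when a sample is read out, so the noisy chain continues from `y` rather than from the denoised point.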

Figure: an overview of the walk-jump sampling scheme, which is similar to classical molecular dynamics, but in a smoothed space.

Figure: TICA-0,1 projections on unseen 5AA peptides.

Setup

Clone the repository with HTTPS:

git clone https://github.com/prescient-design/jamun.git

or SSH:

git clone git@github.com:prescient-design/jamun.git

Navigate to the cloned repository:

cd jamun

We recommend creating a mamba or conda environment, because certain dependencies are tricky to install directly.

conda create --name jamun python=3.11 -y
conda activate jamun
conda install -c conda-forge ambertools=23 openmm pdbfixer pyemma -y
conda install pulchra -c bioconda -y

The remaining dependencies can be installed via pip or uv (recommended).

uv pip install -e ".[dev]"

Data

The uncapped 2AA data from Timewarp can be obtained from Hugging Face.

cd /path/to/data/root/
git lfs install
git clone https://huggingface.co/datasets/microsoft/timewarp

where /path/to/data/root/ is the path where you want to store the datasets.

This should be your directory structure:

/path/to/data/root/
└── timewarp/
    ├── 2AA-1-big/
    │   └── ...
    ├── 2AA-1-large/
    │   └── ...
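
A quick way to confirm the clone produced this layout is a small check like the one below. This is a hypothetical helper, not part of the repository: `missing_timewarp_dirs` is a name introduced here for illustration, and it only looks for the two subdirectories shown in the tree above.

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the layout shown above.
EXPECTED_SUBDIRS = ("2AA-1-big", "2AA-1-large")


def missing_timewarp_dirs(data_root):
    """Return the expected Timewarp subdirectories that are absent under data_root."""
    timewarp = Path(data_root) / "timewarp"
    return [name for name in EXPECTED_SUBDIRS if not (timewarp / name).is_dir()]


# Demonstration on a temporary directory that contains only one of the two folders.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "timewarp" / "2AA-1-big").mkdir(parents=True)
    print(missing_timewarp_dirs(root))  # prints ['2AA-1-large']
```

An empty list means both folders are present under the given data root.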

Now, set the environment variable JAMUN_DATA_PATH:

export JAMUN_DATA_PATH=/path/to/data/root/

or, create a .env file in the root of the repository and set JAMUN_DATA_PATH:

JAMUN_DATA_PATH=/path/to/data/root/

Set the environment variable JAMUN_ROOT_PATH (default: current directory) to specify where outputs from training and sampling are saved:

export JAMUN_ROOT_PATH=...

or in the .env file in the root of the repository:

JAMUN_ROOT_PATH=...
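
The two ways of setting these variables can be illustrated with a standard-library sketch. How JAMUN itself reads the .env file is not described here, so the precedence (exported variables win over .env entries) and the helper names `read_env_file` and `resolve_paths` are assumptions made for illustration only.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_paths(env_file=".env", environ=None):
    """Return (data_path, root_path); exported variables take precedence (assumption)."""
    environ = os.environ if environ is None else environ
    file_values = read_env_file(env_file)
    data_path = environ.get("JAMUN_DATA_PATH") or file_values.get("JAMUN_DATA_PATH")
    if data_path is None:
        raise RuntimeError("JAMUN_DATA_PATH is not set in the environment or in .env")
    # JAMUN_ROOT_PATH falls back to the current directory, as stated above.
    root_path = (
        environ.get("JAMUN_ROOT_PATH")
        or file_values.get("JAMUN_ROOT_PATH")
        or os.getcwd()
    )
    return Path(data_path), Path(root_path)
```

Calling `resolve_paths()` from the repository root before launching a long job is one way to catch a missing JAMUN_DATA_PATH early.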

Training

Once you have downloaded the data and set the environment variables above, you can start training on Timewarp.

We recommend first running our test config (on one GPU) to check that installation was successful:

CUDA_VISIBLE_DEVICES=0 jamun_train --config-dir=configs experiment=train_test.yaml

Then, you can train on the uncapped 2AA peptides dataset:

jamun_train --config-dir=configs experiment=train_uncapped_2AA.yaml

or the uncapped 4AA peptides dataset:

jamun_train --config-dir=configs experiment=train_uncapped_4AA.yaml

We also provide example SLURM launcher scripts for training and sampling on SLURM clusters:

sbatch scripts/slurm/train.sh
sbatch scripts/slurm/sample.sh

Inference

Loading Trained Models

We provide trained models (for both sampling and restarting training) for Timewarp 2AA, Timewarp 4AA, MDGen 4AA and other datasets on Hugging Face. Some of these checkpoints come from an older version of this code. To run sampling with them, switch to the old-checkpoints branch, which we maintain for compatibility:

git switch old-checkpoints

Then, clone the checkpoints repository:

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/ameya98/JAMUN

If you want to test out your own trained model, either specify the wandb_train_run_path (in the form entity/project/run_id, which can be obtained from the Overview tab in the Weights and Biases UI for your training run), or the checkpoint_dir of the trained model.

jamun_sample ... ++wandb_train_run_path=[WANDB_TRAIN_RUN_PATH]
jamun_sample ... ++checkpoint_dir=[CHECKPOINT_DIR]
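
As a small illustration of the two options, the sketch below builds the override string and checks that a run path has the entity/project/run_id shape. `checkpoint_override` is a hypothetical helper, not part of JAMUN, and requiring exactly one of the two options is an assumption based on the "either ... or" wording above.

```python
def checkpoint_override(wandb_train_run_path=None, checkpoint_dir=None):
    """Return the command-line override that selects a trained model."""
    # Assumption: exactly one of the two sources should be given.
    if (wandb_train_run_path is None) == (checkpoint_dir is None):
        raise ValueError("Specify exactly one of wandb_train_run_path or checkpoint_dir")
    if checkpoint_dir is not None:
        return f"++checkpoint_dir={checkpoint_dir}"
    parts = wandb_train_run_path.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("wandb_train_run_path must have the form entity/project/run_id")
    return f"++wandb_train_run_path={wandb_train_run_path}"


print(checkpoint_override(wandb_train_run_path="entity/project/run_id"))
# prints ++wandb_train_run_path=entity/project/run_id
```

The returned string is meant to be appended to the rest of a jamun_sample command line.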

Sampling Conformations for a Peptide Sequence

If you want to sample conformations for a particular peptide sequence, you need to first generate a .pdb file.

We provide a script that uses AmberTools, specifically tleap. If you have a .pdb file already, then you can skip this step.

Generate `.pdb` file

Run:

python scripts/prepare_pdb.py [SEQUENCE] --mode [MODE] --outputdir [OUTPUTDIR]

where SEQUENCE is your peptide sequence, entered either as a string of one-letter codes (e.g. AGPF) or as a string of hyphenated three-letter codes (e.g. ALA-GLY-PRO-PHE); MODE is either capped, which adds capping ACE and NME residues, or uncapped; and OUTPUTDIR is where your generated .pdb file will be saved (default is current directory). The script will print out the path to the generated .pdb file, INIT_PDB.
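
The two accepted sequence formats are related by the standard amino-acid code table. The sketch below converts either form to hyphenated three-letter codes; it is an illustration only, `to_three_letter` is a hypothetical helper, and the parsing inside scripts/prepare_pdb.py may differ (for example, in how it treats non-standard residue names).

```python
# Standard one-letter to three-letter codes for the 20 canonical amino acids.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
THREE_LETTER_CODES = set(ONE_TO_THREE.values())


def to_three_letter(sequence):
    """Normalize a peptide sequence to hyphenated three-letter codes."""
    seq = sequence.strip().upper()
    if "-" in seq:
        # Hyphenated input is treated as three-letter codes.
        codes = seq.split("-")
        unknown = [code for code in codes if code not in THREE_LETTER_CODES]
    else:
        # Unhyphenated input is treated as one-letter codes.
        unknown = [char for char in seq if char not in ONE_TO_THREE]
        codes = [ONE_TO_THREE[char] for char in seq if char in ONE_TO_THREE]
    if unknown or not codes:
        raise ValueError(f"Unrecognized residues in {sequence!r}: {unknown}")
    return "-".join(codes)


print(to_three_letter("AGPF"))  # prints ALA-GLY-PRO-PHE
```

Because unhyphenated input is read as one-letter codes, a lone three-letter code such as ALA would be interpreted as the tripeptide A, L, A in this sketch.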

Run sampling on `.pdb`

Run the sampling script, starting from the provided .pdb structure:

jamun_sample --config-dir=configs experiment=sample_custom ++init_pdb=[INIT_PDB]

Sampling Test Peptides from Timewarp

We also provide some configs to sample from the uncapped 2AA and 4AA peptides from the test set in Timewarp.

jamun_sample --config-dir=configs experiment=sample_uncapped_2AA.yaml checkpoint_dir=...

jamun_sample --config-dir=configs experiment=sample_uncapped_4AA.yaml checkpoint_dir=...

Analysis

We provide scripts for analysing JAMUN and original MD trajectories in the analysis directory: https://github.com/prescient-design/jamun/tree/main/analysis

Data Generation

Running Molecular Dynamics with OpenMM

We provide scripts for generating MD simulation data with OpenMM, including energy minimization and equilibration steps in the NVT and NPT ensembles.

python scripts/MD/run_simulation.py [INIT_PDB]

The defaults correspond to our setup for the capped diamines. Please run this script with the -h flag to see all simulation parameters.

Preprocessing

Some of the datasets require preprocessing for easier consumption, for example the MDGen data:

source .env
python scripts/process_mdgen.py \
  --input-dir ${JAMUN_DATA_PATH}/mdgen \
  --output-dir ${JAMUN_DATA_PATH}/mdgen/data/4AA_sims_partitioned_chunked

Citation

If you found this repository useful, please cite our preprint!

@misc{daigavane2024jamuntransferablemolecularconformational,
      title={JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles},
      author={Ameya Daigavane and Bodhi P. Vani and Darcy Davidson and Saeed Saremi and Joshua Rackers and Joseph Kleinhenz},
      year={2024},
      eprint={2410.14621},
      archivePrefix={arXiv},
      primaryClass={physics.bio-ph},
      url={https://arxiv.org/abs/2410.14621},
}
