| 1 | +# LiFlow |
| 2 | + |
| 3 | +This repository implements a generative framework to accelerate molecular dynamics simulations for crystalline materials. |
| 4 | +We enable the propagation of atomic configurations in time by learning a distribution of displacements from a set of reference trajectories. |
| 5 | +The details of the method are described in the paper: [Flow Matching for Accelerated Simulation of Atomic Transport in Materials](https://arxiv.org/abs/2410.01464). |
| 6 | + |
| 7 | +<p align="center"> |
| 8 | +<img src="figs/LGPS.gif" alt="LGPS traj" style="width: 70%;"> |
| 9 | +</p> |
| 10 | +<p align="center"> |
| 11 | +<img src="figs/scheme.png" alt="LiFlow scheme" style="width: 90%;"> |
| 12 | +</p> |
| 13 | + |
| 14 | +## Setup |
| 15 | + |
| 16 | +Clone the repository, create a new environment and install the required packages: |
| 17 | + |
| 18 | +```bash |
| 19 | +# Clone the repository and enter it (the install step below runs from the repository root) |
| 20 | +git clone https://github.com/learningmatter-mit/liflow.git && cd liflow |
| 21 | + |
| 22 | +# Create conda environment |
| 23 | +conda create -n liflow python=3.11 |
| 24 | +conda activate liflow |
| 25 | + |
| 26 | +# Install torch (change the CUDA version if needed) |
| 27 | +pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124 |
| 28 | +pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu124.html |
| 29 | + |
| 30 | +# Install liflow |
| 31 | +pip install -e . |
| 32 | +# pip install -e '.[dev]' # additional packages for development |
| 33 | +``` |
| 34 | + |
| 35 | +## Usage |
| 36 | + |
| 37 | +This section provides a brief overview of the training and evaluation process. |
| 38 | +We assume that the dataset is stored in the `data/` directory, and the scripts are executed from the root directory of the repository. |
| 39 | + |
| 40 | +### Dataset |
| 41 | + |
| 42 | +To reproduce the results in the paper, download the dataset from [Zenodo](https://doi.org/10.5281/zenodo.14889658) and extract it to the `data/` directory. |
| 43 | + |
| 44 | +```bash |
| 45 | +mkdir data |
| 46 | +tar -xvf data.tar.gz -C data |
| 47 | +``` |
| 48 | + |
| 49 | +We provide the datasets for the universal MLIP set and the LGPS dataset. |
| 50 | +The LGPS trajectories were obtained from XDATCAR files provided in the [Inorganic Solid State Electrolytes Database](https://superionic.upc.edu). |
| 51 | +The LPS dataset was obtained from the authors of [[Jun et al., 2024]](https://www.pnas.org/doi/10.1073/pnas.2316493121) and is available upon request. |
| 52 | + |
| 53 | +The data directories contain the following files: |
| 54 | + |
| 55 | +| File | Description | |
| 56 | +|------|-------------| |
| 57 | +| `element_index.npy` | Element indices for the atomic species `[n_elements,]` | |
| 58 | +| `atomic_numbers.npy` | Atomic numbers for atoms in the structures, dictionary of `[n_atoms,]` int arrays indexed by `name` | |
| 59 | +| `lattice.npy` | Lattice matrix for the structures, dictionary of `[3, 3]` float arrays indexed by `name` | |
| 60 | +| `positions_{temp}K.npz` | Atomic positions for the structures at the specified temperature, dictionary of `[n_frames, n_atoms, 3]` float arrays indexed by `name` | |
| 61 | +| `{train,test}_{temp}K.csv` | Index CSV files for the training and testing trajectories (see below) | |
| 62 | + |
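As a minimal sketch of how these files can be read with NumPy, the snippet below builds a tiny synthetic dataset with the same layout and loads one structure from it. The helper `load_structure`, the structure name `demo`, and the assumption that the dictionaries were written with `np.save` (and therefore need `allow_pickle=True`) are illustrative and are not taken from the `liflow` code.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_structure(data_dir, name, temp):
    """Load one structure, assuming the file layout described in the table above."""
    data_dir = Path(data_dir)
    # Assumption: the dictionaries were saved with np.save, so they come back
    # as 0-d object arrays and need allow_pickle=True plus .item().
    atomic_numbers = np.load(data_dir / "atomic_numbers.npy", allow_pickle=True).item()
    lattice = np.load(data_dir / "lattice.npy", allow_pickle=True).item()
    # An .npz archive behaves like a read-only dict of arrays keyed by name.
    with np.load(data_dir / f"positions_{temp}K.npz") as positions:
        pos = positions[name]
    return atomic_numbers[name], lattice[name], pos


# Build a tiny synthetic dataset (hypothetical values) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    np.save(tmp / "atomic_numbers.npy", {"demo": np.array([3, 3, 16])})
    np.save(tmp / "lattice.npy", {"demo": 10.0 * np.eye(3)})
    np.savez(tmp / "positions_800K.npz", demo=np.zeros((5, 3, 3)))
    numbers, cell, pos = load_structure(tmp, "demo", 800)

print(numbers.shape, cell.shape, pos.shape)  # (3,) (3, 3) (5, 3, 3)
```
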
| 63 | +### Training |
| 64 | + |
| 65 | +The CSV files in the dataset contain the necessary information to load the trajectories. |
| 66 | +The columns are as follows: |
| 67 | + |
| 68 | +| Column | Description | |
| 69 | +|--------|-------------| |
| 70 | +| `name` | Identifier of the structure | |
| 71 | +| `temp` | Temperature of the trajectory | |
| 72 | +| `t_start` | Starting time of the trajectory | |
| 73 | +| `t_end` | Ending time of the trajectory | |
| 74 | +| `comp` | Composition of the structure (used to split and sample the dataset) | |
| 75 | +| `msd_t_Li` | MSD/time for lithium atoms in Ų/ps (train split for universal MLIP set) | |
| 76 | +| `msd_t_frame` | MSD/time for frame atoms in Ų/ps (train split for universal MLIP set) | |
| 77 | +| `prior_Li` | Prior label (0 or 1) for lithium atoms | |
| 78 | +| `prior_frame` | Prior label (0 or 1) for frame atoms | |
| 79 | + |
| 80 | +For the universal MLIP set, `prior_Li` and `prior_frame` labels are obtained by training a classifier based on the MSD values of the training set. |
| 81 | +Please refer to the notebook `notebooks/prior_classifier.ipynb` for the details. |
| 82 | +For the LGPS and LPS datasets, the prior labels are annotated based on the MSD values from the short training trajectories. |
| 83 | + |
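To make the `msd_t_*` columns concrete, here is a hedged sketch of an MSD/time calculation for one group of atoms. It assumes unwrapped Cartesian positions in Å, treats `t_start` and `t_end` as frame indices, and uses a hypothetical frame spacing `dt_ps`; the conventions actually used to produce the CSV values may differ.

```python
import numpy as np


def msd_per_time(positions, atomic_numbers, t_start, t_end, dt_ps=1.0, element=3):
    """Return MSD / elapsed time (Å^2/ps) for atoms of one element.

    Assumes unwrapped Cartesian positions with shape [n_frames, n_atoms, 3]
    and a constant spacing of dt_ps picoseconds between frames.
    """
    mask = np.asarray(atomic_numbers) == element
    # Displacement of the selected atoms between the two frames.
    disp = positions[t_end, mask] - positions[t_start, mask]
    # Mean over atoms of the squared displacement norm.
    msd = np.mean(np.sum(disp**2, axis=-1))
    return msd / ((t_end - t_start) * dt_ps)


# Toy trajectory: two Li atoms drift 1 Å per frame along x, one S atom is fixed.
frames = np.arange(5, dtype=float)
positions = np.zeros((5, 3, 3))
positions[:, :2, 0] = frames[:, None]
numbers = [3, 3, 16]

li_rate = msd_per_time(positions, numbers, 0, 4, dt_ps=0.5, element=3)
frame_rate = msd_per_time(positions, numbers, 0, 4, dt_ps=0.5, element=16)
print(li_rate, frame_rate)  # 8.0 0.0
```

In this toy case each Li atom moves 4 Å over 2 ps, so the MSD is 16 Å² and the rate is 8 Å²/ps, while the stationary S atom gives zero.
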
| 84 | +Training scripts are provided in the `scripts/` directory, and training is performed using the `liflow.experiment.train` module. |
| 85 | +The arguments are specified by the Hydra configuration file `liflow/configs/train.yaml` and can be overridden from the command line, as in the provided examples. |
| 86 | +The important arguments are: |
| 87 | + |
| 88 | +| Argument | Description | |
| 89 | +|----------|-------------| |
| 90 | +| `task` | Training task (propagate or correct) | |
| 91 | +| `name` | Name of the experiment (also used to name the checkpoints) | |
| 92 | +| `data.data_path` | Path to the dataset | |
| 93 | +| `data.index_files` | List of index CSV files to load the trajectories | |
| 94 | +| `data.train_valid_split` | Whether to split the validation set from the training set (True for universal set, False for LGPS and LPS) | |
| 95 | +| `data.sample_weight_comp` | Whether to sample trajectories with probability inversely proportional to the composition count (so that compositions are sampled uniformly) | |
| 96 | +| `data.in_memory` | Whether to load the dataset in memory (useful for small datasets) | |
| 97 | +| `propagate_prior.params.scale` | Scale hyperparameters for the propagator prior (`[[Li_small, Li_large], [frame_small, frame_large]]`) | |
| 98 | +| `correct_noise.params.scale` | Corrector noise scale | |
| 99 | + |
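The idea behind `data.sample_weight_comp` can be sketched with the standard library alone: each trajectory is weighted by the inverse of the number of trajectories sharing its `comp` value, so every composition receives the same total probability. The function `composition_weights` and the three CSV rows below are hypothetical and are not the `liflow` implementation.

```python
import csv
import io
from collections import Counter


def composition_weights(rows):
    """Weight each row by 1 / (number of rows with the same composition)."""
    counts = Counter(row["comp"] for row in rows)
    weights = [1.0 / counts[row["comp"]] for row in rows]
    total = sum(weights)
    # Normalize so the weights form a probability distribution.
    return [w / total for w in weights]


# Hypothetical index file with two compositions (only some columns shown).
index_csv = io.StringIO(
    "name,temp,t_start,t_end,comp\n"
    "a,800,0,25,Li3PS4\n"
    "b,800,0,25,Li3PS4\n"
    "c,800,0,25,Li10GeP2S12\n"
)
rows = list(csv.DictReader(index_csv))
weights = composition_weights(rows)
print(weights)  # [0.25, 0.25, 0.5]
```

Both compositions end up with a total weight of 0.5, even though one of them has twice as many trajectories.
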
| 100 | +We provide the trained model checkpoints in the `checkpoints/` directory. |
| 101 | +The checkpoints are named `{P,C}_{dataset}.ckpt`, where `P` and `C` denote the propagator and corrector models, respectively. |
| 102 | +The LGPS corrector models are trained with two different noise scales (0.1 and 0.2), and their checkpoints are named `C_LGPS_{0.1,0.2}.ckpt`. |
| 103 | + |
| 104 | +### Testing |
| 105 | + |
| 106 | +The testing scripts are also provided in the `scripts/` directory. |
| 107 | +Testing for the universal MLIP set is performed using the `liflow.experiment.test` module, which generates a CSV file with the metrics reported in the paper. |
| 108 | + |
| 109 | +To generate the trajectories for the LGPS and LPS datasets, we wrote a standalone script that converts the output positions into an xyz file. |
| 110 | +An example for the LGPS dataset is provided in `scripts/test_LGPS.py`. |
| 111 | +The script reads the checkpoint file and the initial structure from the dataset (e.g., the POSCAR file for LGPS) and writes the trajectories generated at the specified temperature as an xyz file in the output directory. |
| 112 | + |
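For readers unfamiliar with the format, the following sketch writes a multi-frame trajectory in plain XYZ (atom count, comment line, then one `symbol x y z` line per atom). It is not the code in `scripts/test_LGPS.py`; the `write_xyz` helper and the small `SYMBOLS` table are assumptions made for illustration, and plain XYZ does not store the lattice.

```python
import io

# Only the elements needed for LGPS/LPS; extend as required.
SYMBOLS = {3: "Li", 15: "P", 16: "S", 32: "Ge"}


def write_xyz(handle, atomic_numbers, frames):
    """Write frames of shape [n_frames, n_atoms, 3] to an open text handle."""
    for i, frame in enumerate(frames):
        handle.write(f"{len(atomic_numbers)}\n")  # number of atoms
        handle.write(f"frame={i}\n")  # free-form comment line
        for number, (x, y, z) in zip(atomic_numbers, frame):
            handle.write(f"{SYMBOLS[number]} {x:.6f} {y:.6f} {z:.6f}\n")


# Two frames of a hypothetical two-atom system.
frames = [
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    [[0.5, 0.0, 0.0], [1.0, 2.0, 3.0]],
]
buffer = io.StringIO()
write_xyz(buffer, [3, 16], frames)
lines = buffer.getvalue().splitlines()
print(lines[2])  # Li 0.000000 0.000000 0.000000
```

Passing a file opened with `open(path, "w")` instead of the `io.StringIO` buffer writes the same content to disk.
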
| 113 | +## Citation |
| 114 | + |
| 115 | +```bibtex |
| 116 | +@misc{nam2024flow, |
| 117 | + title={Flow Matching for Accelerated Simulation of Atomic Transport in Materials}, |
| 118 | + author={Juno Nam and Sulin Liu and Gavin Winter and KyuJung Jun and Soojung Yang and Rafael G{\'o}mez-Bombarelli}, |
| 119 | + year={2024}, |
| 120 | + eprint={2410.01464}, |
| 121 | + archivePrefix={arXiv}, |
| 122 | + primaryClass={cond-mat.mtrl-sci}, |
| 123 | + url={https://arxiv.org/abs/2410.01464}, |
| 124 | +} |
| 125 | +``` |