Commit da45208: Initial commit (0 parents)

40 files changed: +2677 −0 lines

.gitignore

Lines changed: 160 additions & 0 deletions

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.pre-commit-config.yaml

Lines changed: 16 additions & 0 deletions

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: check-yaml
      - id: debug-statements
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.2
    hooks:
      - id: ruff
        types_or: [ python, pyi, jupyter ]
        args: [ --fix ]
      - id: ruff-format
        types_or: [ python, pyi, jupyter ]

LICENSE

Lines changed: 21 additions & 0 deletions

MIT License

Copyright (c) 2024 Learning Matter @ MIT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 125 additions & 0 deletions
# LiFlow

This repository implements a generative framework to accelerate molecular dynamics simulations for crystalline materials.
We enable the propagation of atomic configurations in time by learning a distribution of displacements from a set of reference trajectories.
The details of the method are described in the paper: [Flow Matching for Accelerated Simulation of Atomic Transport in Materials](https://arxiv.org/abs/2410.01464).

<p align="center">
<img src="figs/LGPS.gif" alt="LGPS traj" style="width: 70%;">
</p>
<p align="center">
<img src="figs/scheme.png" alt="LiFlow scheme" style="width: 90%;">
</p>

## Setup

Clone the repository, create a new environment, and install the required packages:

```bash
# Clone the repository
git clone https://github.com/learningmatter-mit/liflow.git

# Create conda environment
conda create -n liflow python=3.11
conda activate liflow

# Install torch (change the CUDA version if needed)
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu124.html

# Install liflow
pip install -e .
# pip install -e '.[dev]' # additional packages for development
```

## Usage

This section provides a brief overview of the training and evaluation process.
We assume that the dataset is stored in the `data/` directory and that the scripts are executed from the root directory of the repository.

### Dataset

To reproduce the results in the paper, download the dataset from [here](https://doi.org/10.5281/zenodo.14889658) and extract it to the `data/` directory:

```bash
mkdir data
tar -xvf data.tar.gz -C data
```

We provide datasets for the universal MLIP set and the LGPS system.
The LGPS trajectories are obtained from XDATCAR files provided in the [Inorganic Solid State Electrolytes Database](https://superionic.upc.edu).
The LPS dataset was obtained from the authors of [[Jun et al., 2024]](https://www.pnas.org/doi/10.1073/pnas.2316493121) and is available upon request.

The data directories contain the following files:

| File | Description |
|------|-------------|
| `element_index.npy` | Element indices for the atomic species `[n_elements,]` |
| `atomic_numbers.npy` | Atomic numbers for atoms in the structures, dictionary of `[n_atoms,]` int arrays indexed by `name` |
| `lattice.npy` | Lattice matrix for the structures, dictionary of `[3, 3]` float arrays indexed by `name` |
| `positions_{temp}K.npz` | Atomic positions for the structures at the specified temperature, dictionary of `[n_frames, n_atoms, 3]` float arrays indexed by `name` |
| `{train,test}_{temp}K.csv` | Index CSV files for the training and testing trajectories (see below) |
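
The array layouts above can be exercised with NumPy. The following sketch writes a tiny dataset in the documented dictionary-of-arrays format and reads it back; the structure name and values are illustrative, not taken from the released data:

```python
import numpy as np

# Illustrative structure name and values; real files use the dataset's own names.
name = "example_structure"

# Dictionary of [n_atoms,] int arrays, saved as a pickled object in a .npy file.
np.save("atomic_numbers.npy", {name: np.array([3, 16, 15], dtype=int)})

# Dictionary of [n_frames, n_atoms, 3] float arrays, saved as a .npz archive.
np.savez("positions_600K.npz", **{name: np.zeros((5, 3, 3))})

# Reading back: .npy dictionaries need allow_pickle; .npz archives index by name.
atomic_numbers = np.load("atomic_numbers.npy", allow_pickle=True).item()
positions = np.load("positions_600K.npz")

print(atomic_numbers[name].tolist())  # [3, 16, 15]
print(positions[name].shape)          # (5, 3, 3)
```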

### Training

The CSV files in the dataset contain the necessary information to load the trajectories.
The columns are as follows:

| Column | Description |
|--------|-------------|
| `name` | Identifier of the structure |
| `temp` | Temperature of the trajectory |
| `t_start` | Starting time of the trajectory |
| `t_end` | Ending time of the trajectory |
| `comp` | Composition of the structure (used to split and sample the dataset) |
| `msd_t_Li` | MSD/time for lithium atoms in Ų/ps (train split for universal MLIP set) |
| `msd_t_frame` | MSD/time for frame atoms in Ų/ps (train split for universal MLIP set) |
| `prior_Li` | Prior label (0 or 1) for lithium atoms |
| `prior_frame` | Prior label (0 or 1) for frame atoms |
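
Such an index file can be read directly with pandas; a minimal sketch with one made-up row (the structure name, composition, and values are illustrative):

```python
import io

import pandas as pd

# A made-up index row in the documented column layout.
csv_text = """name,temp,t_start,t_end,comp,msd_t_Li,msd_t_frame,prior_Li,prior_frame
example_structure,600,0,25,Li3PS4,0.42,0.01,1,0
"""

index = pd.read_csv(io.StringIO(csv_text))
fast_li = index[index["prior_Li"] == 1]  # select rows with lithium prior label 1
print(list(fast_li["name"]))  # ['example_structure']
```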

For the universal MLIP set, the `prior_Li` and `prior_frame` labels are obtained by training a classifier on the MSD values of the training set.
Please refer to the notebook `notebooks/prior_classifier.ipynb` for the details.
For the LGPS and LPS datasets, the prior labels are annotated based on the MSD values from the short training trajectories.

Training scripts are provided in the `scripts/` directory, and training is performed with the `liflow.experiment.train` module.
The arguments are specified by the Hydra configuration file `liflow/configs/train.yaml` and can be overridden from the command line, as in the provided examples.
The important arguments are:

| Argument | Description |
|----------|-------------|
| `task` | Training task (`propagate` or `correct`) |
| `name` | Name of the experiment; checkpoints will use this name |
| `data.data_path` | Path to the dataset |
| `data.index_files` | List of index CSV files to load the trajectories |
| `data.train_valid_split` | Whether to split the validation set from the training set (`True` for the universal set, `False` for LGPS and LPS) |
| `data.sample_weight_comp` | Whether to sample the dataset inversely proportional to the composition count (to sample uniformly over compositions) |
| `data.in_memory` | Whether to load the dataset in memory (useful for small datasets) |
| `propagate_prior.params.scale` | Scale hyperparameters for the propagator prior (`[[Li_small, Li_large], [frame_small, frame_large]]`) |
| `correct_noise.params.scale` | Corrector noise scale |

We provide the trained model checkpoints in the `ckpt/` directory.
The checkpoints are named `{P,C}_{dataset}.ckpt`, where `P` and `C` denote the propagator and corrector models, respectively.
LGPS corrector models are trained with different noise scales (0.1 and 0.2), and their checkpoints are named `C_LGPS_{0.1,0.2}.ckpt`.

### Testing

The testing scripts are also provided in the `scripts/` directory.
Testing for the universal MLIP set is performed with the `liflow.experiment.test` module, which generates a CSV file with the metrics reported in the paper.

To generate trajectories for the LGPS and LPS datasets, we wrote a standalone script that converts the output positions into an XYZ file.
An example for the LGPS dataset is provided in `scripts/test_LGPS.py`.
The script reads the checkpoint file and the initial structure from the dataset (e.g., the POSCAR file for LGPS), and generates the trajectories at the specified temperature as an XYZ file in the output directory.

## Citation

```bibtex
@misc{nam2024flow,
  title={Flow Matching for Accelerated Simulation of Atomic Transport in Materials},
  author={Juno Nam and Sulin Liu and Gavin Winter and KyuJung Jun and Soojung Yang and Rafael G{\'o}mez-Bombarelli},
  year={2024},
  eprint={2410.01464},
  archivePrefix={arXiv},
  primaryClass={cond-mat.mtrl-sci},
  url={https://arxiv.org/abs/2410.01464},
}
```

ckpt/C_LGPS_0.1.ckpt (2.07 MB, binary file not shown)

ckpt/C_LGPS_0.2.ckpt (2.07 MB, binary file not shown)

ckpt/C_LPS.ckpt (2.07 MB, binary file not shown)

ckpt/C_universal.ckpt (2.06 MB, binary file not shown)

ckpt/P_LGPS.ckpt (2.07 MB, binary file not shown)

ckpt/P_LPS.ckpt (2.07 MB, binary file not shown)
