learningmatter-mit
diff --git a/.gitignore b/.gitignore
Lines changed: 160 additions & 0 deletions
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 16 additions & 0 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 125 additions & 0 deletions
diff --git a/ckpt/C_LGPS_0.1.ckpt b/ckpt/C_LGPS_0.1.ckpt
2.07 MB
diff --git a/ckpt/C_LGPS_0.2.ckpt b/ckpt/C_LGPS_0.2.ckpt
2.07 MB
diff --git a/ckpt/C_LPS.ckpt b/ckpt/C_LPS.ckpt
2.07 MB
diff --git a/ckpt/C_universal.ckpt b/ckpt/C_universal.ckpt
2.06 MB
diff --git a/ckpt/P_LGPS.ckpt b/ckpt/P_LGPS.ckpt
2.07 MB
diff --git a/ckpt/P_LPS.ckpt b/ckpt/P_LPS.ckpt
2.07 MB
@@ -0,0 +1,160 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+.pybuilder/
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+#   For a library or package, you might want to ignore these files since the code is
+#   intended to run in multiple environments; otherwise, check them in:
+# .python-version
+
+# pipenv
+#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+#   However, in case of collaboration, if having platform-specific dependencies or dependencies
+#   having no cross-platform support, pipenv may install dependencies that don't work, or not
+#   install all needed dependencies.
+#Pipfile.lock
+
+# poetry
+#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+#   This is especially recommended for binary packages to ensure reproducibility, and is more
+#   commonly ignored for libraries.
+#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+#poetry.lock
+
+# pdm
+#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+#pdm.lock
+#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+#   in version control.
+#   https://pdm.fming.dev/#use-with-ide
+.pdm.toml
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# PyCharm
+#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+#  and can be added to the global gitignore or merged into this file.  For a more nuclear
+#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
+#.idea/
@@ -0,0 +1,16 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: check-yaml
+      - id: debug-statements
+      - id: end-of-file-fixer
+      - id: trailing-whitespace
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.4.2
+    hooks:
+      - id: ruff
+        types_or: [ python, pyi, jupyter ]
+        args: [ --fix ]
+      - id: ruff-format
+        types_or: [ python, pyi, jupyter ]
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Learning Matter @ MIT
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -0,0 +1,125 @@
+# LiFlow
+
+This repository implements a generative framework to accelerate molecular dynamics simulations for crystalline materials.
+We enable the propagation of atomic configurations in time by learning a distribution of displacements from a set of reference trajectories.
+The details of the method are described in the paper: [Flow Matching for Accelerated Simulation of Atomic Transport in Materials](https://arxiv.org/abs/2410.01464).
+
+<p align="center">
+<img src="figs/LGPS.gif" alt="LGPS traj" style="width: 70%;">
+</p>
+<p align="center">
+<img src="figs/scheme.png" alt="LiFlow scheme" style="width: 90%;">
+</p>
+
+## Setup
+
+Clone the repository, create a new environment and install the required packages:
+
+```bash
+# Clone the repository and enter it
+git clone https://github.com/learningmatter-mit/liflow.git
+cd liflow
+
+# Create conda environment
+conda create -n liflow python=3.11
+conda activate liflow
+
+# Install torch (change the CUDA version if needed)
+pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
+pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
+
+# Install liflow
+pip install -e .
+# pip install -e '.[dev]'  # additional packages for development
+```
+
+## Usage
+
+This section provides a brief overview of the training and evaluation process.
+We assume that the dataset is stored in the `data/` directory, and the scripts are executed from the root directory of the repository.
+
+### Dataset
+
+To reproduce the results in the paper, download the dataset from [Zenodo](https://doi.org/10.5281/zenodo.14889658) and extract it into the `data/` directory.
+
+```bash
+mkdir data
+tar -xvf data.tar.gz -C data
+```
+
+We provide the universal MLIP set and the LGPS dataset.
+The LGPS trajectories were obtained from XDATCAR files provided in the [Inorganic Solid State Electrolytes Database](https://superionic.upc.edu).
+The LPS dataset was obtained from the authors of [[Jun et al., 2024]](https://www.pnas.org/doi/10.1073/pnas.2316493121) and is available upon request.
+
+The data directories contain the following files:
+
+| File | Description |
+|------|-------------|
+| `element_index.npy` | Element indices for the atomic species `[n_elements,]` |
+| `atomic_numbers.npy` | Atomic numbers for atoms in the structures, dictionary of `[n_atoms,]` int arrays indexed by `name` |
+| `lattice.npy` | Lattice matrix for the structures, dictionary of `[3, 3]` float arrays indexed by `name` |
+| `positions_{temp}K.npz` | Atomic positions for the structures at the specified temperature, dictionary of `[n_frames, n_atoms, 3]` float arrays indexed by `name` |
+| `{train,test}_{temp}K.csv` | Index CSV files for the training and testing trajectories (see below) |
+
+### Training
+
+The CSV files in the dataset contain the necessary information to load the trajectories.
+The columns are as follows:
+
+| Column | Description |
+|--------|-------------|
+| `name` | Identifier of the structure |
+| `temp` | Temperature of the trajectory |
+| `t_start` | Starting time of the trajectory |
+| `t_end` | Ending time of the trajectory |
+| `comp` | Composition of the structure (used to split and sample the dataset) |
+| `msd_t_Li` | MSD/time for lithium atoms in Å²/ps (train split for universal MLIP set) |
+| `msd_t_frame` | MSD/time for frame atoms in Å²/ps (train split for universal MLIP set) |
+| `prior_Li` | Prior label (0 or 1) for lithium atoms |
+| `prior_frame` | Prior label (0 or 1) for frame atoms |
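
As a rough illustration of the column layout above, the sketch below parses a small in-memory index table with Python's standard `csv` module and selects the rows whose lithium prior label is 1. The sample rows, the helper name `load_index`, and the choice to read `temp`, `t_start`, and `t_end` as floats are assumptions made for illustration; the repository's own loader may work differently.

```python
import csv
import io

# Hypothetical index rows following the column layout described above.
# The values are invented for illustration, not taken from the dataset.
SAMPLE_INDEX = """name,temp,t_start,t_end,comp,prior_Li,prior_frame
struct_a,800,0,500,Li3PS4,1,0
struct_a,800,500,1000,Li3PS4,1,0
struct_b,800,0,500,Li10GeP2S12,0,0
"""


def load_index(text):
    """Parse index CSV text into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "name": row["name"],
                # Assumption: temperature and times are plain numbers.
                "temp": float(row["temp"]),
                "t_start": float(row["t_start"]),
                "t_end": float(row["t_end"]),
                "comp": row["comp"],
                "prior_Li": int(row["prior_Li"]),
                "prior_frame": int(row["prior_frame"]),
            }
        )
    return rows


rows = load_index(SAMPLE_INDEX)
# Keep only trajectory segments whose lithium prior label is 1.
li_label_one = [r for r in rows if r["prior_Li"] == 1]
print(len(rows), len(li_label_one))
```

To read a real index file, the same `csv.DictReader` call can be given an open file handle instead of the `io.StringIO` wrapper.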
+
+For the universal MLIP set, `prior_Li` and `prior_frame` labels are obtained by training a classifier based on the MSD values of the training set.
+See the notebook `notebooks/prior_classifier.ipynb` for details.
+For the LGPS and LPS datasets, the prior labels are annotated based on the MSD values from the short training trajectories.
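
To make the MSD-based labels more concrete, here is a minimal sketch of one way an MSD/time value in Å²/ps could be computed between two frames and turned into a binary label. Both the exact MSD definition (start-to-end displacement on unwrapped coordinates) and the threshold rule in `prior_label` are assumptions for illustration. The universal MLIP set uses a trained classifier instead, and the mapping of 0 and 1 to small and large displacements is assumed here, not confirmed by the source.

```python
def msd_per_time(start, end, elapsed_ps):
    """Mean squared displacement per unit time (Å²/ps) between two frames.

    start, end: lists of (x, y, z) positions in Å, assumed unwrapped
    (no periodic-boundary jumps) and listed in the same atom order.
    """
    if elapsed_ps <= 0:
        raise ValueError("elapsed_ps must be positive")
    if len(start) != len(end) or not start:
        raise ValueError("frames must be non-empty and of equal length")
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(start, end):
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    # Average over atoms, then normalize by the elapsed time.
    return total / len(start) / elapsed_ps


def prior_label(msd_t, threshold):
    """Hypothetical rule: 1 if MSD/time exceeds the threshold, else 0."""
    return int(msd_t > threshold)


# Two atoms over 10 ps: one moves 1 Å along x, the other 3 Å along z.
start = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
end = [(1.0, 0.0, 0.0), (1.0, 1.0, 4.0)]
value = msd_per_time(start, end, elapsed_ps=10.0)
label = prior_label(value, threshold=0.1)
print(value, label)
```

The threshold of 0.1 is arbitrary; any real cutoff would have to come from the training data.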
+
+Training scripts are provided in the `scripts/` directory, and training is performed using the `liflow.experiment.train` module.
+The arguments are specified in the Hydra configuration file `liflow/configs/train.yaml` and can be overridden from the command line, as in the provided examples.
+The most important arguments are:
+
+| Argument | Description |
+|----------|-------------|
+| `task` | Training task (propagate or correct) |
+| `name` | Name of the experiment; checkpoints use this name |
+| `data.data_path` | Path to the dataset |
+| `data.index_files` | List of index CSV files to load the trajectories |
+| `data.train_valid_split` | Whether to split the validation set from the training set (True for universal set, False for LGPS and LPS) |
+| `data.sample_weight_comp` | Whether to sample the dataset with weights inversely proportional to the composition count (so that compositions are sampled uniformly) |
+| `data.in_memory` | Whether to load the dataset in memory (useful for small datasets) |
+| `propagate_prior.params.scale` | Scale hyperparameters for the propagator prior (`[[Li_small, Li_large], [frame_small, frame_large]]`) |
+| `correct_noise.params.scale` | Corrector noise scale |
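
The idea behind `data.sample_weight_comp` can be sketched in a few lines: each sample is weighted by the inverse of how often its composition occurs, so every composition ends up with the same total sampling probability. The function name `composition_weights` and the example compositions are hypothetical, and the sketch is not the repository's sampler.

```python
import random
from collections import Counter


def composition_weights(comps):
    """Return normalized per-sample weights, inverse to composition count."""
    counts = Counter(comps)
    raw = [1.0 / counts[c] for c in comps]
    total = sum(raw)
    return [w / total for w in raw]


# Three samples share one composition, one sample has another.
comps = ["Li3PS4", "Li3PS4", "Li3PS4", "Li10GeP2S12"]
weights = composition_weights(comps)

# Draw sample indices with these weights; each composition is equally likely.
rng = random.Random(0)
picks = rng.choices(range(len(comps)), weights=weights, k=8)
print(weights, picks)
```

With these weights, the three `Li3PS4` samples together carry probability 0.5 and the single `Li10GeP2S12` sample carries the other 0.5.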
+
+We provide the trained model checkpoints in the `ckpt/` directory.
+The checkpoints are named `{P,C}_{dataset}.ckpt`, where `P` and `C` denote the propagator and corrector models, respectively.
+The LGPS corrector models are trained with two different noise scales (0.1 and 0.2), and their checkpoints are named `C_LGPS_{0.1,0.2}.ckpt`.
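
The naming convention can be checked mechanically. The sketch below is a hypothetical helper, not part of `liflow`; it splits a checkpoint file name into model type, dataset, and optional noise scale using a regular expression inferred from the file names in this commit.

```python
import re

# Pattern inferred from names such as P_LGPS.ckpt and C_LGPS_0.1.ckpt.
CKPT_PATTERN = re.compile(
    r"^(?P<kind>[PC])_(?P<dataset>[A-Za-z]+)"
    r"(?:_(?P<scale>\d+(?:\.\d+)?))?\.ckpt$"
)

KIND_NAMES = {"P": "propagator", "C": "corrector"}


def parse_checkpoint_name(filename):
    """Return (model kind, dataset, noise scale or None) for a file name."""
    match = CKPT_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {filename!r}")
    scale = float(match["scale"]) if match["scale"] else None
    return KIND_NAMES[match["kind"]], match["dataset"], scale


for name in ["P_LPS.ckpt", "C_universal.ckpt", "C_LGPS_0.1.ckpt"]:
    print(name, parse_checkpoint_name(name))
```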
+
+### Testing
+
+The testing scripts are also provided in the `scripts/` directory.
+Testing for the universal MLIP set is performed using the `liflow.experiment.test` module, which generates a CSV file with the metrics reported in the paper.
+
+To generate trajectories for the LGPS and LPS datasets, we provide a standalone script that converts the output positions into an xyz file.
+An example for the LGPS dataset is provided in `scripts/test_LGPS.py`.
+The script reads the checkpoint file and the initial structure from the dataset (e.g., a POSCAR file for LGPS) and writes the trajectories generated at the specified temperature as an xyz file in the output directory.
+
+## Citation
+
+```bibtex
+@misc{nam2024flow,
+      title={Flow Matching for Accelerated Simulation of Atomic Transport in Materials},
+      author={Juno Nam and Sulin Liu and Gavin Winter and KyuJung Jun and Soojung Yang and Rafael G{\'o}mez-Bombarelli},
+      year={2024},
+      eprint={2410.01464},
+      archivePrefix={arXiv},
+      primaryClass={cond-mat.mtrl-sci},
+      url={https://arxiv.org/abs/2410.01464},
+}
+```