AiiDA-PyFLAME introduces an automated workflow for training neural network interatomic potentials (NNPs) with FLAME.
Python 3.6 or later is required.
The latest version of FLAME compatible with Python 3 is required.
aiida-core, aiida-submission-controller, and aiida-cp2k and/or aiida-vasp (depending on the ab-initio code used) should be installed.
To install the AiiDA-PyFLAME package directly from the cloned repository:
git clone https://github.com/hmhoseini/aiida-pyflame.git
cd aiida-pyflame
pip install -e .
The AiiDA-PyFLAME directory should be added to PYTHONPATH.
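A quick way to confirm the requirements above is to check the interpreter version and whether the package can be found on the import path. The following is a minimal sketch, not part of AiiDA-PyFLAME; the helper name `environment_report` is hypothetical, and the package name `aiida_pyflame` is taken from the repository layout.

```python
import importlib.util
import sys


def environment_report(package="aiida_pyflame"):
    """Report whether Python is recent enough and the package is importable.

    Hypothetical helper for illustration; it only inspects the import path
    and does not import the package itself.
    """
    return {
        # Python 3.6 or later is required.
        "python_ok": sys.version_info >= (3, 6),
        # find_spec returns None when a top-level package cannot be located.
        "package_found": importlib.util.find_spec(package) is not None,
    }
```

If `package_found` is `False` after installation, the repository directory is probably missing from PYTHONPATH.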
The directory structure of AiiDA-PyFLAME is as follows:
aiida-pyflame
├── aiida_pyflame
│   ├── codes
│   │   ├── cp2k
│   │   │   └── cp2k_files
│   │   ├── flame
│   │   │   ├── flame_files
│   │   │   └── flame_functions
│   │   └── vasp
│   │       └── vasp_files
│   └── workflows
├── examples
├── run_dir
└── utils
The default output directory is run_dir, but pyflame.py can be executed in any directory that contains the following files:
config.yaml
input.yaml
restart.yaml
It is necessary to modify these three files before running AiiDA-PyFLAME.
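Before launching a run, it can help to verify that the working directory actually contains the three files. The sketch below is an illustration only; the function name `missing_input_files` is hypothetical and is not provided by AiiDA-PyFLAME.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("config.yaml", "input.yaml", "restart.yaml")


def missing_input_files(run_dir="."):
    """Return the names of required files that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
```

An empty list means all three files are present; their contents still need to be edited by hand.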
The following values should be specified in the config.yaml file:
| Values | Description |
|---|---|
| DFT_code_string | Code label for DFT calculations. Available codes and their identifiers can be listed with verdi code list |
| FLAME_code_string | Code label for FLAME calculations. Available codes and their identifiers can be listed with verdi code list |
| VASP_potential_family | Potential family for VASP calculations. Available families of VASP POTCAR files can be listed with verdi data vasp-potcar listfamilies |
| The following parameters should be specified for all jobs: | |
| number_of_jobs | The maximum number of jobs that will be submitted for a specific job type. |
| nodes | Number of nodes to be allocated for the job. See the Slurm manual. |
| ntasks | Number of tasks per node. See the Slurm manual. |
| ncpu | Number of processors per task. See the Slurm manual. |
| time | Maximum time for a job (in seconds). |
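The per-job resource settings in the table above combine in a simple way: the number of cores requested is nodes × tasks per node × processors per task, and the time limit is given in seconds. The sketch below is a sanity-check aid under those definitions; the function names are hypothetical, and the conversion to Slurm's `D-HH:MM:SS` format is shown only for comparison with scheduler limits, not as something AiiDA-PyFLAME is known to do internally.

```python
def total_cores(nodes, ntasks, ncpu):
    """Cores requested by one job: nodes * tasks per node * processors per task."""
    return nodes * ntasks * ncpu


def seconds_to_slurm_time(seconds):
    """Format a time limit in seconds as Slurm's days-hours:minutes:seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"
```

For example, `nodes=2`, `ntasks=16`, `ncpu=2` requests 64 cores, and `time=90000` corresponds to a limit of one day and one hour.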
The data that should be provided in input.yaml is as follows:
| Key | Description |
|---|---|
| chemical_formula | A list of chemical formula(s) |
| from_db | Either False or a list of databases to retrieve atomic structures. List of known OPTIMADE providers: aflow, cod, mcloud.mc3d, mcloud.mc2d, mcloud.2dtopo, mcloud.tc-applicability, mcloud.pyrene-mofs, mcloud.curated-cofs, mcloud.stoceriaitf, mcloud.scdm, mcloud.tin-antimony-sulfoiodide, mcloud.optimade-sample, mp, mpds, nmd, odbx, odbx.odbx_misc, omdb.omdb_production, oqmd, jarvis, tcod, twodmatpedia. |
| from_local_db | If True, atomic structures will be retrieved from run_dir/local_db/known_bulk_structures.json (a list of dict representations of pymatgen Structures) |
| bulk_number_of_atoms | Specifies number of atoms in bulk structures |
| max_number_of_bulk_structures | Specifies number of structures sent for ab-initio calculations |
| reference_number_of_atoms | Specifies number of atoms in reference structures. Should be a subset of bulk_number_of_atoms. The structures are optimized with tight criteria, so a large number of atoms can make this step time-consuming. |
| max_number_of_reference_structures | Specifies number of reference structures |
| cluster_calculation | Whether cluster structures should be included in the training |
| cluster_number_of_atoms | Number of atoms in cluster structures. Should be a subset of bulk_number_of_atoms |
| box_size | Size of the box for clusters in Angstrom |
| vacuum_length | Minimum length of vacuum for each supercell containing a cluster |
| min_distance_prefactor | The allowed distance between atoms is calculated as the sum of their covalent radii. The allowed distance can be tuned by this prefactor. |
| descending_prefactor | Either False or a number specifying the percentage by which min_distance_prefactor is decreased in each cycle of training |
| energy_window | The maximum value of energy for training data. |
| method | behler |
| number_of_nodes | Number of nodes in the hidden layer of the NN for each cycle of training. |
| number_of_epoch | Number of epochs for each cycle of training. |
| minimahopping_time | Minimum and maximum time for minima hopping jobs (in hours) |
| minhocao_steps | Maximum number of minhocao steps |
| bulk_minhocao | Maximum number of minhocao jobs (minima hopping for bulk structures with variable cell) |
| minhopp_steps | Maximum number of minhopp steps |
| bulk_minhopp | Maximum number of minhopp jobs for bulk structures (minima hopping for bulk structures with fixed cell) |
| cluster_minhopp | Maximum number of minhopp jobs for clusters |
| dtol_prefactor | Prefactor for structure diversity check. The larger the value is, the more structures are considered similar (removed from the list). |
| prefactor_cluster | A prefactor for dtol_prefactor to be employed for clusters |
| ab_initio_code | CP2K_GTH or SIRIUS_CP2K or VASP |
| user_specified_CP2K_files | If True, the codes/cp2k/cp2k_files folder should be copied into the run directory. The user can then modify CP2K keywords (protocol files and/or pseudopotentials). |
| user_specified_FLAME_file | If True, the codes/flame/flame_files folder should be copied into the run directory. The user can then modify FLAME keywords (protocol file). |
| user_specified_VASP_file | If True, the codes/vasp/vasp_files folder should be copied into the run directory. The user can then modify VASP keywords (protocol file and/or potential_mapping). |
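Two of the constraints in the table above are easy to check before a run: reference_number_of_atoms and cluster_number_of_atoms should be subsets of bulk_number_of_atoms, and the allowed interatomic distance is derived from the sum of covalent radii. The sketch below illustrates both. It is not the AiiDA-PyFLAME implementation: the function names are hypothetical, the radii are illustrative values, and it assumes that min_distance_prefactor multiplies the sum of the radii.

```python
# Illustrative covalent radii in Angstrom (assumed values for this example,
# not taken from AiiDA-PyFLAME).
COVALENT_RADII = {"Si": 1.11, "O": 0.66}


def allowed_distance(elem_a, elem_b, prefactor=1.0):
    """Allowed distance between two atoms.

    Assumption: the prefactor scales the sum of the covalent radii.
    """
    return prefactor * (COVALENT_RADII[elem_a] + COVALENT_RADII[elem_b])


def not_in_bulk(bulk_number_of_atoms, other_number_of_atoms):
    """Return entries of other_number_of_atoms missing from bulk_number_of_atoms.

    An empty result means the subset requirement is satisfied.
    """
    bulk = set(bulk_number_of_atoms)
    return [n for n in other_number_of_atoms if n not in bulk]
```

With these illustrative radii, a prefactor of 0.8 gives an allowed Si-O distance of 0.8 × 1.77 Å; `not_in_bulk([8, 12, 16], [8, 20])` flags 20 as violating the subset requirement.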
The step from which AiiDA-PyFLAME (re)starts is specified in restart.yaml. AiiDA-PyFLAME keeps track of its steps. If a failure occurs and AiiDA-PyFLAME cannot advance, the user can restart it from the last successfully completed step. Users can also restart from any earlier step.
The following parameters can be specified in restart.yaml:
| Key | Description |
|---|---|
| re-start_from_step | Specifies the step from which the script will (re)start. -1: run unfinished jobs; 1: start AiiDA-PyFLAME; 2: random bulk structure generation; 3: initial ab-initio calculations; 4: FLAME training loop |
| stop_after_step | The script stops after the end of this step. -1 for non-stop run. |
| training_loop_start | Specifies the cycle number and cycle name at which the training cycle starts. The cycle names are: train (training the NN), minimahopping (minima hopping for bulk structures and clusters), divcheck (structure diversity check), SP_calculations (ab-initio single-point calculations) |
| training_loop_stop | Specifies the cycle number and cycle name to stop the training cycle. |
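The step numbering in the table above can be read as an ordered sequence with a start and a stop point. The sketch below shows one way to reason about which steps a given restart.yaml would cover. It is an interpretation for illustration only: the function name is hypothetical, and treating a start value of -1 as "no new step is started" is an assumption about what "run unfinished jobs" means.

```python
# Step numbers and descriptions as listed in the table above.
STEPS = {
    1: "start AiiDA-PyFLAME",
    2: "random bulk structure generation",
    3: "initial ab-initio calculations",
    4: "FLAME training loop",
}

# Names of the stages inside one training cycle, in the order listed above.
TRAINING_CYCLE_NAMES = ("train", "minimahopping", "divcheck", "SP_calculations")


def steps_to_run(restart_from_step, stop_after_step=-1):
    """Return the step numbers covered by a start/stop pair.

    Assumption: a start value of -1 only resumes unfinished jobs, so no
    step is listed. A stop value of -1 means a non-stop run.
    """
    if restart_from_step == -1:
        return []
    if restart_from_step not in STEPS:
        raise ValueError(f"unknown step: {restart_from_step}")
    last = max(STEPS) if stop_after_step == -1 else stop_after_step
    return [step for step in sorted(STEPS) if restart_from_step <= step <= last]
```

For instance, starting from step 2 with a non-stop run covers steps 2, 3, and 4, while starting from step 1 and stopping after step 3 skips the training loop.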
When the above-mentioned files are ready, the script can be executed by running the pyflame.py command.
The structure of the output directory is as follows:
output
├── cycle-1
│   ├── minimahopping
│   └── train
.
.
.
└── cycle-n
    ├── minimahopping
    └── train
A detailed log file (pyflame.log) will be written in the output directory. Folders and files inside the cycle-? folders provide details of the training process.
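When inspecting results, it is often convenient to walk the cycle folders in numerical order (so that cycle-10 follows cycle-9 rather than cycle-1). The following sketch relies only on the folder naming shown above; the function name `list_cycles` is hypothetical and is not part of AiiDA-PyFLAME.

```python
import re
from pathlib import Path


def list_cycles(output_dir):
    """Return the cycle-N directories inside output_dir, sorted by N."""
    cycles = []
    for path in Path(output_dir).iterdir():
        match = re.fullmatch(r"cycle-(\d+)", path.name)
        if match and path.is_dir():
            cycles.append((int(match.group(1)), path))
    # Sort on the integer cycle number, not on the folder name string.
    return [path for _, path in sorted(cycles)]
```

Each returned path can then be examined for its train and minimahopping subfolders.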
MIT