Skip to content

QUIP/LAMMPS different energies for different number of cores #669

@robertstella111

Description

@robertstella111

Hello,

I am attempting to create a GAP potential for silicon and carbon. I have installed the QUIP library with the architecture linux_x86_64_gfortran. For the training of the GAP I am using the following command

gap_fit energy_parameter_name=energy force_parameter_name=forces do_copy_at_file=F sparse_separate_file=T gp_file=GAP.xml at_file=train.xyz gap={distance_Nb order=2 cutoff=4.5 n_sparse=15 covariance_type=ard_se delta=5 theta_uniform=2.0 sparse_method=uniform compact_clusters=T : distance_Nb order=3 cutoff=3.5 n_sparse=200 covariance_type=ard_se delta=0.3 theta_uniform=2.0 sparse_method=uniform compact_clusters=T : soap atom_sigma=0.5 l_max=8 n_max=8 cutoff=4.5 cutoff_transition_width=1.0 delta=0.1 covariance_type=dot_product n_sparse=2000 zeta=4} default_sigma={0.005 0.08 0.0 0.0} config_type_sigma={dimer:0.0008:0.02:0.0:0.0:single:0.0001:0.005:0.0:0.0:perfect_crystall:0.00001:0.01:0.0:0.0:interstitials:0.0002:0.05:0.0:0.0} sparse_jitter=1e-8 gp_file=GAP.xml

I now want to run bigger simulations in lammps. For lammps I tried several versions (lammps_stable_sep_2021, stable_5Jun2019,stable_29Aug2024) and installed the QUIP pair_style by using cmake directly and also by referring to the libquip.a library.

When I now run lammps with mpi on mulitple cores and on a single core the results for the energies are quite different. For a cell of 385 atoms, the energy difference is around 3eV. The command for the pair_style looks as follows

pair_coeff * * GAP/29_08/without_stress/GAP.xml "Potential xml_label=GAP_2024_8_30_120_8_42_16_201" 14 6

If I now change all silicon atoms to carbon or all carbon to silicon the results for the mpi and single core runs are the same. I also encountered the problem for bigger cells and smaller cells, with defects and without defects. The problem with different results for different numbers of cores is also not present for all GAP potentials I have trained so far. The one above is just an example of where the error occurs.

Does someone have an idea where the error comes from and if this will influence the results I get from longer and more realistic runs (physical wise)?

log_serial.txt
log_parallel.txt
training.txt
h_110_1.txt

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions