Description
Hi all,
I've been having trouble using the MPI version of gap_fit. I compiled the latest version of QUIP following the instructions provided on GitHub (including the additional steps for MPI). When I try to fit a potential, I get a segmentation fault during the calculation of the sparse covariance matrices:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x15288caa7d4f in ???
#1 0x15288caa7cbb in ???
#2 0x15288caa9354 in ???
#3 0x15288caedae6 in ???
#4 0xffffffffffffffff in ???
#0 0x1480ceb0cd4f in ???
#0 0x14ee75f4bd4f in ???
srun: error: nid005254: tasks 2,4,6,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=8461484.0
In addition, the reported total system memory is not what I would expect. For instance, when using 1 node with 256 GB of memory, the total system memory is reported as 256. However, when running on e.g. 4 nodes, this number stays the same. I compiled on ARCHER2 with the existing architecture file, archer2_mpich+openmp.
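For reference, the arithmetic behind the expected figure can be sketched as below. The helper name mem_summary is hypothetical, and the value of 64 MPI tasks per node is an assumption (it is not stated above). Whether gap_fit reports memory per node or summed over nodes is also not confirmed here; if it is per node, an unchanged value of 256 would be expected.

```shell
# Sketch only: expected aggregate memory and memory per MPI task.
# Arguments: number of nodes, memory per node in GB, MPI tasks per node.
mem_summary() {
  ms_nodes=$1
  ms_mem_gb=$2
  ms_tasks_per_node=$3
  # Aggregate memory grows with the node count; per-task memory does not.
  echo "aggregate_gb=$((ms_nodes * ms_mem_gb)) per_task_gb=$((ms_mem_gb / ms_tasks_per_node))"
}

# 4 nodes with 256 GB each, assuming 64 tasks per node
mem_summary 4 256 64
```

Note that the memory available to each task depends only on the per-node memory and the number of tasks placed on a node, so adding nodes does not raise it.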
The details of my GAP installation are below:
libAtoms::Hello World: 2025-01-07 17:36:39
libAtoms::Hello World: git version https://github.com/libAtoms/QUIP.git,v0.9.14-37-g61fbbd7bb-dirty
libAtoms::Hello World: QUIP_ARCH archer2_mpich+openmp
libAtoms::Hello World: compiled on Dec 19 2024 at 16:19:54
libAtoms::Hello World: MPI parallelisation with 192 processes
libAtoms::Hello World: OpenMP parallelisation with 2 threads
WARNING: libAtoms::Hello World: environment variable OMP_STACKSIZE not set explicitly. The default value - system and compiler dependent - may be too small for some applications.
libAtoms::Hello World: MPI run with the same seed on each process
libAtoms::Hello World: Random Seed = -712267007
libAtoms::Hello World: global verbosity = 0
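The OMP_STACKSIZE warning in this output can be addressed by setting the variable explicitly in the job script before the srun line. The following is a minimal sketch: the value 1G is an assumption chosen for illustration, not a recommendation from the QUIP documentation, and it is not established that the stack size is related to the segmentation fault.

```shell
# Match the thread count reported in the log above (2 OpenMP threads).
export OMP_NUM_THREADS=2
# Per-thread OpenMP stack size; 1G is an assumed value for illustration.
export OMP_STACKSIZE=1G

echo "OMP_NUM_THREADS=${OMP_NUM_THREADS} OMP_STACKSIZE=${OMP_STACKSIZE}"
```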
My GAP input is as follows (I presume this is correct after the update to allow single-run sparsification?):
# Input File
n='1'
infile="database.xyz"
outfile="gp${n}.xml"
# Gap-fit Settings
settings="sparse_jitter = 1e-8 \
default_sigma = {0.002 0.02 0.02 0.0} \
e0={Ba:-692.188:O:-430.076:Y:-1038.578:Cu:-1305.806} \
config_type_sigma={YBCO7:0.002:0.02:0.02:0.0:YBCO6:0.005:0.05:0.05:0.0:Y2O3:0.005:0.05:0.05:0.0:BaO:0.005:0.05:0.05:0.0:Cu2O:0.005:0.05:0.05:0.0:O2:0.005:0.05:0.05:0.0} \
core_param_file=pairpot.xml \
core_ip_args={IP Glue} \
energy_parameter_name=dft_energy \
force_parameter_name=dft_force "
# Two-body descriptors with gaussian kernel
k2b_params="cutoff = 5.0 \
cutoff_transition_width = 1.0 \
delta = 2.0 \
n_sparse = 20 \
sparse_method = uniform \
covariance_type = ARD_SE \
theta_uniform = 1.0"
k2b_Cu_Cu="distance_2b add_species = F \
Z1 = 29 \
Z2 = 29 \
${k2b_params}"
k2b_Cu_O="distance_2b add_species = F \
Z1 = 29 \
Z2 = 8 \
${k2b_params}"
k2b_Cu_Y="distance_2b add_species = F \
Z1 = 29 \
Z2 = 39 \
${k2b_params}"
k2b_Cu_Ba="distance_2b add_species = F \
Z1 = 29 \
Z2 = 56 \
${k2b_params}"
k2b_O_O="distance_2b add_species = F \
Z1 = 8 \
Z2 = 8 \
${k2b_params}"
k2b_O_Ba="distance_2b add_species = F \
Z1 = 8 \
Z2 = 56 \
${k2b_params}"
k2b_O_Y="distance_2b add_species = F \
Z1 = 8 \
Z2 = 39 \
${k2b_params}"
k2b_Y_Y="distance_2b add_species = F \
Z1 = 39 \
Z2 = 39 \
${k2b_params}"
k2b_Y_Ba="distance_2b add_species = F \
Z1 = 39 \
Z2 = 56 \
${k2b_params}"
k2b_Ba_Ba="distance_2b add_species = F \
Z1 = 56 \
Z2 = 56 \
${k2b_params}"
# SOAP Descriptors
soap_params="l_max = 4 \
n_max = 4 \
cutoff = 4 \
cutoff_transition_width = 0.5 \
atom_sigma = 0.5 \
n_sparse = 300 \
zeta = 4 \
delta = 0.2 \
covariance_type = dot_product \
n_species = 4 \
species_Z = {8 29 39 56} \
sparse_method = cur_points"
# R_mix=T Z_mix=T K=300 sym_mix=T coupling=F" #compression
soap_O="soap add_species = F \
Z = 8 \
${soap_params}"
soap_Cu="soap add_species = F \
Z = 29 \
${soap_params}"
soap_Y="soap add_species = F \
Z = 39 \
${soap_params}"
soap_Ba="soap add_species = F \
Z = 56 \
${soap_params}"
# Run the Program
exec="/work/e05/e05/ash141/codes/QUIP2/QUIP/build/archer2_mpich+openmp/gap_fit"
srun $exec atoms_filename=$infile gap={{$k2b_Cu_Cu}:{$k2b_Cu_O}:{$k2b_Cu_Y}:{$k2b_Cu_Ba}:{$k2b_O_O}:{$k2b_O_Ba}:{$k2b_O_Y}:{$k2b_Y_Y}:{$k2b_Y_Ba}:{$k2b_Ba_Ba}:{$soap_O}:{$soap_Cu}:{$soap_Y}:{$soap_Ba}} $settings gp_file=$outfile
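The descriptor list on the srun line could also be assembled in a loop, which avoids repeating or omitting a descriptor when the list is typed by hand. This is a sketch under stated assumptions: the shortened k2b_params and soap_params strings are placeholders for the full definitions in the script, the variable names species, gap and n_desc are introduced only for illustration, and the descriptors come out in a slightly different order from the hand-written list.

```shell
# Placeholders; in the real script these are the full strings defined earlier.
k2b_params="cutoff = 5.0 n_sparse = 20"
soap_params="l_max = 4 n_max = 4"

species="29 8 39 56"   # Cu O Y Ba
gap=""
n_desc=0

# One distance_2b descriptor per unordered pair of species (10 for 4 species).
set -- $species
while [ "$#" -gt 0 ]; do
  z1=$1
  for z2 in "$@"; do
    gap="${gap}{distance_2b add_species = F Z1 = ${z1} Z2 = ${z2} ${k2b_params}}:"
    n_desc=$((n_desc + 1))
  done
  shift   # drop z1 so later pairs are not duplicated
done

# One SOAP descriptor per species.
for z in $species; do
  gap="${gap}{soap add_species = F Z = ${z} ${soap_params}}:"
  n_desc=$((n_desc + 1))
done

# Strip the trailing colon and wrap the whole list in braces.
gap="{${gap%:}}"
echo "built ${n_desc} descriptors"
```

The resulting string could then be passed as gap=$gap in place of the hand-written list.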
Thank you in advance for any help!