Segmentation fault using mpi GAP_fit #673

@Ash-Dickson

Hi all,

I've been having trouble running the MPI version of gap_fit. I compiled the latest version of QUIP following the instructions on GitHub, including the additional steps for MPI. When I try to fit a potential, I get a segmentation fault while the sparse covariance matrices are being calculated:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x15288caa7d4f in ???
#1  0x15288caa7cbb in ???
#2  0x15288caa9354 in ???
#3  0x15288caedae6 in ???
#4  0xffffffffffffffff in ???
#0  0x1480ceb0cd4f in ???
#0  0x14ee75f4bd4f in ???
srun: error: nid005254: tasks 2,4,6,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=8461484.0

In addition, the total system memory reported by the program is not what I would expect. When running on 1 node with 256 GB of memory, the total system memory is reported as 256. However, when running on e.g. 4 nodes, the reported value stays the same. I compiled on ARCHER2 with the existing architecture file for archer2+openmp+openmpi.

The details of my GAP installation are below:

libAtoms::Hello World: 2025-01-07 17:36:39
libAtoms::Hello World: git version  https://github.com/libAtoms/QUIP.git,v0.9.14-37-g61fbbd7bb-dirty
libAtoms::Hello World: QUIP_ARCH    archer2_mpich+openmp
libAtoms::Hello World: compiled on  Dec 19 2024 at 16:19:54
libAtoms::Hello World: MPI parallelisation with 192 processes
libAtoms::Hello World: OpenMP parallelisation with 2 threads
WARNING: libAtoms::Hello World: environment variable OMP_STACKSIZE not set explicitly. The default value - system and compiler dependent - may be too small for some applications.
libAtoms::Hello World: MPI run with the same seed on each process
libAtoms::Hello World: Random Seed = -712267007
libAtoms::Hello World: global verbosity = 0
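
The log above warns that OMP_STACKSIZE is not set explicitly. Whether that is related to the segmentation fault is not established here, but it is cheap to rule out. A minimal sketch of setting it in the job script before the srun line; the value 1G is an arbitrary illustration, not a recommended setting, and the thread count simply mirrors the 2 threads reported in the log:

```shell
# Assumed values for illustration only; tune for the actual job.
export OMP_NUM_THREADS=2     # matches "OpenMP parallelisation with 2 threads" in the log
export OMP_STACKSIZE=1G      # per-thread OpenMP stack size; 1G is a guess, not a tested value

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_STACKSIZE=$OMP_STACKSIZE"
```

If the variable is picked up, the WARNING line should no longer appear in the gap_fit header.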

My gap_fit input is below. I presume this form is correct following the update that allows sparsification in a single run?

# Input File
n='1'
infile="database.xyz"
outfile="gp${n}.xml"

# Gap-fit Settings
settings="sparse_jitter         = 1e-8 \
          default_sigma         = {0.002 0.02 0.02 0.0} \
          e0={Ba:-692.188:O:-430.076:Y:-1038.578:Cu:-1305.806} \
          config_type_sigma={YBCO7:0.002:0.02:0.02:0.0:YBCO6:0.005:0.05:0.05:0.0:Y2O3:0.005:0.05:0.05:0.0:BaO:0.005:0.05:0.05:0.0:Cu2O:0.005:0.05:0.05:0.0:O2:0.005:0.05:0.05:0.0} \
          core_param_file=pairpot.xml \
          core_ip_args={IP Glue} \
          energy_parameter_name=dft_energy \
          force_parameter_name=dft_force"

# Two-body descriptors with gaussian kernel
k2b_params="cutoff                  = 5.0 \
            cutoff_transition_width = 1.0 \
            delta                   = 2.0 \
            n_sparse                = 20 \
            sparse_method           = uniform \
            covariance_type         = ARD_SE \
            theta_uniform           = 1.0"

k2b_Cu_Cu="distance_2b add_species       = F \
                       Z1                = 29 \
                       Z2                = 29 \
                       ${k2b_params}"

k2b_Cu_O="distance_2b add_species       = F \
                       Z1                = 29 \
                       Z2                = 8 \
                       ${k2b_params}"

k2b_Cu_Y="distance_2b add_species       = F \
                       Z1                = 29 \
                       Z2                = 39 \
                       ${k2b_params}"

k2b_Cu_Ba="distance_2b add_species       = F \
                       Z1                = 29 \
                       Z2                = 56 \
                       ${k2b_params}"

k2b_O_O="distance_2b add_species       = F \
                       Z1                = 8 \
                       Z2                = 8 \
                       ${k2b_params}"

k2b_O_Ba="distance_2b add_species       = F \
                       Z1                = 8 \
                       Z2                = 56 \
                       ${k2b_params}"

k2b_O_Y="distance_2b add_species       = F \
                        Z1                = 8 \
                        Z2                = 39 \
                        ${k2b_params}"

k2b_Y_Y="distance_2b add_species       = F \
                       Z1                = 39 \
                       Z2                = 39 \
                       ${k2b_params}"

k2b_Y_Ba="distance_2b add_species       = F \
                       Z1                = 39 \
                       Z2                = 56 \
                       ${k2b_params}"

k2b_Ba_Ba="distance_2b add_species       = F \
                       Z1                = 56 \
                       Z2                = 56 \
                       ${k2b_params}"

# SOAP Descriptors
soap_params="l_max                   = 4 \
             n_max                   = 4 \
             cutoff                  = 4 \
             cutoff_transition_width = 0.5 \
             atom_sigma              = 0.5 \
             n_sparse                = 300 \
             zeta                    = 4 \
             delta                   = 0.2 \
             covariance_type         = dot_product \
             n_species               = 4 \
             species_Z               = {8 29 39 56} \
             sparse_method           = cur_points"
	        #  R_mix=T Z_mix=T K=300 sym_mix=T coupling=F" #compression

soap_O="soap add_species       = F \
              Z                 = 8 \
              ${soap_params}"

soap_Cu="soap add_species = F \
              Z           = 29 \
              ${soap_params}"

soap_Y="soap add_species       = F \
              Z                 = 39 \
              ${soap_params}"

soap_Ba="soap add_species       = F \
              Z                 = 56 \
              ${soap_params}"

# Run the Program
exec="/work/e05/e05/ash141/codes/QUIP2/QUIP/build/archer2_mpich+openmp/gap_fit"
srun $exec atoms_filename=$infile gap={{$k2b_Cu_Cu}:{$k2b_Cu_O}:{$k2b_Cu_Y}:{$k2b_Cu_Ba}:{$k2b_O_O}:{$k2b_O_Ba}:{$k2b_O_Y}:{$k2b_Y_Y}:{$k2b_Y_Ba}:{$k2b_Ba_Ba}:{$soap_O}:{$soap_Cu}:{$soap_Y}:{$soap_Y}} $settings gp_file=$outfile
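
The gap= argument above is assembled by hand from fourteen descriptor strings, which makes it easy to repeat or drop one: as written, the line lists $soap_Y twice and never references $soap_Ba. A minimal sketch of building the argument from a list of variable names instead, with a duplicate check. The short descriptor strings here are placeholders standing in for the full k2b_* and soap_* definitions, and the names list is truncated for brevity:

```shell
# Placeholder descriptor strings; in the real script these are the
# k2b_* and soap_* variables defined earlier in the input file.
k2b_Cu_Cu="distance_2b Z1=29 Z2=29"
soap_O="soap Z=8"
soap_Cu="soap Z=29"
soap_Y="soap Z=39"
soap_Ba="soap Z=56"

# Names of the variables to include, one entry per descriptor.
names="k2b_Cu_Cu soap_O soap_Cu soap_Y soap_Ba"

# Abort if any name is listed more than once.
dupes=$(printf '%s\n' $names | sort | uniq -d)
if [ -n "$dupes" ]; then
    echo "duplicate descriptor(s): $dupes" >&2
    exit 1
fi

# Join the descriptor strings as {a}:{b}:... and wrap them in outer braces.
gap=""
for name in $names; do
    eval "value=\$$name"              # look up the variable called $name
    gap="${gap:+$gap:}{$value}"
done
gap="{$gap}"

echo "$gap"
# The run line would then become:
# srun $exec atoms_filename=$infile gap=$gap $settings gp_file=$outfile
```

With the full list of ten k2b_* names and four soap_* names, the check fails loudly on a repeated entry, and a missing one is easier to spot in a one-name-per-word list than in the long command line.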

Thank you in advance for any help!
