Hello,
I'd like to report two errors that I observed when running the MPI + Kokkos version of miniMD (miniMD/kokkos).
The first error is that T and P are reported as NaN (and U as exactly 0 at step 100), which causes some kernels to run abnormally fast.
The configuration is as follows, executed on 32 nodes of OLCF Summit:
$ jsrun -n192 -a1 -c1 -g1 -K3 -r6 -M -gpu ./miniMD -i in.lj.miniMD -gn 0 -nx 768 -ny 768 -nz 384 -n 100
# Create System:
# Done ....
# miniMD-Reference 1.2 (MPI+OpenMP) output ...
# Run Settings:
# MPI processes: 192
# Host Threads: 1
# Inputfile: ../inputs/in.lj.miniMD
# Datafile: None
# Physics Settings:
# ForceStyle: LJ
# Force Parameters: 1.00 1.00
# Units: LJ
# Atoms: 905969664
# Atom types: 8
# System size: 1289.93 1289.93 644.96 (unit cells: 768 768 384)
# Density: 0.844200
# Force cutoff: 2.500000
# Timestep size: 0.005000
# Technical Settings:
# Neigh cutoff: 2.800000
# Half neighborlists: 1
# Team neighborlist construction: 0
# Neighbor bins: 460 460 230
# Neighbor frequency: 1000
# Sorting frequency: 1000
# Thermo frequency: 100
# Ghost Newton: 0
# Use intrinsics: 0
# Do safe exchange: 0
# Size of float: 8
# Starting dynamics ...
# Timestep T U P Time
0 nan -6.773368e+00 nan 0.000
100 nan 0.000000e+00 nan 1.138
# Performance Summary:
# MPI_proc OMP_threads nsteps natoms t_total t_force t_neigh t_comm t_other performance perf/thread grep_string t_extra
192 1 100 905969664 1.137955 0.050640 0.000000 0.671161 0.416153 79613833194.819092 414655381.223016 PERF_SUMMARY 0.000000
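The second error is an integer overflow in the total number of atoms at large problem sizes. The atom count is printed as a negative number, and the performance figures derived from it are negative as well: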
$ jsrun -n1536 -a1 -c1 -g1 -K3 -r6 -M -gpu ./miniMD -i in.lj.miniMD -gn 0 -nx 1536 -ny 1536 -nz 768 -n 100
# Create System:
# Done ....
# miniMD-Reference 1.2 (MPI+OpenMP) output ...
# Run Settings:
# MPI processes: 1536
# Host Threads: 1
# Inputfile: ../inputs/in.lj.miniMD
# Datafile: None
# Physics Settings:
# ForceStyle: LJ
# Force Parameters: 1.00 1.00
# Units: LJ
# Atoms: -1342177280
# Atom types: 8
# System size: 2579.86 2579.86 1289.93 (unit cells: 1536 1536 768)
# Density: 0.844200
# Force cutoff: 2.500000
# Timestep size: 0.005000
# Technical Settings:
# Neigh cutoff: 2.800000
# Half neighborlists: 1
# Team neighborlist construction: 0
# Neighbor bins: 921 921 460
# Neighbor frequency: 1000
# Sorting frequency: 1000
# Thermo frequency: 100
# Ghost Newton: 0
# Use intrinsics: 0
# Do safe exchange: 0
# Size of float: 8
# Starting dynamics ...
# Timestep T U P Time
0 1.440000e+00 3.657619e+01 -6.220309e+00 0.000
100 1.435069e+00 3.657569e+01 -6.219723e+00 2.041
# Performance Summary:
# MPI_proc OMP_threads nsteps natoms t_total t_force t_neigh t_comm t_other performance perf/thread grep_string t_extra
1536 1 100 -1342177280 2.040788 0.056916 0.000000 0.852597 1.131275 -65767589680.726250 -42817441.198389 PERF_SUMMARY 0.000000
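For reference, the negative atom count is consistent with the total being held in a signed 32-bit integer. The sketch below is only arithmetic on the numbers printed above, not code taken from miniMD: it assumes 4 atoms per unit cell (which reproduces the 905969664 atoms of the first run), and the helper names `wrap_int32` and `atom_count` are made up for illustration. It also checks that the negative `performance` value matches `natoms * nsteps / t_total` computed with the wrapped count.

```python
import math

INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit integer would hold after wraparound."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def atom_count(nx: int, ny: int, nz: int, atoms_per_cell: int = 4) -> int:
    """Exact atom count; atoms_per_cell=4 is an assumption that matches the logs."""
    return atoms_per_cell * nx * ny * nz


small = atom_count(768, 768, 384)      # first run
large = atom_count(1536, 1536, 768)    # second run

print(small, small <= INT32_MAX)       # fits in a signed 32-bit integer
print(large, large <= INT32_MAX)       # does not fit
print(wrap_int32(large))               # value after 32-bit wraparound

# The reported performance is consistent with natoms * nsteps / t_total,
# evaluated with the wrapped (negative) atom count.
nsteps, t_total = 100, 2.040788
performance = wrap_int32(large) * nsteps / t_total
print(math.isclose(performance, -65767589680.726250, rel_tol=1e-5))
```

The true atom count for the second run is 7247757312, which exceeds the 32-bit limit of 2147483647 and wraps to the -1342177280 shown in the output. If that is indeed the cause, storing the global atom count in a 64-bit type should remove the overflow.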