The issue reported in #1535 seems to be back in develop. It appears in the invert_test_mobius_sym and invert_test_mobius_asym tests. They exit with something like:
[ RUN ] SchwarzNormal/InvertTest.verify/double_double_pcg_mat_pc_dag_mat_pc_normop_pc_additive_schwarz_cg_half_l2
Computed plaquette is 1.233908e-01 (spatial = 1.223209e-01, temporal = 1.244607e-01)
Solution = mat_pc_dag_mat_pc, Solve = normop_pc, Solver = pcg, Precision = double, Sloppy precision = double
CG: Convergence at 10 iterations, L2 relative residual: iterated = 1.652612e+06 (requested = 1.000000e-01)
CG: Convergence at 10 iterations, L2 relative residual: iterated = 1.398104e+04 (requested = 1.000000e-01)
CG: Convergence at 10 iterations, L2 relative residual: iterated = 2.604118e+05 (requested = 1.000000e-01)
...
CG: Convergence at 10 iterations, L2 relative residual: iterated = 1.014964e+02 (requested = 1.000000e-01)
CG: Convergence at 10 iterations, L2 relative residual: iterated = 9.442425e+04 (requested = 1.000000e-01)
CG: Convergence at 10 iterations, L2 relative residual: iterated = 2.110695e+04 (requested = 1.000000e-01)
ERROR: Solver appears to have diverged with residual nan (rank 0, host plate, solver.cpp:417 in bool quda::Solver::convergence(quda::cvector<double>&, quda::cvector<double>&, quda::cvector<double>&, quda::cvector<double>&)())
last kernel called was (name=N4quda4blas11axpyCGNorm2IddEE,volume=1x4x6x8x4,aux=GPU-offline,large_kernel_arg,vol=768,parity=1,precision=8,Ns=4,Nc=3,order=0,N=2,n_rhs=1)
last tune param used was block=(32,1,1), grid=(24,1,1), shared_bytes=0, shared_carve_out=0, aux=(-1,-1,-1,-1)
Saving 294 sets of cached parameters to /home/josborn/lqcd/build/quda-git/qudatune/tunecache_notune_error.tsv
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
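In the log above, the CG lines report "Convergence at 10 iterations" while the iterated residual is far above the requested 1.000000e-01. For anyone comparing runs, a minimal sketch of how those lines could be pulled out of a saved log and checked is given below. It assumes only the line format shown above; the function names parse_residuals and unconverged are hypothetical helpers and not part of QUDA.

```python
import math
import re

# Matches lines of the form seen in the log above, e.g.
# "CG: Convergence at 10 iterations, L2 relative residual: iterated = 1.652612e+06 (requested = 1.000000e-01)"
# The pattern is an assumption based solely on that format.
LINE_RE = re.compile(
    r"^(?P<solver>\w+): Convergence at (?P<iters>\d+) iterations, "
    r"L2 relative residual: iterated = (?P<iterated>\S+) "
    r"\(requested = (?P<requested>[^)]+)\)"
)


def parse_residuals(log_text):
    """Return (solver, iterations, iterated, requested) for each matching line."""
    rows = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append(
                (
                    m["solver"],
                    int(m["iters"]),
                    float(m["iterated"]),   # float() also accepts "nan" and "inf"
                    float(m["requested"]),
                )
            )
    return rows


def unconverged(rows):
    """Keep rows whose iterated residual is non-finite or above the requested value."""
    return [r for r in rows if not math.isfinite(r[2]) or r[2] > r[3]]


if __name__ == "__main__":
    # Two lines copied from the log above.
    sample = (
        "CG: Convergence at 10 iterations, L2 relative residual: "
        "iterated = 1.652612e+06 (requested = 1.000000e-01)\n"
        "CG: Convergence at 10 iterations, L2 relative residual: "
        "iterated = 1.014964e+02 (requested = 1.000000e-01)\n"
    )
    rows = parse_residuals(sample)
    bad = unconverged(rows)
    print(f"{len(bad)} of {len(rows)} reported solves missed the requested tolerance")
```

Running the same check on a log from a passing build would show whether the large residuals are new or were already present before the regression.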
The test given in #1535 fails with a different error:
Computed plaquette is 1.231117e-01 (spatial = 1.236920e-01, temporal = 1.225315e-01)
Solution = mat, Solve = normop_pc, Solver = pcg, Precision = single, Sloppy precision = half
ERROR: Solver appears to have diverged for n = 0 (rank 0, host plate, solver.cpp:479 in void quda::Solver::PrintStats(const char*, int, quda::cvector<double>&, quda::cvector<double>&, quda::cvector<double>&)())
last kernel called was (name=N4quda4blas9axpyZpbx_IfEE,volume=6x12x12x16x8,aux=GPU-offline,large_kernel_arg,vol=110592,parity=1,precision=2,Ns=4,Nc=3,order=0,N=8,n_rhs=1)
last tune param used was block=(640,1,1), grid=(76,1,1), shared_bytes=0, shared_carve_out=0, aux=(-1,-1,-1,-1)