Skip to content

Commit 3fd68c9

Browse files
committed
Finally, a working implementation of particle near surface in FFT mode.
- additional required memory was described in comments in calculator.c - definition of ZsumShift is now different for sparse and FFT modes in parallel (relative to the local or global bottom dipoles respectively). Related difference is in definition of local_Nz_Rm (used for Sommerfeld table and, in FFT mode, for filling Rmatrix). - Sommerfeld table is now not calculated during prognosis, but corresponding memory is properly accounted for. - Current implementation requires additional forward fftY and fftZ at each iteration (but see issue 177). - BlockTranspose_Dm in comm.c was renamed into BlockTranspose_DRm, since it works both for D- and Rmatrix. - existing functions to index Dmatrix has been slightly changed to not use DsizeYZ and use y>=DsizeY instead of y>smallY. - functions and plans, not specific to Dmatrix were renamed (Y and Z forward transforms), now they have 'slice' in their names. - transposeYZ_Dm was changed to transpose (a general function), and transposeYZ is now just a wrapper around the latter. - version incremented to 1.3b1. Main computational bottleneck is computation of the table of Sommerfeld integrals. Currently it is approximately equivalent to 200 iterations (issue 176). So should not be a problem for applications with a larger number of iterations. New implementation was extensively tested against previous sparse version. Still, quantitative comparison with published literature data is required. Remaining limitation is that it doesn't work in OpenCL mode (see issue 101 for details, explicit exception was added to param.c). Changes to timing: - added time for 'init interaction' (significant when Sommerfeld table is calculated). - Time for 'init Dmatrix' now may include the time for initialization of Rmatrix (doesn't include time for Sommerfeld table). - Precise timing for Dmatrix now includes a separate line for initialization of Rmatrix (without details) Other: - scattering at exactly 90 degrees (along surface) for non-trivial surfaces is now handled by a special case to produce exact 0. This makes the output consistent, avoiding large integration errors for -phi_integr or alldir. - added explicit exception to forbid combination of -no_reduced_fft and -iter cgnr in MPI FFT mode (issue 174). Tests in tests/2exec/ have been significantly improved - added SURF_EXT flag, which allows running a large suite of tests, adding '-surf ...' option to each of them. - added a number of ignores, which are always active (decreases false-positives) - added a couple of macros in suite files (NOMPI and NOMPISEQ), which indicates that the line should be skipped for a specific comparison modes. - added a number of tests to the default suite: '-surf ...', '-int_surf', -scat_plane, '-shape read ... -grid ...', '-no_reduced_fft -iter cgnr' (see above)
1 parent c870ace commit 3fd68c9

File tree

19 files changed

+606
-249
lines changed

19 files changed

+606
-249
lines changed

src/GenerateB.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ extern const int beam_Npars;
4343
extern const double beam_pars[];
4444
extern const char *beam_fnameY;
4545
extern const char *beam_fnameX;
46-
extern opt_index opt_beam;
46+
extern const opt_index opt_beam;
4747

4848
// used in crosssec.c
4949
double beam_center_0[3]; // position of the beam center in laboratory reference frame

src/calculator.c

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ extern const angle_set beta_int,gamma_int,theta_int,phi_int;
4343
extern const int avg_inc_pol;
4444
extern const char *alldir_parms,*scat_grid_parms;
4545
// defined and initialized in timing.c
46-
extern TIME_TYPE Timing_Init;
46+
extern TIME_TYPE Timing_Init,Timing_Init_Int;
4747
#ifdef OPENCL
4848
extern TIME_TYPE Timing_OCL_Init;
4949
#endif
@@ -445,14 +445,17 @@ static void AllocateEverything(void)
445445
/* estimate of the memory (only the fastest scaling part):
446446
* MatVec - (288+384nprocs/boxX [+192/nprocs])*Ndip
447447
* more exactly: gridX*gridY*gridZ*(36+48nprocs/boxX [+24/nprocs]) value in [] is only for parallel mode.
448-
* For OpenCL mode all MatVec part is allocated on GPU instead of main (CPU) memory
448+
* For surf additionally: gridX*gridY*gridZ*(48+48nprocs/boxX)
449+
* + for Sommerfeld table: 64*boxZ*(boxX*boxY-(MIN(boxX,boxY))^2)
450+
* For OpenCL mode all MatVec part is allocated on GPU instead of main (CPU) memory
449451
* others - nvoid_Ndip*{271(CGNR,BiCG), 367(CSYM,QMR2), 415(BiCGStab,QMR), or 463(BCGS2)}
450452
* + additional 8*nvoid_Ndip for OpenCL mode and CGNR or Bi-CGSTAB
451453
* PARALLEL: above is total; division over processors of MatVec is uniform, others - according to local_nvoid_Ndip
452454
*
453455
* Sparse mode - each processor needs (265--457, depending on iterative solver)*local_nvoid_Ndip + 60*nvoid_Ndip
454456
* and division is uniform, i.e. local_nvoid_Ndip = nvoid_Ndip/nprocs
455-
* Part of the memory is currently not distributed among processors - see issue 160.
457+
* Sommerfeld table - same as above, but it is not divided among processors.
458+
* Part of the memory is currently not distributed among processors - see issues 160,175.
456459
*/
457460
MAXIMIZE(memPeak,memory);
458461
double memSum=AccumulateMax(memPeak,&memmax);
@@ -592,7 +595,9 @@ void Calculator (void)
592595
else dtheta_deg=dtheta_rad=block_theta=0;
593596
finish_avg=false;
594597
// Do preliminary setup for MatVec
598+
TIME_TYPE startInitInt=GET_TIME();
595599
InitInteraction();
600+
Timing_Init_Int=GET_TIME()-startInitInt;
596601
#ifndef SPARSE
597602
// initialize D matrix (for matrix-vector multiplication)
598603
D("InitDmatrix started");

src/comm.c

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -676,10 +676,10 @@ void BlockTranspose(doublecomplex * restrict X UOIP,TIME_TYPE *timing UOIP)
676676

677677
//======================================================================================================================
678678

679-
void BlockTranspose_Dm(doublecomplex * restrict X UOIP,const size_t lengthY UOIP,const size_t lengthZ UOIP)
680-
/* do the data-transposition, i.e. exchange, between fftX and fftY&fftZ; specialized for D matrix. It can be updated to
681-
* accept timing argument for generality. But, since this is a specialized function, we keep the timing variable
682-
* hard-wired in the code.
679+
void BlockTranspose_DRm(doublecomplex * restrict X UOIP,const size_t lengthY UOIP,const size_t lengthZ UOIP)
680+
/* do the data-transposition, i.e. exchange, between fftX and fftY&fftZ; specialized for D or R matrix. It can be
681+
* updated to accept timing argument for generality. But, since this is a specialized function, we keep the timing
682+
* variable hard-wired in the code.
683683
*/
684684
{
685685
#ifdef ADDA_MPI

src/comm.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ void ReadField(const char * restrict fname,doublecomplex *restrict field);
4545

4646
#ifndef SPARSE
4747
void BlockTranspose(doublecomplex * restrict X,TIME_TYPE *timing);
48-
void BlockTranspose_Dm(doublecomplex * restrict X,size_t lengthY,size_t lengthZ);
48+
void BlockTranspose_DRm(doublecomplex * restrict X,size_t lengthY,size_t lengthZ);
4949
// used by granule generator
5050
void SetGranulComm(double z0,double z1,double gdZ,int gZ,size_t gXY,size_t buf_size,int *lz0,int *lz1,int sm_gr);
5151
void CollectDomainGranul(unsigned char * restrict dom,size_t gXY,int lz0,int locgZ,TIME_TYPE *timing);

src/const.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,7 +18,7 @@
1818
#define __const_h
1919

2020
// version number (string)
21-
#define ADDA_VERSION "1.3a2"
21+
#define ADDA_VERSION "1.3b1"
2222

2323
/* ADDA uses certain C99 extensions, which are widely supported by GNU and Intel compilers. However, they may be not
2424
* completely supported by e.g. Microsoft Visual Studio compiler. Therefore, we check the version of the standard here

src/crosssec.c

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -631,6 +631,11 @@ static void CalcFieldSurf(doublecomplex ebuff[static restrict 3], // where to wr
631631
if (above) { // simple reflection
632632
vCopy(nF,nN);
633633
nN[2]*=-1;
634+
// no scattering at exactly 90 degrees for non-trivial surface (to avoid randomness for this case)
635+
if (fabs(nN[2])<ROUND_ERR && cabs(msub-1)>ROUND_ERR) {
636+
cvInit(ebuff);
637+
return;
638+
}
634639
}
635640
else { // transmission
636641
if (msubInf) { // no transmission for perfectly reflecting substrate => zero result
@@ -738,11 +743,13 @@ static void CalcFieldSurf(doublecomplex ebuff[static restrict 3], // where to wr
738743
kt=NAN; // redundant to remove warnings below
739744
}
740745
else {
741-
if (cabs(msub-1)<ROUND_ERR && fabs(ki)<ROUND_ERR) kt=ki; // special case to avoid randomness due to round-off errors
746+
// special case to avoid randomness due to round-off errors
747+
if (cabs(msub-1)<ROUND_ERR && fabs(ki)<ROUND_ERR) kt=ki;
742748
else kt=cSqrtCut(msub*msub - (nN[0]*nN[0]+nN[1]*nN[1]));
743749
if (above) {
744750
/* here we test only the exact zero, since for other cases (when msub=1, and very small values of ki,kt) the
745-
* above assignment kt=ki guarantees correct results through standard functions
751+
* above assignment kt=ki guarantees correct results through standard functions. Also the case of
752+
* 90deg-scattering (for msub!=1) is taken care of in the beginning of this function.
746753
*/
747754
if (ki==0 && kt==0) cs=cp=0;
748755
else {

0 commit comments

Comments
 (0)