Quickbuild tests #593

Merged
merged 29 commits into from
Dec 13, 2023

hkershaw-brown
Member

@hkershaw-brown hkershaw-brown commented Dec 11, 2023

Description:

Building on Ann's previous pull request #575.
Improved build_everything: it submits a job on Derecho for each compiler, and each job runs every quickbuild.sh.

time:
cce ~12 minutes
gfortran ~2:30 minutes
nvhpc ~4:40 minutes
ifort ~4:40 minutes
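
As a rough illustration of what each per-compiler job does, the sketch below walks a checkout, runs every quickbuild.sh it finds, and prints one RESULT line per work directory. It is not the script from this pull request: the function name run_quickbuilds, the PASSED label, the zero-based counter, and the quickbuild.log file name are all assumptions made for the example.

```shell
# Hypothetical sketch, not the script from this pull request.
# Runs every quickbuild.sh under a root directory and reports one line per build.
# The body is a subshell "( ... )" so its variables do not leak into the caller.
run_quickbuilds() (
  compiler=$1   # label printed at the start of each line, e.g. gcc
  root=$2       # top of the checkout to search
  index=0       # assumed meaning of the number in the RESULT lines
  find "$root" -type f -name quickbuild.sh | sort | while IFS= read -r script; do
    dir=$(dirname "$script")
    # stdin is redirected so the build cannot consume the list of scripts
    if (cd "$dir" && ./quickbuild.sh) > "$dir/quickbuild.log" 2>&1 < /dev/null; then
      status=PASSED   # assumed label; only FAILED lines are quoted in this thread
    else
      status=FAILED
    fi
    echo "$compiler RESULT: $index $dir/ $status"
    index=$((index + 1))
  done
)
```

Inside a submitted job this would be called once for that job's compiler, for example `run_quickbuilds cce "$test_dir/DART"`.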

Note: several of the converters require external libraries or code to be added before compiling (rttov, hdfeos, wrf code, ncep prepbufr code). These are ignored for this pull request:

intel RESULT: 1 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/developer_tests/forward_operators/work/ FAILED
intel RESULT: 10 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/GOES/work/ FAILED
intel RESULT: 11 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/gps/work/ FAILED
intel RESULT: 12 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/NSIDC/work/ FAILED
intel RESULT: 28 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/var/work/ FAILED
intel RESULT: 29 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/AIRS/work/ FAILED
intel RESULT: 31 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/quikscat/work/ FAILED
intel RESULT: 42 /glade/derecho/scratch/hkershaw/build_everything/intel/DART/observations/obs_converters/GMI/work/ FAILED

Other failures: #592, #352, #594

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Derecho: build everything (that has a quickbuild.sh) in DART for cce, intel, gcc, nvhpc

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

Contributor

@mjs2369 mjs2369 left a comment

In order to add ifx to the list of compilers to build with, we need to update fixsystem to include ifx

developer_tests/build_everything/README
@mjs2369
Contributor

mjs2369 commented Dec 11, 2023

The builds that use specific libraries such as rttov will be addressed in a future pull request

@hkershaw-brown
Member Author

> In order to add ifx to the list of compilers to build with, we need to update fixsystem to include ifx

will do, thanks Marlee!

mjs2369 and others added 2 commits December 11, 2023 14:13
ifx mkmf.template
@hkershaw-brown
Member Author

I've added ifx. To run, you'll need to run with the quickbuild_tests branch because main does not have the mkmf.template.ifx.linux

./submit_jobs quickbuild_tests

@mjs2369
Contributor

mjs2369 commented Dec 11, 2023

> To run, you'll need to run with the quickbuild_tests branch because main does not have the mkmf.template.ifx.linux
>
> ./submit_jobs quickbuild_tests

^^^ This should be added to the README

And a follow up question @hkershaw-brown - if the submit_jobs.sh script submits a job for each compiler, are we expecting the ifx build to still run and just error out on all branches other than quickbuild_tests?

Contributor

@mjs2369 mjs2369 left a comment

Should we add some code for teardown? Currently, consecutive runs will fail with
Directory exists: /glade/derecho/scratch/masmith/build_everything/nvhpc

@hkershaw-brown
Member Author

hkershaw-brown commented Dec 12, 2023

> > To run, you'll need to run with the quickbuild_tests branch because main does not have the mkmf.template.ifx.linux
> >
> > ./submit_jobs quickbuild_tests
>
> ^^^ This should be added to the README

Nope, don't add this to the README. When this pull request is merged into main, mkmf.template.ifx.linux will exist on main. This 'To run' note was just for you for this pull request.

> And a follow up question @hkershaw-brown - if the submit_jobs.sh script submits a job for each compiler, are we expecting the ifx build to still run and just error out on all branches other than quickbuild_tests?

Yes, because mkmf.template.ifx.linux only exists on the quickbuild_tests branch.
Once quickbuild_tests is merged into main, mkmf.template.ifx.linux will exist on main.

@hkershaw-brown
Member Author

> Should we add some code for teardown? Currently, consecutive runs will fail with Directory exists: /glade/derecho/scratch/masmith/build_everything/nvhpc

This is the teardown:

mv $test_dir $test_dir.$(date +"%FT%H%M")

If you don't have a compiler.dateTime directory, then the job did not finish.
At that point I think it is worth manually investigating why the job failed, rather than running rm -rf on the directories.
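
The interaction between the "Directory exists" error and the rename-on-completion teardown can be sketched as two small helpers. This is only an illustration of the behaviour described in this thread: the function names ensure_fresh_test_dir and archive_test_dir are hypothetical, and the real scripts may structure this differently.

```shell
# Hypothetical helpers illustrating the behaviour discussed above.

# Refuse to start if a previous run left its directory behind,
# which signals that the earlier job never reached the teardown step.
ensure_fresh_test_dir() {
  if [ -d "$1" ]; then
    echo "Directory exists: $1" >&2
    return 1
  fi
  mkdir -p "$1"
}

# Teardown: keep the finished build, but move it aside under a
# timestamped name (e.g. nvhpc.2023-12-12T1013) so the next run can start.
archive_test_dir() {
  mv "$1" "$1.$(date +"%FT%H%M")"
}
```

With this arrangement a leftover un-timestamped directory is evidence of an unfinished job, so nothing is deleted automatically.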

@mjs2369
Contributor

mjs2369 commented Dec 12, 2023

> > > To run, you'll need to run with the quickbuild_tests branch because main does not have the mkmf.template.ifx.linux
> > >
> > > ./submit_jobs quickbuild_tests
> >
> > ^^^ This should be added to the README
>
> Nope don't add to the README. When this pull request is merged into main, mkmf.template.ifx.linux will exist on main. This 'To run' note was just for you for this pull request.
>
> > And a follow up question @hkershaw-brown - if the submit_jobs.sh script submits a job for each compiler, are we expecting the ifx build to still run and just error out on all branches other than quickbuild_tests?
>
> Yes because mkmf.template.ifx.linux only exists on the quickbuild_tests branch. Once quickbuild_tests is merged into main mkmf.template.ifx.linux will exist on main.

That makes much more sense and is obvious in hindsight. For some reason, I thought you meant that mkmf.template.ifx.linux was not going to be added to main with this PR. Ignore this.

@mjs2369
Contributor

mjs2369 commented Dec 12, 2023

Sometimes Derecho is unable to make a connection to the DART remote repository, causing some of the jobs to fail with this message:

Cloning into 'DART'...
fatal: unable to access 'https://github.com/NCAR/DART.git/': Failed to connect to github.com port 443 after 1 ms: Couldn't connect to server
./run_all_quickbuilds.sh: line 51: cd: DART: No such file or directory
fatal: not a git repository (or any parent up to mount point /glade/derecho)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /glade/derecho)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
unknown branch

I have submitted a support request with CISL Help to address this issue.
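
Until the connection problem is resolved, one possible workaround is to retry the clone a few times and stop the job as soon as it is clear the clone failed, so the later "No such file or directory" and "not a git repository" errors do not pile up. The sketch below is an assumption on my part rather than anything in run_all_quickbuilds.sh; retry is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# The body is a subshell, so "exit 1" only ends the helper, not the caller.
retry() (
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      exit 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
)

# Example use in a job script (not executed here):
#   retry 5 30 git clone https://github.com/NCAR/DART.git || exit 1
#   cd DART || exit 1
```

This does not fix the underlying network issue; it only makes a transient failure less likely to waste a whole job.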

@mjs2369
Contributor

mjs2369 commented Dec 12, 2023

Additional failures with cce:

Just making a note of these on this PR, as other failures are noted in the body, but we can create issues for these as well

masmith@derecho1:~/DART/developer_tests/build_everything> grep -a FAILED test-2/build-everything-cce.o2646766 
cce RESULT: 0 /glade/derecho/scratch/masmith/build_everything/cce/DART/models/clm/work/ FAILED
cce RESULT: 3 /glade/derecho/scratch/masmith/build_everything/cce/DART/models/noah/work/ FAILED
cce RESULT: 6 /glade/derecho/scratch/masmith/build_everything/cce/DART/models/wrf_hydro/work/ FAILED
cce RESULT: 12 /glade/derecho/scratch/masmith/build_everything/cce/DART/models/wrf/work/ FAILED

CLM:

ftn -O2  -I/glade/u/apps/derecho/23.06/spack/opt/spack/netcdf/4.9.2/cce/15.0.1/cuko/include  -c	/glade/u/home/masmith/DART/models/clm/dart_to_clm.f90


ftn-1569 ftn: WARNING UPDATE_SNOW, File = ../../../../../../u/home/masmith/DART/models/clm/dart_to_clm.f90, Line = 612, Column = 19 
  A DO loop variable or expression of type default real or double precision real is a deleted feature of the Fortran standard.


ftn-1569 ftn: WARNING UPDATE_SNOW, File = ../../../../../../u/home/masmith/DART/models/clm/dart_to_clm.f90, Line = 628, Column = 19 
  A DO loop variable or expression of type default real or double precision real is a deleted feature of the Fortran standard.


ftn-319 ftn: ERROR UPDATE_SNOW, File = ../../../../../../u/home/masmith/DART/models/clm/dart_to_clm.f90, Line = 752, Column = 76 
  A subscript must be a scalar integer expression.

Cray Fortran : Version 15.0.1 (20230120205242_66f7391d6a03cf932f321b9f6b1d8612ef5f362c)

Line in question:

snowdp_po(icolumn) = snowdp_pr(icolumn) + sum(gain_dzsno(nlevsno+1+snlsno(icolumn):nlevsno,icolumn))

snlsno(ncolumn), which is used in the subscript of gain_dzsno, is declared real(r8). After changing the declaration (line 418) to integer, the file compiles.

real(r8) :: clm_SNLSNO(ncolumn)

NOAH/WRF_HYDRO:


ftn -O2  -I/glade/u/apps/derecho/23.06/spack/opt/spack/netcdf/4.9.2/cce/15.0.1/cuko/include  -c	/glade/u/home/masmith/DART/models/wrf_hydro/noah_hydro_mod.f90
   Error message      ::  _expr_type: Invalid table type
   Error detected     ::  File '/home/jenkins/crayftn/pdgcs/v_expr_utl.c', line 7360
   Initiated from     ::  Line 1280 (v_main.c)
   Optimizer built    ::  2023-01-20 (production)

   File               ::  /glade/u/home/masmith/DART/models/wrf_hydro/noah_hydro_mod.f90
   Function           ::  getchannelgridcoords
   at or near line    ::  660

   Compiler hash      ::  66f7391d6a03cf932f321b9f6b1d8612ef5f362c
   Target             ::  x86-milan

ftn-7991 ftn: INTERNAL GETCHANNELGRIDCOORDS, File = ../../../../../../u/home/masmith/DART/models/wrf_hydro/noah_hydro_mod.f90, Line = 660 
  INTERNAL COMPILER ERROR:  "_expr_type: Invalid table type" (/home/jenkins/crayftn/pdgcs/v_expr_utl.c, line 7360, version 66f7391d6a03cf932f321b9f6b1d8612ef5f362c)

Line in question:

n_link = sum(CH_NETRT*0+1, mask = CH_NETRT >= 0)

WRF:

Building  WRF_DART_utilities/add_pert_where_high_refl  build  18  of  29
............................................................................................... Makefile is ready.
ftn -O2  -I/glade/u/apps/derecho/23.06/spack/opt/spack/netcdf/4.9.2/cce/15.0.1/cuko/include  -c	/glade/u/home/masmith/DART/models/wrf/WRF_DART_utilities/add_pert_where_high_refl.f90


ftn-292 ftn: ERROR ADD_PERT_WHERE_HIGH_REFL, File = ../../../../../../u/home/masmith/DART/models/wrf/WRF_DART_utilities/add_pert_where_high_refl.f90, Line = 37, Column = 8 
  "F2KCLI" is specified as the module name on a USE statement, but the compiler cannot find it.

Cray Fortran : Version 15.0.1 (20230120205242_66f7391d6a03cf932f321b9f6b1d8612ef5f362c)

The solution is to remove the following line:

use f2kcli

It compiles after this change.

@hkershaw-brown
Member Author

@mjs2369
#599
#598

Contributor

@mjs2369 mjs2369 left a comment

Ready to merge. The quickbuild failures are being accurately reported

@hkershaw-brown hkershaw-brown added the release! bundle with next release label Dec 12, 2023
hkershaw-brown and others added 4 commits December 13, 2023 10:38
Resolution to Issue #336 - updated nc_check
fix print of number of obs converted, and make input.nml defaults better
mom6 model_mod .eqv. for logicals comparison
@hkershaw-brown hkershaw-brown merged commit 99ebec5 into main Dec 13, 2023
4 checks passed
@hkershaw-brown hkershaw-brown deleted the quickbuild_tests branch December 13, 2023 17:28