POD discussion: ENSO MSE (Moist Static Energy Budget) #117

tsjackson-noaa · 2020-02-18T18:57:45Z

tsjackson-noaa
Feb 18, 2020

Thread for progress in adding ENSO_MSE diagnostic from Hariharasubramanian Annamalai (U. Hawaii).

Legacy documentation:
http://www.cgd.ucar.edu/cms/bundy/Projects/diagnostics/mdtf/mdtf_figures/MDTF_CCSM4/MSE_diag/MSE_diag.html

tsjackson-noaa · 2020-08-14T18:05:04Z

tsjackson-noaa
Aug 14, 2020
Author

Progress update:

Link to development branch and POD commit history. These links will stop working when the branch is deleted after integration work is finished. Links to files below will remain valid.

Current bugs/errors:

1. Observational data does not follow the v3.0 digested observational data policy. Current amount of observational data (2.64 Gb) is above recommended limit (< 1 Gb). Observational data is processed identically every time the POD is run.
2. Assumptions on dimensions of model variables:
   - Variable units set based on name in COMPOSITE/NCL_CONVERT/data_routine.ncl
   - Precipitation flux units assumed in MSE_VAR/moist_routine_variance.py
   - Code base needs to be reviewed in order to locate all instances of this.
3. The test for sentinel values in MSE_VAR/moist_routine_variance.py is flawed and leads to overflow errors when missing data are involved (eg, if the analysis range does not include an El Nino or La Nina event). Plots of model data for Stage 3 are not produced when this happens.
4. The "COMPOSITE/obs/regression_PR.png" plot is incorrect -- either the color scale is set incorrectly or the data is bad. Plot is generated by COMPOSITE/NCL/plot_regression_all_OBS.ncl. Regression plots of other variables look OK.
5. Scatterplots in stage 4 do not include data for the model being analyzed. No model data (either input data, or intermediate data generated by previous stages) is loaded by the code in this section.

Remaining necessary tasks:

1. Fix bugs listed above
2. Refactor code to be compliant with the digested observational data policy. Designate current *_OBS routines as the observational data reduction code base.
3. Implement more robust handling of the case where the analysis range does not include an El Nino or La Nina event; ie report this information to the user.

Edit:
Converted bulleted list to numbered list.


tsjackson-noaa · 2020-08-26T06:14:00Z

tsjackson-noaa
Aug 26, 2020
Author

More detail on the overflow bug (number 3) in #19 (comment).

Here's the relevant section of the logs:

Log details

 ===================================================================
        Observational MSE Variances Finished     2020-08-09 23:54
 ===================================================================
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/shared/util.pyc:check_required_dirs:  starting
	 looking for required dir: /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/PS
/net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/PS = /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/PS created
	 looking for required dir: /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/ELNINO
/net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/ELNINO = /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/ELNINO created
	 looking for required dir: /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/LANINA
/net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/LANINA = /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model/netCDF/LANINA created
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:80: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + ts[i,j]*ts[i,j]
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:102: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + shf[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:113: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + lhf[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:124: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + sw[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:135: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + lw[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:158: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + madv[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:170: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + tadv[i,j]*mse[i,j]*factor
/local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py:183: RuntimeWarning: overflow encountered in float_scalars
  cc = cc + omse[i,j]*mse[i,j]*factor
  The NetCDF data have already been converted  
   
 
   Seasonal ENSO MSE Variance composites started  2020-08-09 23:54
('DRBDBG prefix ', '/net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/COMPOSITE/model//netCDF/ELNINO/')
('DRBDBG prefix ', '/net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/COMPOSITE/model//netCDF/LANINA')
NCL routine /local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/NCL/plot_bars_composite.ncl:
	
NCL routine /local2/home/MDTF/MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/NCL_general/plot_bars_composite.ncl:
	
   Seasonal ENSO MSE Variance composites finished  2020-08-09 23:54
   resulting plots are located in : /net2/Thomas.Jackson/tmp/wkdir/MDTF_ESM4_historical_D1_2000_2004.v24/ENSO_MSE/MSE_VAR/model
 
 ===================================================================
         MSE Variances Finished     2020-08-09 23:54
 ===================================================================

The overflow is due to the use of sentinel values (called undef here) for missing data, and the incorrect implementation of the tests in each of the loops in moisture_variance(). For simplicity we describe the first one:

MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/moist_routine_variance.py

Lines 74 to 84 in ef913d2

    
    ##  ts  variance !!  rest  CO-variances  except mse !!
        ss = 0.
        cc = 0.
        for j in range(jj1, jj2):
            for i in range (ii1, ii2):
                if(ts[i,j] < undef):
                    cc = cc + ts[i,j]*ts[i,j]
                    ss = ss + 1.
    ##    endif enddo
        if( ss > 0.):
            ts_var = cc/ss

since the same remarks apply to all of them.

The data causing the overflow (ts[i,j] above) is read in as 32-bit floats by MSE_VAR/get_flux_in. The sentinel value undef is defined as a python float literal:

MDTF-diagnostics/diagnostics/ENSO_MSE/MSE_VAR/MSE_VAR.py

Lines 153 to 157 in 33e9eaf

    
    undef = float(99999999999.)
    undef2 = float(1.1e+20)
    undef = 1.1E+20
    undef2 = -999999999.

Python floats are 64-bit (technically, the length of a double set in the float.h on the system where the interpreter was compiled). When evaluating ts[i,j] < undef, ts[i,j] is promoted to 64 bits. Even if ts[i,j] was set to the sentinel value, the comparison will never evaluate to False due to roundoff error: the decimal value chosen for undef isn't exactly representable as a 32-bit float, so the stored sentinel is slightly smaller than the 64-bit undef. The entries of ts that were set equal to the sentinel value are therefore not excluded from the sum, which overflows.
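The behavior described above can be reproduced in isolation. The following is a minimal sketch, not code from the POD: it assumes only that the sentinel is the Python literal 1.1e20 and that the data are stored as NumPy 32-bit floats. The promotion to 64 bits is written out with float() so that the result does not depend on the scalar promotion rules of the installed NumPy version.

```python
import numpy as np

undef = 1.1e20               # Python float (64-bit), as defined in MSE_VAR.py
stored = np.float32(undef)   # the sentinel as it appears in 32-bit input data

# Promote the stored value to 64 bits explicitly, mirroring the comparison
# that the loop in moisture_variance() ends up performing.
promoted = float(stored)

print(promoted < undef)      # True: the sentinel passes the "valid data" test
print(promoted == undef)     # False: rounding to 32 bits changed the value

# Squaring the sentinel in 32-bit arithmetic exceeds the float32 range.
with np.errstate(over="ignore"):
    square = stored * stored
print(np.isinf(square))      # True: this is the overflow reported in the log
```

Because 1.1e20 squared is about 1.2e40, well beyond the float32 maximum of roughly 3.4e38, a single unmasked sentinel is enough to turn the running sum into inf.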

This can be confirmed via debugging statements added in the following commit: 33e9eaf. When missing data is encountered (eg, by running the POD on a date range without both El Nino and La Nina years), the first debug statement will be printed many times, while the second will never be printed.

The recommended fix is to use NumPy best practices as listed in the MDTF documentation, in particular the use of NumPy MaskedArrays to handle missing or invalid data instead of comparisons to sentinel values. Furthermore, the covariance computations done by moisture_variance can be done more quickly, using compiled code, by calling numpy.cov along with ndarray.flatten, etc. rather than re-implementing these functions in the POD.
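As a rough illustration of the masked-array approach, here is a minimal sketch under stated assumptions. The helper names mask_missing and mean_product are hypothetical and do not exist in the POD; the sentinel is assumed to be 1.1e20. The sketch keeps the sum-of-products form of the original loops rather than calling numpy.cov, because numpy.cov subtracts the means before taking products and so is not a drop-in replacement if the inputs are not already anomalies.

```python
import numpy as np

UNDEF = 1.1e20  # assumed sentinel value, as defined in MSE_VAR.py


def mask_missing(field, undef=UNDEF):
    """Return a float64 masked array with sentinel entries masked.

    Hypothetical helper: masked_values uses an approximate comparison,
    so sentinels that were rounded to 32 bits are still caught.
    """
    data = np.asarray(field, dtype=np.float64)
    return np.ma.masked_values(data, undef)


def mean_product(a, b, factor=1.0):
    """Mean of a*b*factor over grid points that are valid in both fields.

    Returns None when no valid points remain, so the caller can report
    the problem instead of silently producing a number.
    """
    prod = mask_missing(a) * mask_missing(b) * factor
    if prod.count() == 0:
        return None
    return float(prod.mean())


# Small example: one of four grid points holds the sentinel.
ts = np.array([[1.0, 2.0], [UNDEF, 3.0]], dtype=np.float32)
ts_var = mean_product(ts, ts)
print(ts_var)  # (1 + 4 + 9) / 3
```

Converting to float64 before multiplying also removes the float32 overflow, and the explicit loops over i and j are replaced by whole-array operations.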


jhafner2 · 2021-01-21T02:39:28Z

jhafner2
Jan 21, 2021

Current bugs/errors:

1 Observational data does not follow the v3.0 digested observational data policy. Current amount of observational data (2.64 Gb) is above recommended limit (< 1 Gb). Observational data is processed identically every time the POD is run.
Response:

(i) We have removed the raw ERA-interim monthly data, which has reduced the size significantly.

(ii) We now provide pre-digested observational results (from ERA-interim) for: (a) COMPOSITES; (b) CLIMA; (c) MSE_terms; (d) MSE_var_terms.

There is now NO need to process the observations every time the POD is run.

We have kept the pre-processed results in their respective subdirectories, as before. The script will just reproduce the plots.

2 Assumptions on dimensions of model variables:

Variable units set based on name in COMPOSITE/NCL_CONVERT/data_routine.ncl
Precipitation flux units assumed in MSE_VAR/moist_routine_variance.py
Code base needs to be reviewed in order to locate all instances of this.

Response: Checks for variable units have been set up using xarray.

3 The test for sentinel values in MSE_VAR/moist_routine_variance.py is flawed and leads to overflow errors when missing data are involved (eg, if the analysis range does not include an El Nino or La Nina event). Plots of model data for Stage 3 are not produced when this happens.

Response: Checks have been implemented to inform users (interactively):
(i) If no El Nino/La Nina events are detected, the POD exits with an error message.
(ii) If only one event is identified, the user is alerted and asked whether to continue or exit the POD.
(iii) The total number of El Nino / La Nina events identified is reported to the user.
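The checks listed above could look roughly like the following. This is only a sketch, not the POD's implementation: the function name check_enso_events is hypothetical, the interactive prompt is replaced by a warning so the example runs unattended, and it assumes that "no events" means either event type is missing from the analysis period.

```python
import warnings


def check_enso_events(n_elnino, n_lanina):
    """Validate event counts before compositing (hypothetical helper)."""
    # Assumption: composites need at least one event of each type.
    if n_elnino == 0 or n_lanina == 0:
        raise ValueError(
            f"Analysis period contains {n_elnino} El Nino and "
            f"{n_lanina} La Nina events; both are needed for the composites."
        )
    # A single event still works, but the user should know about it.
    if n_elnino == 1 or n_lanina == 1:
        warnings.warn(
            "Only one El Nino or La Nina event found; "
            "the composite is based on a single event."
        )
    return f"Found {n_elnino} El Nino and {n_lanina} La Nina events."


print(check_enso_events(3, 2))
```

Raising an exception for the empty case keeps the later stages from running on data that consists entirely of missing values.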

Missing-data handling with a data mask has been implemented; all missing data are now masked.

4 The "COMPOSITE/obs/regression_PR.png" plot is incorrect -- either the color scale is set incorrectly or the data is bad. Plot is generated by COMPOSITE/NCL/plot_regression_all_OBS.ncl. Regression plots of other variables look OK.

Response: Fixed

5 Scatterplots in stage 4 do not include data for the model being analyzed. No model data (either input data, or intermediate data generated by previous stages) is loaded by the code in this section.

Response: We have fixed this. Results from the "model being analyzed" will be included with the pre-digested values to produce the SCATTER plots.

Remaining necessary tasks:

Fix bugs listed above:

Response: All the bugs listed above are fixed.

Refactor code to be compliant with the digested observational data policy. Designate current *_OBS routines as the observational data reduction code base.
Response: Issue resolved; pre-digested data are provided and fit within the size limit.

Response: Regarding the observational data size policy, this has been taken care of (see our response to item 1).

Implement more robust handling of the case where the analysis range does not include an El Nino or La Nina event; report this information to the user.

Response: Done (see above)

More detail on the overflow bug (number 3) in #19 (comment).

(i) The problem is with the use of if(ts[i,j] < undef):, which is not handled in Python the same way as in FORTRAN. Tom suggests using a mask on arrays instead of if statements such as if(ts < undef):
Response: Fixed; this is related to item 3 above. All missing data are masked.

(ii) Furthermore, the covariance computations done by moisture_variance can be done more quickly, using compiled code, by calling numpy.cov along with ndarray.flatten, etc. rather than re-implementing these functions in the POD.

Response: Implemented as recommended.

B: Tom’s comments - his email Friday 28th August

I have the following recommendations for improving the runtime (in descending order of priority):

Provide "predigested" observational data, instead of re-running COMPOSITE_OBS every time the diagnostic is run.
The observational data are pre-digested and reside in:
~/diagnostics/MDTF_v2.1.a_20130319/COMPOSITE/obs/netCDF/DATA
~/diagnostics/MDTF_v2.1.a_20130319/COMPOSITE/obs/netCDF/CLIMA

Response: Done.

Use numpy best practices. This is the topic of the rest of this email.
Those were covered in his previous comments as given in part A.

Response: Thanks

Increase the chunk size of temp files. Monthly .grd files are at most 3.6Mb each, so working with annual instead of monthly chunks will reduce file I/O without hitting any noticeable memory ceiling.

Response: The monthly files were merged into annual files as suggested for model data. We will no longer provide raw monthly data for ERA-interim.

Additional issues we encountered, whose source took more time to identify:

When running the POD in the framework using ./mdtf -f src/default_tests.jsonc, the program hung without issuing any error messages.

Response/Fix: This required debugging segment by segment to find the exact line causing the trouble. It turned out that the framework script searches through all directories under ~/diagnostics, even if only one POD is requested (e.g. ENSO_MSE), looking for specific files, namely settings.jsonc. In the subdirectory ~/diagnostics/example, this file had been downloaded from the MDTF GitHub repository along with example_diag.py. I had downloaded those files incorrectly, and they were saved as HTML files rather than plain ASCII. This format discrepancy in settings.jsonc caused the framework script to hang with no error message. Removing the whole directory ~/diagnostics/example solved the problem.
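A quick way to spot this kind of bad download is to scan for settings files whose contents start like an HTML page. The sketch below is an illustration only and is not part of the MDTF framework; the helper names looks_like_html and find_html_settings are hypothetical.

```python
from pathlib import Path


def looks_like_html(text):
    """Return True if the text starts like an HTML page rather than JSON."""
    head = text.lstrip()[:200].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")


def find_html_settings(root):
    """List settings.jsonc files under root whose contents look like HTML."""
    return [
        path
        for path in Path(root).rglob("settings.jsonc")
        if looks_like_html(path.read_text(errors="replace"))
    ]


# The check itself only needs the file contents:
print(looks_like_html("<!DOCTYPE html>\n<html><head></head></html>"))  # True
print(looks_like_html('// POD settings\n{"settings": {}}'))            # False
```

Calling find_html_settings on the diagnostics directory before running the framework would list any settings.jsonc files that were saved as web pages by mistake.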

When adding new python modules to the source code, they were not loaded during the POD run.

Response/Fix: When I updated the Python code to use new modules (e.g. xarray, netCDF4), they were not loaded during the run. When testing sample Python code outside the framework, all modules loaded fine. I checked with conda that the modules were installed under _MDTF_python3_base, which they were, all via: conda install -c conda-forge new_package. The new modules were also added to the ~/src/conda/env_ENSO_MSE.yml file. The system output messages showed that the new modules were not loaded under the _MDTF_ENSO_MSE environment, which is the environment the ENSO_MSE POD is supposed to run in. The solution was to re-run the conda setup as follows:

% cd $CODE_ROOT
% ./src/conda/conda_env_setup.sh --all --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR


tsjackson-noaa · 2021-01-23T20:56:54Z

tsjackson-noaa
Jan 23, 2021
Author

The branch containing existing development work on this POD has been renamed from pod/ENSO_MSE to feature/add_ENSO_MSE, as part of the reorganization proposed in issue #106.
