CICE6 error when trying to set up regional Arctic configuration #2657
-
I suspect one issue is that you have the coupling mode set to ufs.frac.aoflux but with an active atm (FV3). The aoflux coupling mode is used for a DATM configuration; when coupled w/ FV3, the ATM uses the imported fluxes from CICE over the ice-covered portion of the cell, but calculates the fluxes over the open-water portion. The merged flux is what is sent to OCN. In general, the best way to diagnose coupling issues is to use the mediator history functionality. You should remove the history_n = 3, history_option = nhours, and history_ymd = -999 settings (and remove the med_phases_history_write, which you've currently commented out) and instead use, in the CMEPS attributes, settings like
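For example (the per-component attribute names here are an assumption based on the CMEPS mediator history options):

```
# write instantaneous mediator history for fields to/from ATM every coupling step
MED_attributes::
  history_option_atm_inst = nsteps
  history_n_atm_inst = 1
::
```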
where 'atm' can also be ice or ocn. This will produce history files containing every field to/from the named component every step through the run sequence. I don't know about your use of the ignoreUnmatchedIndices settings; that is not something we've ever used in the coupled configurations. I'd also suggest you do everything in debug mode while you're trying to get things running.
-
Thank you for the information. The error does still persist, but I will look through the mediator history and see if that helps narrow things down.
-
I would still start w/ a debug compile; secondly you can add to the med attributes the setting
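A minimal sketch, assuming the switch is the check_for_nans mediator attribute (the attribute name is an assumption; check the CMEPS version you are running):

```
# abort the run if a NaN appears in any exchanged field
MED_attributes::
  check_for_nans = true
::
```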
That will abort if any NaN appears in the exchanged fields.
-
@kristinbarton I took a look at your run directory, and I'm confused about how you're trying to set this up. It seems you're trying to use the mediator fluxes sent to ATM (frac.aoflux) but w/o the aoflux run phase and w/o several needed config variables. If you really want to use this coupling mode, see the cpld_control_noaero_p8_agrid test. Assuming you actually want ATM to function as it currently does in S2S (ufs.frac, with ATM calculating the A-O fluxes), I compiled your application in debug, set the coupling mode to ufs.frac and got a divide-by-zero in CICE
I also added the CMEPS attribute
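Assuming the attribute in question is dststatus_print (the name is an assumption), the addition would look roughly like:

```
# write the destination-status field for each regrid mapping to a netCDF file
MED_attributes::
  dststatus_print = true
::
```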
See NOAA-EMC/CMEPS#129 for the commit which enabled this feature. This flag will produce a file (ufs.hafs.dststatus.nc) which shows how each point on the destination was mapped. The documentation is at https://earthsystemmodeling.org/docs/release/latest/ESMF_refdoc/node9.html#SECTION090152000000000000000 The following shows the mapping from ATM->ICE in your case. You've got quite a few regions of status=1, which means no valid ATM value is being provided to ICE. I think this is most likely the cause of the divide-by-zero.
-
@DeniseWorthen thank you for looking into this some more. I apologize that run directory is out of date -- my most recent setup is here:

I have also been looking into the issue with the ATM->ICE mapping. For example, I noticed these incorrect values being sent to the ice model over ocean cells. We are trying to understand where to go next. Could this be a problem with the input ice grid, or a missing setting needed to correctly set up the fractional grids?
-
@kristinbarton I copied over your second run directory and re-ran, but I compiled w/o the
-
There seem to be real issues w/ the ATM configuration. I added the attribute

I also looked at the dststatus for fields TO the ATM, and they look very wrong. I can see that the mask appears the same as in the diagnostic grid file, but none of it really makes sense.
-
The mask is added to the ATM grid in the

Also---I must be missing something. This is a C96 ATM, so Tile3 (over the NP) should be 96x96. If you've added halos, shouldn't it be larger, not smaller?
-
@kristinbarton @DeniseWorthen I am not sure whether it will help, but I had similar masking issues with the land component development before. When I used inconsistent input data, like static files (orography etc.) from one place and the initial condition from another (such as mixing v1 vs. v2 input files, or getting static files from standalone-run inputs but the initial condition from coupled-run inputs), it created garbage or a problematic land-sea mask on the FV3 side like this. I could not find the real source of it, but once I tracked the issue to the
-
Thanks @uturuncoglu. Do you remember if the mask problems you were seeing were as widespread as we see here?

@kristinbarton Can you create a run directory for just an ATM-OCN case? Or can I run it from the current sandbox in the "usual way" (removing ICE from the run sequence, turning off cplice in input.nml, etc.)?
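For reference, a minimal sketch of the input.nml piece of that, assuming cplice sits in gfs_physics_nml as in other UFS coupled setups:

```
! in input.nml: keep the ocean coupling path active but disable ice coupling
&gfs_physics_nml
  cplflx = .true.
  cplice = .false.
/
```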
-
@DeniseWorthen You can see the ATM-OCN case here:
A handful of settings were also changed from this version to the CICE version to match more closely with the S2S configuration. To clarify the atmosphere grid: we did not add a halo to the 96x96 grid. Rather, the goal was to set it up so that the 96x96 is the version with the halo and 88x88 is the version without. That way we wouldn't need to explicitly add more cells to create a halo. The same grid files are being used for both versions (with and without CICE), so I don't think the grid itself is causing the problem, though there could still be issues with it.
-
@ShanSunNOAA for the fractional grid, is there any setting besides frac_grid = .true. and coupling_mode = ufs.frac that needs to be turned on?
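The two settings in question, as they would typically appear (placement assumed: frac_grid in the FV3 physics namelist in input.nml, coupling_mode in the mediator attributes in ufs.configure):

```
! input.nml
&gfs_physics_nml
  frac_grid = .true.
/
```

```
# ufs.configure
MED_attributes::
  coupling_mode = ufs.frac
::
```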
-
I apologize if my use of a tile3 from the global coupled model confused things. I was trying to figure out why the atm-ocn case does not have a corrupted land mask, and I had noticed that the original oro file from the sandbox and the one we use for the global model had different fields. I was wondering if the fact that some fields were missing led to the mask corruption. Specifically, there were no variables for

For the global model, we make the mapped mask (MOM ocean mask mapped to ATM) and that is used to create the oro data. So, for example, the 1deg MOM6 mapped to C96 tiles etc. Is there some sort of analogous process being used to create the oro data file for the regional model?
-
@kristinbarton Two questions
-
@kristinbarton Did you ever try setting up a configuration using DATM+MOM+CICE for your domain?
-
I have a bit more information after doing further testing. I set up two runs to see the impact of the atmosphere setup: a DATM+OCN+CICE run as suggested, and a global ATM with regional OCN+ICE. Interestingly, both of the runs fail at the same spot: a floating point exception at line 641 in

One thing I noticed is that the hafs coupling uses bilinear mapping, while the ufs coupling uses conservative mapping. Given the issues I had before with the conservative mapping onto the Arctic Ocean grid, I was wondering if this could be the source of the problem. Below I share plots of the mapping from atm->ocn for both coupling modes. The hafs coupling looks okay. Only the ufs mapping has the incorrect spots.

Lastly, I mentioned in the previous post that I was having issues with an "Error closing trace stream" when running with newer versions of the code. I tracked this down to being an issue with using cubed_sphere_grid output on the custom Arctic grid. The error only occurs after the fix introduced in commit f7d8b0c, so I assume that my custom regional grid is no longer compatible with that output method.
-
@DeniseWorthen We seem to have finally figured out a way to get it running.
It still isn't clear to me whether we should try to get the fractional version running, or if we can stick with non-fractional. Also, there is a noticeable line along the longitude seam in most of the ice data outputs. We're not sure whether this is a problem to worry about.
-
I am working on setting up a regional Arctic configuration with FV3+MOM6+CICE6. I currently have the regional ocean+atmosphere working, but I am running into an error when attempting to bring CICE6 into the mix.
The error occurs immediately in the first timestep and is due to Picard nonconvergence in the ice thermodynamics calculations. The output error message shows a number of NaNs and 0s in different ice model variables. I suspect there is a problem with the fields being passed into CICE6 by the coupler, but I am not sure how to diagnose the specific source of the issue. Is there a built-in way to view or print out those fields? Any suggestions on where to start digging would be helpful.
Notes on Configuration:
A few of the changes we made to run with CICE6 regionally in the Arctic are below:
- `ufs.configure`: `coupling_mode = ufs.frac.aoflux`
- `ufs.configure`: added `ignoreUnmatchedIndices=true` in the run sequence for the ICE->MED and OCN->MED steps (see the sketch after this list)
- `ice_in`: set `grid_type = 'regional'`
- `ice_in`: timestep is `dt = 180`
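A rough sketch of how that connector option appears as a run-sequence entry (the remap method, loop interval, and exact placement here are assumptions, not copied from the actual file; other entries are omitted):

```
runSeq::
  @180
    ICE -> MED :remapMethod=redist:ignoreUnmatchedIndices=true
    OCN -> MED :remapMethod=redist:ignoreUnmatchedIndices=true
  @
::
```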
Ice thermodynamics settings:
Full configuration files are attached and can also be viewed from the Hera run directory:
/scratch2/BMC/gsienkf/Kristin.Barton/stmp/stmp2/Kristin.Barton/FV3_RT/sample_cice_run
ice_in
input.nml
model_configure
MOM_input
ufs.configure
Error Message:
Other details:
/scratch2/BMC/gsienkf/Kristin.Barton/stmp/stmp2/Kristin.Barton/FV3_RT/sample_cice_run