Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MARBL-DART #701

Closed
wants to merge 21 commits into from
Closed

MARBL-DART #701

wants to merge 21 commits into from

Conversation

robin-armstrong
Copy link
Collaborator

@robin-armstrong robin-armstrong commented Jul 11, 2024

Description:

Contains model-mods for data assimilation with the Marine Biogeochemistry Library (MARBL/MOM6).

also fixes #614

Testing Datasets:

A test package for MARBL_joint_estimation can be found at /glade/u/home/rarmstrong/DART_test_package.

@hkershaw-brown
Copy link
Member

hkershaw-brown commented Jul 15, 2024

Thanks for this Robin!

For reviewers, summary of July 11th 2024 meeting as todo list:

model mom6-marbl:

  • MARBL_joint_estimation is the version of the model directory that should be released. (rename to something sensible)
  • remove other MARBL_ model directories
    • MARBL_param_estimation
    • MARBL_state_estimation
  • documentation
    • options to set in marbl-mom6 to do DA.
    • what is available (parameter estimation?) models/MARBL_param_estimation has metadata.txt is this needed?
  • model_mod name

observation converter(s):

  • check the difference between these:
    • BATS
    • BATS_clim
  • documentation
    • observations/obs_converters/BATS_clim/bats_climatology.py
    • converter documentation .rst

General git tidiness:

  • remove netcdf files (may remove from history if they are large)
  • remove binary files (may remove from history if they are large)
  • revert mpi/fix system changes
  • update to main
  • rewrite history so Robin's commits are credited to his github account
  • .gitignores dotted around, remove or move to root/.gitignore

Other:

Testing:

  • test case /glade/work/hkershaw/DART/marbl/DART_test_package (copy of /glade/u/home/rarmstrong/DART_test_package edit: without symlinks).
    /glade/u/home/rarmstrong/DART_test_package.robin is the original with symlinks (watch out for links INPUT->INPUT)
  • netcdf dimension fix

backup of this branch since we're rewriting history:
https://github.com/hkershaw-brown/DART/tree/marbl-dart

@hkershaw-brown hkershaw-brown self-assigned this Jul 15, 2024
…n models/MARBL_MOM6_1D in branch marbl_mom6_1d) and MARBL_joint_estimation (developed in models/MARBL_MOM6_1D on branch param_estimation).
…imation branch. Also added in the observation codes for BATS variables, which were accidentally left out of the last commit.
… This required fixing a bug in direct_netcdf_mod, using a solution provided by Helen Kershaw.
…d. Split the BATS data converter into two separate converters, one for sequential state or state-parameter estimation, and another for pure parameter estimation.
… cycles with MARBL without errors. Further testing needed to assess the quality of assimilations.
…imatology.py now records observation error in units of standard deviation, and the script also generates a surrogate netCDF file containing the BATS empirical climatology.
…t, so that it works with the self-contained test package.
Copy link
Member

@hkershaw-brown hkershaw-brown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first batch of review comments

! QTY_MICROWAVE_BRIGHT_TEMP
! QTY_MICROZOOPLANKTON_CARBON
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_quantities_mod is missing many of the new ocean qtys (ocean_quantities_mod.f90)

! QTY_DISSOLVED_INORGANIC_SIO3
! QTY_MESOZOOPLANKTON_CARBON
! QTY_MICROZOOPLANKTON_CARBON
! QTY_PARAM_AUTOTROPH1_KCO2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the goal of having PARAM in the QTY name? Just to communicate that it is a parameter?

Do you need a qty for each parameter? Could you have QTY_PARAMETER?

Copy link
Contributor

@mgharamti mgharamti Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to differentiate this param from other model parameters, we can call it QTY_BGC_PARAM so that it's only a biogeochemistry parameter. I don't think we need a quantity for every single parameter. If I am not mistaken, all MARBL parameters have the same physical properties, for instance >0.

Copy link
Collaborator Author

@robin-armstrong robin-armstrong Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, I created multiple QTY_PARAM_... types in order to facilitate experiments where multiple parameters are being estimated at once. For experiments of this type, we need a way to distinguish one parameter from another, although they do indeed have the same physical properties. That said, we did eventually switch to only running with one parameter being estimated at a time, and for experiments of that sort, having a single QTY_BGC_PARAM would work just fine.

Comment on lines +86 to +87
real(r8), parameter :: geolon = 360 - 64.0
real(r8), parameter :: geolat = 31.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are hardcoded, should these be namelist options for people to put the column anywhere on Earth?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason to not make them namelist options, but a user will need to know that any mismatch between the location in the obs-seq file and geolat/geolon will cause DART to reject the observation. I think that was our motive for hard-coding those coordinates. We wanted to make sure that they always agreed with the coordinates written in the obs-seq file, which in turn are hard-coded into the obs converter.

Comment on lines +137 to +138
! Print module information to log file and stdout.
call register_module(source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
! Print module information to log file and stdout.
call register_module(source)

type(time_type) :: mom6_time
integer :: mom_base_date_in_days, mom_days

mom_base_date_in_days = 139157 ! 1982 1 1 0 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always Jan 1st 1982?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless changes have been made to the MARBL date/time format, yes, it's always Jan 1st 1982.


character(len=*), parameter :: routine = 'read_ocean_geometry'

! Need nx, ny
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
! Need nx, ny

I think this comment is from the MOM6 model_mod where you need nx, ny for the size of the basin_depth variable. The column marble is (2,2) always.

Comment on lines 171 to 178
&preprocess_nml
input_obs_def_mod_file = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/observations/forward_operators/DEFAULT_obs_def_mod.F90'
output_obs_def_mod_file = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/observations/forward_operators/obs_def_mod.f90'
input_obs_qty_mod_file = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/assimilation_code/modules/observations/DEFAULT_obs_kind_mod.F90'
output_obs_qty_mod_file = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/assimilation_code/modules/observations/obs_kind_mod.f90'
obs_type_files = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/observations/forward_operators/obs_def_ocean_mod.f90'
quantity_files = '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/assimilation_code/modules/observations/default_quantities_mod.f90', '/glade/derecho/scratch/rarmstrong/marbl_dart/DART/assimilation_code/modules/observations/ocean_quantities_mod.f90'
/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs changing to relative paths (within DART) rather than pointing at Robin's directories.

@@ -0,0 +1,241 @@
&perfect_model_obs_nml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need qceff namelists in this file.

@@ -0,0 +1,2 @@
This folder contains files adapted from those in DART/observations/obs\_converters/text, for the purpose of converting data from the [Bermude Atlantic Time-Series Study](https://bats.bios.asu.edu/) (BATS) into a format readable by DART.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to .rst file for online docs

@@ -0,0 +1,2 @@
This folder contains files adapted from those in DART/observations/obs\_converters/text, for the purpose of converting data from the [Bermude Atlantic Time-Series Study](https://bats.bios.asu.edu/) (BATS) into a format readable by DART.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to .rst for online docs.
Description needed for BATS_clim vs BATS

Copy link
Contributor

@mgharamti mgharamti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first rundown of the code. I have not run the code but I plan to do that soon.

ret = nf90_inquire(ncfile_in, unlimitedDimID=unlim_dimID)
call nc_check(ret, 'read_variables: nf90_inquire', 'unlimitedDimID')

if (has_unlimited_dim(domain) .and. any(get_io_dim_ids(domain, i) == unlim_dimID )) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a marbl-only issue? Do we need this chunk of code in other models? If so, I'm wondering if we should have a separate issue for this. Perhaps this is a question to @hkershaw-brown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup it is an issue already #614
It affects models which have some state variables with an unlimited dimension and some without.

! QTY_DISSOLVED_INORGANIC_SIO3
! QTY_MESOZOOPLANKTON_CARBON
! QTY_MICROZOOPLANKTON_CARBON
! QTY_PARAM_AUTOTROPH1_KCO2
Copy link
Contributor

@mgharamti mgharamti Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to differentiate this param from other model parameters, we can call it QTY_BGC_PARAM so that it's only a biogeochemistry parameter. I don't think we need a quantity for every single parameter. If I am not mistaken, all MARBL parameters have the same physical properties, for instance >0.

update_list = update_var_list(1:nfields))

! setting up the DART parameter vector
if(estimate_params) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that the state is always being estimated. This is where we can generalize this and estimate:

  • only state
  • only parameters
  • or both

Perhaps we can visit this later, but just a note that we can come back to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I was developing the parameter-only version, but as I remember, that setup still needs a separate domain for state variables. The reasoning here was that the forward operator in the parameter-only case involves running MARBL to produce climatological averages, and this operation cannot be coded into DART. Maintaining a separate domain for state variables allows us to compute those averages externally, so that DART can read them as identity observations from a "climatology" component of the state vector. That said, it might be better to rename that component within the model-mod, so that instead of state_dom_id we have something like climatology_dom_id.

!------------------------------------------------------------------
! Returns the number of items in the state vector as an integer.

function get_model_size()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very clean, especially if the parameter domain is not active. I would count the domains and loop over them to get the model_size (which is by the way defined on line 84 but not used). I suggest a code similar to the one below:

domain_count = get_num_domains()
model_size = 0
do i_dom = 1,domain_count
     model_size = model_size + get_domain_size(i_dom)
enddo

This is related to my previous comment on active domains and the nature of the DA configuration

! reading the clamp values

if (table(i, 3) /= 'NA') then
read(table(i,3), '(d16.8)') clamp_vals(i,1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the format specification needed here? What if people added the clamps in a different format?

@@ -0,0 +1,591 @@
! DART software - Copyright UCAR. This open source software is provided
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agreed that we will only do the joint state-parameter estimation one. I would stick to one marbl directory and one model mod for now. These could be consolidated into one in the future. I mean something that comes to mind now is to do no_copy_back on the state variables to turn it into a parameter estimation only.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we did agree to only do the joint state-parameter estimation model-mod. I had all three model-mods being developed on the same branch to make it easier to switch between them, which is why they're all in the same pull request; I apologize if that's bad form. In any case, MARBL_joint_estimation is the one that we agreed to pull into DART.

was pointing at Robin's directory
@hkershaw-brown
Copy link
Member

hkershaw-brown commented Jul 19, 2024

note for test package, I switched to inflation not read from file

hkershaw@dec1004:/glade/work/hkershaw/DART/marbl/DART_test_package/test/DART$ diff input.nml input.nml.robin 
1,6d0
< &probit_transform_nml
< /
< 
< &algorithm_info_nml
< /
< 
73,74c67,68
<    inf_initial_from_restart    = .false.,                 .false.,
<    inf_sd_initial_from_restart = .false.,                  .false.,
---
>    inf_initial_from_restart    = .true.,                 .false.,
>    inf_sd_initial_from_restart = .true.,                  .false.,
108a103
>    filter_kind                     = 1,

@mgharamti
Copy link
Contributor

I checked out the code and I was not able to run any of the DART executables. Here is the error that I was getting:

ERROR FROM:
  source : ensemble_manager_mod.f90
  routine: set_up_ens_distribution
  message: not enough MPI tasks for the model size, suggest at least ***** tasks

So, it turns out the length of the state vector is 7036994676854122!!
I realize the issue is in computing the variable model_size. Right now, the function get_model_size in the model_mod is computing it using local variables. When someone calls get_model_size() outside the model_mod, this fails. So, the correct approach is to compute model_size inside static_init_model as I outlined in my previous review and then get_model_size should simply look like this:

function get_model_size()
integer(i8) :: get_model_size
get_model_size = model_size 
end function get_model_size

With this change, I can now run model_mod_check and the model_size is now correctly computed as 3434.
It's still puzzling to me that this version of the code was running during Robin's internship. Could it be that we have a different version?

MARBL_joint_estimation is the model_mod we are using (see pull NCAR#701)
Copy link
Contributor

@mgharamti mgharamti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to run filter and state and parameter updates. The only necessary change that is needed to run filter properly is the handling of model_size.

&filter_nml
single_file_in = .false.,
input_state_files = '',
input_state_file_list = 'states_input.txt', 'params_input.txt',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set estimate_params = .false. in the model_nml, the code will still look for a list of parameter files. I guess that's ok given that this is a joint state-parameter estimation routine. This can be easily fixed once we work on merging the 3 different filtering strategies together.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I set estimate_params = .false. in the model_nml, the code will still look for a list of parameter files. I guess that's ok given that this is a joint state-parameter estimation routine. This can be easily fixed once we work on merging the 3 different filtering strategies together.

merging the 3 different filtering strategies is happening as part of this pull request, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helen, I'm not quite sure. I'm happy to do it but it would take some time as I'm leaving. I say, we push the joint one now after small corrections. Once I'm back, I can get to it and add flexibility to perform the 3 using one code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is quite a bit to do before releasing on this code. I'll go ahead an close the pull request. We can reopen it when it its ready.

@hkershaw-brown
Copy link
Member

pushed robins branch to https://github.com/NCAR/DART/tree/marbl-dart

@hkershaw-brown
Copy link
Member

pushed robins branch to https://github.com/NCAR/DART/tree/marbl-dart
deleting this branch, since Moha's pull request from his fork:

https://github.com/mgharamti/DART/tree/marbl-dart

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: assuming all state variables have unlimited dimension
3 participants