
State IO #359

Open
hkershaw-brown opened this issue Jun 2, 2022 · 14 comments
Labels: IO (IO refactoring notes and issues)

Comments

@hkershaw-brown
Member

hkershaw-brown commented Jun 2, 2022

Notes for state IO

@hkershaw-brown hkershaw-brown self-assigned this Jun 2, 2022
@hkershaw-brown
Member Author

dart_time_io_mod.f90 uses its own local has_unlimited rather than the state_structure_mod domain%has_unlimited

! If there is no unlimited dimension, unlimitedDimID = -1
ios = nf90_inquire(ncid, unlimitedDimId=unlimitedDimId )
call nc_check(ios,routine,'checking unlimited dimension')
has_unlimited = (unlimitedDimID /= -1)

@hkershaw-brown
Member Author

In create_and_open_state_output, only 'time' (lower case) can be the unlimited dimension.

! define dimensions, loop around unique dimensions
do i = 1, get_io_num_unique_dims(dom_id)
   if ( trim(get_io_unique_dim_name(dom_id, i)) == 'time' ) then
      ret = nf90_def_dim(ncfile_out, 'time', NF90_UNLIMITED, new_dimid)
   else
      ret = nf90_def_dim(ncfile_out, get_io_unique_dim_name(dom_id, i), &
                         get_io_unique_dim_length(dom_id, i), new_dimid)
   endif
   !>@todo if we already have a unique names we can take this test out
   if(ret /= NF90_NOERR .and. ret /= NF90_ENAMEINUSE) then
      call nc_check(ret, routine, &
                    'defining dimensions'//trim(get_io_unique_dim_name(dom_id, i)))
   endif
enddo

If the state_structure_mod has an unlimited dimension with any other name, e.g. WRF's 'Time', it gets created as a limited (fixed-length) dimension.

This means that the netcdf files DART creates have a different dimension structure than the model files.

has_unlimited is a property of the state_structure%domain, but it can differ between netcdf files because of create_and_open_state_output. I think this is why dart_time_io_mod is querying the netcdf file rather than the state_structure (see comment above).

There is also domain%variable(ivar)%var_has_unlim which is per variable, set but never used.
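
The 'time' vs 'Time' mismatch can be seen in isolation with the sketch below. This is not DART code: the module name dim_name_utils and the functions to_lower and is_time_dim are hypothetical, made up for illustration. It only shows that an exact comparison like the one in create_and_open_state_output rejects 'Time', while a case-insensitive comparison would accept it.

```fortran
module dim_name_utils
   ! Hypothetical helper module, not part of DART.
   implicit none
   private
   public :: to_lower, is_time_dim

contains

   !> Return a lower-case copy of str (ASCII letters only).
   pure function to_lower(str) result(lower)
      character(len=*), intent(in) :: str
      character(len=len(str))      :: lower
      integer :: i, code

      lower = str
      do i = 1, len(str)
         code = iachar(str(i:i))
         if (code >= iachar('A') .and. code <= iachar('Z')) then
            lower(i:i) = achar(code + 32)   ! 'A'..'Z' -> 'a'..'z'
         endif
      enddo
   end function to_lower

   !> True if name is 'time' in any capitalisation ('time', 'Time', 'TIME').
   pure function is_time_dim(name)
      character(len=*), intent(in) :: name
      logical :: is_time_dim

      is_time_dim = (to_lower(trim(adjustl(name))) == 'time')
   end function is_time_dim

end module dim_name_utils
```

With this, is_time_dim('Time') is .true., whereas the exact test ('Time' == 'time') is .false. A case-insensitive test is only one possible direction. Asking the state structure which dimension is unlimited, and defining it under its original name, would avoid guessing from the name at all.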

@hkershaw-brown
Member Author

The broadcast below uses a hard-coded 0, even though there is an integer, parameter :: SINGLE_IO_TASK_ID = 0 in the module.

! Broadcast the value of model_mod_will_write_state_variables to every task
! This keeps track of whether the model_mod or dart code will write state_variables.
call broadcast_flag(local_model_mod_will_write_state_variables, 0)
ncFileID%model_mod_will_write_state_variables = local_model_mod_will_write_state_variables

  • I don't think this needs a broadcast: if all tasks are in the initialize_single_file_io routine, all tasks already have access to the local variable local_model_mod_will_write_state_variables.
  • Double check pe vs task: since the routine is initialize_single_file_io(ens_handle, file_handle), should it be using ens_handle%my_pe rather than my_task_id()? Maybe it does not matter. Similarly for read_single_file:
    ! mpi task variables
    my_pe = my_task_id()
    is_sender = (my_pe == SINGLE_IO_TASK_ID)

@hkershaw-brown
Member Author

perfect_model_obs has namelist options:

single_file_in
single_file_out

but the number of copies is hardcoded at 1 when initializing the filenames

call io_filenames_init(file_info_input, 1, cycling=has_cycling, single_file=single_file_in)

and the ens_size is fixed at 1:

integer :: ens_size = 1 ! This is to avoid magic number 1s

what is the single_file_in/out for in perfect_model_obs?

@nancycollins
Collaborator

we need to rename the file format that puts all the ensemble members plus inflation copies, mean, sd, etc in a single netcdf file (that dart dictates the format of).

we called it "single file" but as you point out that's confusing for the situation where there is only 1 member involved. the code is going to expect to read a netcdf file with specific dimension names and variable names.

other suggestions for this format? combination file, combined file, dart format file, ???

@hkershaw-brown
Member Author

I think my question is even simpler than that: do we need single_file_in as a namelist option for perfect_model_obs? I think it only ever runs with 1 copy, but am I missing something?

@nancycollins
Collaborator

yes, because even though there is always only 1 input and output file, this item toggles on and off the dart netcdf format file vs a model netcdf file.

the namelist variable name is confusing because we called dart format files "single file" no matter how many members are in it.

@hkershaw-brown
Member Author

yeah I get it, just thinking about refactoring. Maybe it would be better to have the file describe itself as 'dart format'

@nancycollins
Collaborator

if the dart file had a global attribute to indicate it was "dart format" that would be good -- make it more self-describing. then maybe a namelist item wouldn't be needed. that would be nice.

but we'd have to think about how the code could use it. it might need to open the file, look for the attribute and then decide whether to use the state structure setup info from the model's static_init_model or the dart defined format to read the file. i'm not clear if there is an order of things that works however.

it would be nice if the i/o code could use the state structure for reading a dart format file but with all the members in a single netcdf variable i don't know if it can.
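
As a rough sketch of the "open the file, look for the attribute" step, the function below opens a netcdf file and checks for a global attribute. The attribute name 'dart_format' and the names dart_format_check and file_is_dart_format are assumptions for illustration; nothing in DART defines them. It uses only the standard netcdf Fortran 90 calls nf90_open, nf90_inquire_attribute and nf90_close, and it does not address the question of ordering relative to static_init_model.

```fortran
module dart_format_check
   ! Hypothetical helper module, not part of DART.
   use netcdf
   implicit none
   private
   public :: file_is_dart_format

   ! Assumed attribute name; no such attribute is defined in this thread.
   character(len=*), parameter :: DART_FORMAT_ATT = 'dart_format'

contains

   !> Open filename read-only and report whether it carries the
   !> (hypothetical) global attribute marking a DART format file.
   !> Returns .false. if the file cannot be opened.
   function file_is_dart_format(filename) result(is_dart)
      character(len=*), intent(in) :: filename
      logical :: is_dart
      integer :: ncid, ret

      is_dart = .false.

      ret = nf90_open(trim(filename), NF90_NOWRITE, ncid)
      if (ret /= NF90_NOERR) return

      ! nf90_inquire_attribute returns an error status if the attribute is absent
      ret = nf90_inquire_attribute(ncid, NF90_GLOBAL, DART_FORMAT_ATT)
      is_dart = (ret == NF90_NOERR)

      ret = nf90_close(ncid)
   end function file_is_dart_format

end module dart_format_check
```

A caller could then branch on file_is_dart_format(filename) instead of a namelist item, provided the check can run before the state structure is needed.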

@hkershaw-brown
Member Author

Diagnostic structure, not used:

! diagnostic files
!>@todo FIXME these routines are deprecated because we are no supporting 'diagnostic'
!> files, but they will likely be useful for the single file (multiple member) input.
public :: create_diagnostic_structure, &
end_diagnostic_structure

@hkershaw-brown
Member Author

assert_restart_names_initialized prints the same error message as assert_file_info_initialized, so the message refers to file_info_type even though the check is on restart_names%initialized:

!> Test whether file_info_type has been initialized
!> Error out if not, giving the name of the calling routine.
subroutine assert_file_info_initialized(file_info, routine_name)
   type(file_info_type), intent(in) :: file_info
   character(len=*), intent(in) :: routine_name
   if ( .not. file_info%initialized ) then
      call error_handler(E_ERR, routine_name, &
           ':: io_filenames_init must be used to initialize file_info_type', source)
   endif
end subroutine assert_file_info_initialized
!-------------------------------------------------------------------------------
!> Test whether file_info_type has been initialized for routines that only
!> have access the %restart_files(in/out/prior)
!> Error out if not, giving the name of the calling routine.
subroutine assert_restart_names_initialized(restart_names, routine_name)
   type(stage_metadata_type), intent(in) :: restart_names
   character(len=*), intent(in) :: routine_name
   if ( .not. restart_names%initialized ) then
      call error_handler(E_ERR, routine_name, &
           ':: io_filenames_init must be used to initialize file_info_type', source)
   endif
end subroutine assert_restart_names_initialized

@hkershaw-brown
Member Author

add_domain_blank is not blank (i.e. not just a state vector): it has 3 dimensions: location, member, time. These are appropriate for lorenz_X style models, but not necessarily appropriate for anything else.

!-------------------------------------------------------------------------------
!> Add a blank domain - one variable called state, length = domain_size
! HK the above comment is not true, there are three dimensions created in this function.
! HK this should set has_unlimited = .true.
function add_domain_blank(domain_size) result(dom_id)
   integer(i8), intent(in) :: domain_size
   integer :: dom_id
   integer(i8) :: domain_offset
   ! add to domains
   call assert_below_max_num_domains('add_domain_blank')
   state%num_domains = state%num_domains + 1
   dom_id = state%num_domains
   if (state%num_domains > 1 ) then
      domain_offset = get_index_end(dom_id-1,get_num_variables(dom_id-1))
   else
      domain_offset = 0
   endif
   ! domain
   state%domain(dom_id)%method = 'blank'
   state%domain(dom_id)%num_variables = 1
   state%domain(dom_id)%dom_size = domain_size
   state%model_size = state%model_size + domain_size
   state%domain(dom_id)%num_unique_dims = 3
   allocate(state%domain(dom_id)%original_dim_IDs(3))
   allocate(state%domain(dom_id)%unique_dim_names(3))
   allocate(state%domain(dom_id)%unique_dim_length(3))
   state%domain(dom_id)%unique_dim_names(1) = 'location'
   state%domain(dom_id)%unique_dim_names(2) = 'member'
   state%domain(dom_id)%unique_dim_names(3) = 'time'

The 'member' dimension is there for the single file IO: again blending state_structure and IO (which are not the same thing, but currently live in the same state_structure).

@hkershaw-brown
Member Author

  • diagnostic structure is in the code but only half used? Note the comment on these routines says they are deprecated:
    ! diagnostic files
    !>@todo FIXME these routines are deprecated because we are no supporting 'diagnostic'
    !> files, but they will likely be useful for the single file (multiple member) input.
    public :: create_diagnostic_structure, &
    end_diagnostic_structure

It is then used, with the comment "may not be needed".

And domain = 1 is hardcoded anyway:

! may not be needed
ncFileID%diag_id = create_diagnostic_structure()
my_ncid = ncFileID%ncid
!>@todo ONLY ONE DOMAIN FOR SINGLE FILE OUTPUT
domain = 1
do ivar = 1, get_num_variables(domain)

@hkershaw-brown
Member Author

missing_in_state is for the case where some ensemble members are missing; dry land (all ensemble members missing, e.g. POP) skates through.

[Image: Screenshot 2024-09-05 at 4 52 28 PM]

note check for missing in state in assim_tools is mucho expensive (30% of runtime)
