
Conversation

@mgharamti
Contributor

Description:

This PR adds two new in-situ ocean converters (ARVOR profiling floats and SVP surface drifters). It also introduces a reusable CSV parsing utility in parse_args_mod. Both converters make use of this CSV interface, which simplifies their code. In addition, documentation has been added for the converters.

The CSV parsing utilities build on the existing parsing infrastructure, essentially as a wrapper around it. The functionality mimics our NetCDF handling in the sense that a file is opened and data is accessed with a single call before the file is closed. A few helper functions have also been added; these can be used to access the header, inquire whether a field exists, find the dimensions, etc.
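For illustration, a minimal usage sketch of that pattern, assuming the routine names summarized later in this thread; the exact argument lists are assumptions, not the PR's final API:

```fortran
! hypothetical usage sketch; routine names are from this PR's summary,
! but the argument lists here are assumed
program csv_example

use types_mod,      only : r8
use parse_args_mod  ! the csv routines were later moved to read_csv_mod

implicit none

type(csv_file_type)   :: cf
real(r8), allocatable :: temp(:)
integer               :: nobs

call csv_open('profile.csv', cf, 'csv_example')  ! open file and cache dims
nobs = csv_get_obs_num(cf)                       ! data rows, header excluded
allocate(temp(nobs))
call csv_get_field(cf, 'TEMP', temp)             ! one call reads a whole column
call csv_close(cf)

end program csv_example
```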

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Tested both converters using actual raw ASCII data files.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

ARVOR: /glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/ARVOR/work/obs_files.txt
SVP: /glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/SVP/work/obs_files.txt

This update introduces a new set of general-purpose CSV utilities
to `parse_args_mod` for use across DART observation converters and
other modules that ingest ASCII/tabular data.

New utilities added:
- `csv_file_type`: cached CSV handle storing filename, nrows, ncols, delimiter, and header fields.
- `csv_open`/`csv_close`: initialize/reset CSV handle and preload header/dimensions.
- `csv_get_field_char`, `csv_get_field_int`, `csv_get_field_real`: unified
  under a single `csv_get_field` interface for retrieving column strings,
  integers, or reals (sketched just below this list).
- Normalization of delimiters (`,` or `;`), with support for empty fields.
- `csv_get_obs_num`: count data rows (excluding header)
- `csv_find_field`: header lookup
- Other internal helpers such as `split_fields`, `detect_delim`, `normalize_delims`

These routines provide a reusable framework that is modeled after our existing
NetCDF utilities.
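The "unified interface" is presumably the standard Fortran generic-interface idiom; a sketch, not verified against the PR source:

```fortran
! likely shape of the unified entry point: a generic interface that
! dispatches on the type of the output argument (sketch, not the PR source)
interface csv_get_field
   module procedure csv_get_field_char
   module procedure csv_get_field_int
   module procedure csv_get_field_real
end interface csv_get_field
```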
A new ocean converter that uses profiling floats.
The converter harvests temperature and salinity data
at different depths and times. Depths are converted
from pressure in dbar to height in meters.
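The PR text doesn't show the conversion itself; for reference, a common dbar-to-meters choice is the Saunders (1981) approximation, sketched here (the converter's actual formula may differ):

```fortran
! hypothetical sketch using the Saunders (1981) approximation;
! the converter's actual pressure-to-height formula may differ
function pressure_to_height(p_dbar, lat_deg) result(height_m)
use types_mod, only : r8

real(r8), intent(in) :: p_dbar   ! pressure (dbar)
real(r8), intent(in) :: lat_deg  ! latitude (degrees)
real(r8)             :: height_m ! height (m); negative below the surface

real(r8), parameter :: c2 = 2.21e-6_r8
real(r8), parameter :: deg2rad = 3.14159265358979_r8 / 180.0_r8
real(r8) :: c1, sinl

sinl = sin(lat_deg * deg2rad)
c1   = (5.92_r8 + 5.25_r8 * sinl * sinl) * 1.0e-3_r8

! Saunders: depth(m) = (1 - c1)*p - c2*p^2; height is negative depth
height_m = -((1.0_r8 - c1) * p_dbar - c2 * p_dbar * p_dbar)
end function pressure_to_height
```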

The converter uses the CSV parsing utilities
to read data from the raw input files.

This is an ocean converter that uses surface drifters.
It collects SST and surface current data. It uses
the CSV parsing utilities to read the incoming ASCII files.
Additional helpers:
- `csv_get_field_index`: get the column index of a field
- `csv_field_exists`: check whether a field exists in the file
- `csv_print_header`: print the field names (my favorite)

Additional debugging statements in the converters
@mgharamti added the Enhancement (New feature or request) and obs_converters (converting observations to DART format) labels on Nov 25, 2025
@nancycollins
Collaborator

moha - i'm going to file a review on the code in just a bit, but up front i wanted to say that it's great to pull out the CSV parsing into a module so it can be reused, tested and updated independently of the calling code.

if you were willing to do a bit more work on this, i think that the CSV routines are self-contained enough to merit their own separate module. they can call code from the parse module, but i think they're different enough to stand alone. let me know what you think about this. i'll put other more specific comments into my review.

also - do you have any tests you used on this code that could be added to the repo?

@nancycollins (Collaborator) left a comment

the converters themselves are easy to read and understand, which is good. i had a few comments - the biggest one is probably moving the csv routines to their own module.

integer :: ncols = 0
character :: delim = ','
character(len=512) :: fields(MAX_NUM_FIELDS)
logical :: is_open = .false.
Collaborator

see my comment in csv_open() about leaving the file open until csv_close is called. if you aren't going to change that behavior then this variable needs a better name. but if you open the file and leave it open, this name then is perfect as is.

Contributor Author

I've addressed this, see below

cf%delim = detect_delim(line)

call split_fields(line, cf%delim, cf%ncols, cf%fields)
call close_file(iunit)
Collaborator

i would leave the file open here in csv_open(), leave it open in all subsequent calls, and close it in csv_close(). you can add the iunit to the same structure and reuse it until close is called. you can call "rewind()" if you need to start reading at the beginning of the file in subsequent calls.
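In code, the suggested pattern is roughly this (field names and the open_file/close_file helper signatures are assumed):

```fortran
! hypothetical sketch of the open-once pattern; field names and
! helper signatures are assumed

! in csv_open(): open once and remember the unit in the handle
cf%iunit = open_file(filename, form='formatted', action='read')

! in each subsequent read: rewind instead of reopening
rewind(cf%iunit)

! in csv_close(): the only place the file is actually closed
call close_file(cf%iunit)
```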

Contributor Author

Done

if (io /= 0) then
write(string1, '(A, I0)') 'Got bad read code from input file, io = ', io
call error_handler(E_ERR, routine, string1, context)
call close_file(iunit)
Collaborator

nitpick - i'd move this close_file() call up 2 lines, before the write statement. in most cases the call to the error_handler() isn't going to return, so the close_file() call won't execute. it doesn't matter if the program is exiting since all open file handles will be closed by the system, but i think it's clearer to have less code after the error_handler() call to not imply that execution continues.
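That is, something like this (same lines reordered; the closing `endif` is assumed from context):

```fortran
! suggested ordering: close the unit first, then report the error;
! nothing is expected to execute after error_handler() aborts
if (io /= 0) then
   call close_file(iunit)
   write(string1, '(A, I0)') 'Got bad read code from input file, io = ', io
   call error_handler(E_ERR, routine, string1, context)
endif
```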

Contributor Author

Done

cf%ncols = 0
cf%delim = ','
cf%fields = ''
cf%is_open = .false.
Collaborator

if you add iunit to the cf structure, close cf%iunit here.

Contributor Author

Done

! Detect the delimiter of a CSV file. One should expect a ','
! but certain data files can come with a ';' so
! let's take care of both cases
function detect_delim(line) result(delim)
Collaborator

you could add an optional argument to this function that sets the delimiter if the calling code knows what it should be. if the argument isn't present, then you can try to detect it like you do in the code already.
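A sketch of that suggestion (the fallback detection logic here is assumed, not the PR's code):

```fortran
! hypothetical sketch of the suggested optional argument; the
! fallback detection logic below is assumed, not the PR's code
function detect_delim(line, known_delim) result(delim)
character(len=*), intent(in)           :: line
character,        intent(in), optional :: known_delim
character :: delim

if (present(known_delim)) then
   delim = known_delim   ! caller knows the delimiter; skip detection
   return
endif

! otherwise detect: default to ',' unless only ';' appears
if (index(line, ';') > 0 .and. index(line, ',') == 0) then
   delim = ';'
else
   delim = ','
endif
end function detect_delim
```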

Contributor Author

Done

file_out = 'obs_seq.arvor',
obs_error_temp = 0.02, ! temperature error standard deviation (C)
obs_error_sal = 0.02, ! salinity error standard deviation (PSU)
avg_obs_per_file = 500000, ! pre-allocation hint
Collaborator

i'd say this is more than a 'hint' because i don't see anywhere that the converter can recover if there are more obs than were originally allocated for. maybe use 'limit' instead of 'hint'?

Contributor Author

Done

* - ``avg_obs_per_file``
- integer
- ``500000``
- Estimated number of valid observations per input file. Used only for pre-allocation.
Collaborator

maybe second sentence should be something like 'Used for pre-allocation. Number of files times this number must be larger than the total number of output observations.'

Contributor Author

Done

* - ``avg_obs_per_file``
- integer
- ``500000``
- Estimate of valid obs per file.
Collaborator

Add a second sentence something like 'Used for pre-allocation. Number of files times this number must be larger than the total number of output observations.'

Contributor Author

Done

file_out = 'obs_seq.svp', ! output obs_seq file
obs_error_sst = 0.20_r8, ! SST error (C)
obs_error_vel = 0.10_r8, ! U/V error (m/s)
avg_obs_per_file = 500000, ! pre-allocation hint
Collaborator

maybe 'limit' rather than 'hint'.

Contributor Author

Done


! Open csv file and get dims
call csv_open(filename, cf, routine)
nobs = cf%nrows
Collaborator

ditto the comment about an accessor function here. i think this is the only one missing.
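i.e., roughly this (accessor name assumed; the refactor commit below says such accessors were added):

```fortran
! hypothetical accessor, so callers don't touch cf%nrows directly
function csv_get_nrows(cf) result(nrows)
type(csv_file_type), intent(in) :: cf
integer :: nrows

nrows = cf%nrows
end function csv_get_nrows
```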

Contributor Author

Done

@mgharamti
Contributor Author

Nancy, thanks for the review. I should be able to address all of the comments. I'll also move the routines to their own module as suggested. The data I used for testing can be found here:

For ARVOR:

  • /glade/work/gharamti/inacawo/data_snippets/arvorc/
  • /glade/work/gharamti/inacawo/data_snippets/arvori/

For SVP:

  • /glade/work/gharamti/inacawo/data_snippets/svp/20251006/

These are the same as those listed in obs_files.txt (in my PR description). Do you want me to add some of those ASCII files to the repo?

@nancycollins
Collaborator

hi moha - thanks. no, i don't think they need to be added to the repo. i just wanted to see some of the input files so maybe i could make a couple of simple test programs that mimic what the read routines are expected to parse.

@nancycollins
Collaborator

i made a small test program and pushed it to my fork of your code here:

https://github.com/nancycollins/moha/tree/insitu_ocean_converters/developer_tests/utilities

it's called csv_read_test.f90 (and a corresponding update to work/quickbuild.sh). i think it should be added to your pull request but i'm rusty with github so i left it there in my repo. it works fine in a couple test cases but the csv field read code doesn't cope correctly with embedded blanks in data fields (test 3 fails).

@nancycollins
Collaborator

i went back to the parse_args_mod and made a new routine get_csv_words_from_string() which must be passed a string and a delimiter character, and it returns a word count and word array. it handles embedded blanks and quoted fields so they can contain the delimiter character inside the field. i pushed this to my fork and also added a parse_csv_test.f90 and parse_csv_test.in test for this.
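a usage sketch of that routine (the argument order is assumed from the description above):

```fortran
! hypothetical usage; argument order assumed from the description
character(len=256) :: line
character(len=64)  :: words(32)
integer            :: nwords

line = '1234,"Boulder, CO",42.0'
call get_csv_words_from_string(line, ',', nwords, words)
! a quoted field may contain the delimiter character:
! expect nwords = 3 and words(2) = 'Boulder, CO'
```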

Stripped all csv routines from the parse_args_mod
and added them into their own csv module.
Improved the opening and closing logic. Now, the
file is opened once and rewound for reading different
variables.
Content of the csv file structure is now private. Added
the necessary accessor functions to retrieve data.
Cleaned up parse_args_mod and slightly modified the
new converter code to use the new read_csv_mod.
Also made small readme changes.
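A compilable skeleton of the shape this refactor describes (the module name is from the commit message; the contents here are assumed):

```fortran
! skeleton matching the refactor described above; module name is from
! the commit message, but the contents are assumed
module read_csv_mod

implicit none
private

public :: csv_file_type, csv_get_nrows

type :: csv_file_type
   private                 ! callers must go through accessors
   integer   :: iunit = -1
   integer   :: nrows = 0
   integer   :: ncols = 0
   character :: delim = ','
end type csv_file_type

contains

function csv_get_nrows(cf) result(nrows)
   type(csv_file_type), intent(in) :: cf
   integer :: nrows
   nrows = cf%nrows
end function csv_get_nrows

end module read_csv_mod
```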
@hkershaw-brown
Member

Nancy's pull request to Moha's pull request is here:
mgharamti#1
