ARVOR Floats and SVP Drifters Ocean Converters #1009
base: main
Conversation
This update introduces a new set of general-purpose CSV utilities to `parse_args_mod` for use across DART observation converters and other modules that ingest ASCII/tabular data. New utilities added:

- `csv_file_type`: cached CSV handle storing filename, nrows, ncols, delimiter, and header fields.
- `csv_open`/`csv_close`: initialize/reset the CSV handle and preload the header and dimensions.
- `csv_get_field_char`, `csv_get_field_int`, `csv_get_field_real`: unified interface through `csv_get_field` for retrieving column strings, integers, or reals.
- Normalization of delimiters (`,` or `;`) with support for empty fields.
- `csv_get_obs_num`: count data rows (excluding the header).
- `csv_find_field`: header lookup.
- Other internal helpers such as `split_fields`, `detect_delim`, and `normalize_delims`.

These routines provide a reusable framework modeled after our existing NetCDF utilities.
A new ocean converter for ARVOR profiling floats. The converter harvests temperature and salinity observations at different depths and times. Depths are converted from pressure in dbar to height in meters (see the sketch below). The converter uses the CSV parsing utilities to read data from the raw input files.
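To illustrate the dbar-to-height step, here is a minimal sketch. Everything in it is an assumption for illustration: the routine name `dbar_to_height` is hypothetical, and the simple 1 dbar ≈ 1 m rule stands in for whatever formula the converter actually uses (a latitude-dependent formula such as Saunders, 1981 is common for profiling floats):

```fortran
! Hypothetical sketch of the dbar -> height conversion; the real
! converter may use a more accurate, latitude-dependent formula.
elemental function dbar_to_height(pressure_dbar) result(height_m)
   use types_mod, only : r8               ! DART real kind

   real(r8), intent(in) :: pressure_dbar  ! pressure (dbar), positive downward
   real(r8)             :: height_m       ! height (m), negative below surface

   ! 1 dbar of seawater corresponds to roughly 1 m of depth
   height_m = -1.0_r8 * pressure_dbar
end function dbar_to_height
```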
A new ocean converter for SVP surface drifters. It collects SST and surface current data, and uses the CSV parsing utilities to read the incoming ASCII files.
Additional helper routines:

- `csv_get_field_index`: get the column index of a field.
- `csv_field_exists`: check whether a field exists in the file.
- `csv_print_header`: print the field names (my favorite).

Additional debugging statements were added to the converters. A short usage sketch of the CSV interface is shown below.
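A minimal usage sketch, assuming `csv_file_type` and the routines above are use-associated from `parse_args_mod`. The `csv_open` argument list matches a call quoted later in this thread; the argument lists of `csv_get_field`, `csv_get_obs_num`, and `csv_field_exists`, and the `'TEMP'` field name, are assumptions for illustration:

```fortran
! Sketch only: argument lists other than csv_open's are assumed.
type(csv_file_type) :: cf
real(r8)            :: temperature
integer             :: iobs, nobs

call csv_open('arvor_profile.csv', cf, 'example_caller')

nobs = csv_get_obs_num(cf)             ! data rows, header row excluded

if (csv_field_exists(cf, 'TEMP')) then
   do iobs = 1, nobs
      ! real-valued overload resolved through the csv_get_field interface
      call csv_get_field(cf, 'TEMP', iobs, temperature)
      write(*, '(A,I0,A,F8.3)') 'row ', iobs, ': TEMP = ', temperature
   end do
end if

call csv_close(cf)
```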
moha - i'm going to file a review on the code in just a bit, but up front i wanted to say that it's great to pull out the CSV parsing into a module so it can be reused, tested and updated independently of the calling code. if you were willing to do a bit more work on this, i think that the CSV routines are self-contained enough to merit their own separate module. they can call code from the parse module, but i think they're different enough to stand alone. let me know what you think about this. i'll put other more specific comments into my review. also - do you have any tests you used on this code that could be added to the repo?
nancycollins left a comment:
the converters themselves are easy to read and understand, which is good. i had a few comments - the biggest one is probably moving the csv routines to their own module.
```fortran
integer            :: ncols = 0
character          :: delim = ','
character(len=512) :: fields(MAX_NUM_FIELDS)
logical            :: is_open = .false.
```
see my comment in csv_open() about leaving the file open until csv_close is called. if you aren't going to change that behavior then this variable needs a better name. but if you open the file and leave it open, this name then is perfect as is.
I've addressed this, see below
```fortran
cf%delim = detect_delim(line)

call split_fields(line, cf%delim, cf%ncols, cf%fields)
call close_file(iunit)
```
i would leave the file open here in csv_open(), leave it open in all subsequent calls, and close it in csv_close(). you can add the iunit to the same structure and reuse it until close is called. you can call "rewind()" if you need to start reading at the beginning of the file in subsequent calls.
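A sketch of the pattern being suggested, assuming a new `cf%iunit` member, a hypothetical `csv_rewind` helper, and DART's `open_file`/`close_file` utilities:

```fortran
! Open once, stash the unit in the handle, rewind between reads,
! and close only in csv_close(). cf%iunit is an assumed new member.
subroutine csv_open(filename, cf, context)
   character(len=*),    intent(in)    :: filename
   type(csv_file_type), intent(inout) :: cf
   character(len=*),    intent(in)    :: context

   cf%iunit   = open_file(filename, action='read')
   cf%is_open = .true.
   ! ... read header line, detect delimiter, count rows ...
end subroutine csv_open

subroutine csv_rewind(cf)
   type(csv_file_type), intent(inout) :: cf
   rewind(cf%iunit)        ! restart reading from the top of the file
end subroutine csv_rewind

subroutine csv_close(cf)
   type(csv_file_type), intent(inout) :: cf
   if (cf%is_open) call close_file(cf%iunit)
   cf%is_open = .false.
end subroutine csv_close
```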
Done
```fortran
if (io /= 0) then
   write(string1, '(A, I0)') 'Got bad read code from input file, io = ', io
   call error_handler(E_ERR, routine, string1, context)
   call close_file(iunit)
```
nitpick - i'd move this close_file() call up 2 lines, before the write statement. in most cases the call to the error_handler() isn't going to return, so the close_file() call won't execute. it doesn't matter if the program is exiting since all open file handles will be closed by the system, but i think it's clearer to have less code after the error_handler() call to not imply that execution continues.
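Applying the suggested reordering to the excerpt above (only the position of `close_file` changes):

```fortran
if (io /= 0) then
   call close_file(iunit)   ! close first; error_handler(E_ERR, ...) won't return
   write(string1, '(A, I0)') 'Got bad read code from input file, io = ', io
   call error_handler(E_ERR, routine, string1, context)
```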
Done
```fortran
cf%ncols   = 0
cf%delim   = ','
cf%fields  = ''
cf%is_open = .false.
```
if you add iunit to the cf structure, close cf%iunit here.
Done
```fortran
! Detect the delimiter of a CSV file. One should expect a ','
! but certain data files can come with a ';' so
! let's take care of both cases
function detect_delim(line) result(delim)
```
you could add an optional argument to this function that sets the delimiter if the calling code knows what it should be. if the argument isn't present, then you can try to detect it like you do in the code already.
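A sketch of the optional-argument variant being suggested; the `known_delim` argument name and the fallback detection heuristic shown here are assumptions:

```fortran
! Let the caller set the delimiter when it is known; otherwise fall
! back to detection. The heuristic below is illustrative only.
function detect_delim(line, known_delim) result(delim)
   character(len=*),    intent(in) :: line
   character, optional, intent(in) :: known_delim
   character                       :: delim

   if (present(known_delim)) then
      delim = known_delim
      return
   endif

   ! prefer ';' only when it appears and ',' does not
   if (index(line, ';') > 0 .and. index(line, ',') == 0) then
      delim = ';'
   else
      delim = ','
   endif
end function detect_delim
```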
Done
```fortran
file_out         = 'obs_seq.arvor',
obs_error_temp   = 0.02,    ! temperature error standard deviation (C)
obs_error_sal    = 0.02,    ! salinity error standard deviation (PSU)
avg_obs_per_file = 500000,  ! pre-allocation hint
```
i'd say this is more than a 'hint' because i don't see anywhere that the converter can recover if there are more obs than were originally allocated for. maybe use 'limit' instead of 'hint'?
Done
```rst
* - ``avg_obs_per_file``
  - integer
  - ``500000``
  - Estimated number of valid observations per input file. Used only for pre-allocation.
```
maybe second sentence should be something like 'Used for pre-allocation. Number of files times this number must be larger than the total number of output observations.'
Done
```rst
* - ``avg_obs_per_file``
  - integer
  - ``500000``
  - Estimate of valid obs per file.
```
Add a second sentence something like 'Used for pre-allocation. Number of files times this number must be larger than the total number of output observations.'
Done
```fortran
file_out         = 'obs_seq.svp',  ! output obs_seq file
obs_error_sst    = 0.20_r8,        ! SST error (C)
obs_error_vel    = 0.10_r8,        ! U/V error (m/s)
avg_obs_per_file = 500000,         ! pre-allocation hint
```
maybe 'limit' rather than 'hint'.
Done
```fortran
! Open csv file and get dims
call csv_open(filename, cf, routine)
nobs = cf%nrows
```
ditto the comment about an accessor function here. i think this is the only one missing.
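A sketch of the missing accessor; the `csv_get_nrows` name is hypothetical:

```fortran
! Accessor so callers don't reach into the handle's internals;
! 'nobs = cf%nrows' above would become 'nobs = csv_get_nrows(cf)'.
function csv_get_nrows(cf) result(nrows)
   type(csv_file_type), intent(in) :: cf
   integer                         :: nrows
   nrows = cf%nrows
end function csv_get_nrows
```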
Done
Nancy, thanks for the review. I should be able to address all of the comments. I'll also move the routines to their own module as suggested. The data I used for testing can be found here:
These are the same as those listed in obs_files.txt (in my PR description). Do you want me to add some of those ASCII files to the repo?
hi moha - thanks. no, i don't think they need to be added to the repo. i just wanted to see some of the input files so maybe i could make a couple of simple test programs that mimic what the read routines are expected to parse.
i made a small test program and pushed it to my fork of your code here: https://github.com/nancycollins/moha/tree/insitu_ocean_converters/developer_tests/utilities it's called csv_read_test.f90 (and a corresponding update to work/quickbuild.sh). i think it should be added to your pull request but i'm rusty with github so i left it there in my repo. it works fine in a couple test cases but the csv field read code doesn't cope correctly with embedded blanks in data fields (test 3 fails).
i went back to the parse_args_mod and made a new routine get_csv_words_from_string() which must be passed a string and a delimiter character, and it returns a word count and word array. it handles embedded blanks and quoted fields so they can contain the delimiter character inside the field. i pushed this to my fork and also added a parse_csv_test.f90 and parse_csv_test.in test for this.
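The routine's full interface isn't quoted in this thread; here is a sketch of the quoted-field splitting it describes, with assumed argument names and declarations (quote characters are left in place in this sketch):

```fortran
! Sketch of delimiter splitting with quoted-field support, as
! get_csv_words_from_string() is described above. Argument names and
! declarations are assumptions; see parse_args_mod for the real interface.
subroutine get_csv_words_from_string(line, delim, nwords, words)
   character(len=*), intent(in)  :: line
   character,        intent(in)  :: delim
   integer,          intent(out) :: nwords
   character(len=*), intent(out) :: words(:)

   integer :: i, start
   logical :: in_quotes

   nwords    = 0
   start     = 1
   in_quotes = .false.

   do i = 1, len_trim(line)
      if (line(i:i) == '"') then
         in_quotes = .not. in_quotes          ! toggle quoted state
      else if (line(i:i) == delim .and. .not. in_quotes) then
         nwords = nwords + 1                  ! delimiter outside quotes ends a field
         words(nwords) = adjustl(line(start:i-1))
         start = i + 1
      endif
   end do

   nwords = nwords + 1                        ! trailing field
   words(nwords) = adjustl(line(start:len_trim(line)))
end subroutine get_csv_words_from_string
```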
Stripped all CSV routines from `parse_args_mod` and added them to their own CSV module. Improved the opening and closing logic: the file is now opened once and rewound for reading different variables. The contents of the CSV file structure are now private, and the necessary accessor functions were added to retrieve data.
Cleaned up `parse_args_mod` and slightly modified the new converter code to use the new `read_csv_mod`. Also made small readme changes.
Nancy's pull request to Moha's pull request is here:
Description:
This PR adds two new in-situ ocean converters (ARVOR profiling floats and SVP surface drifters). It also introduces a reusable CSV parsing utility in `parse_args_mod`. Both converters make use of this CSV interface, which simplifies the code. In addition, documentation has been added for the converters. The CSV parsing utilities build on the existing parsing infrastructure (as a wrapper). The functionality mimics our NetCDF handling in the sense that a file is opened, and data is accessed with a single call before the file is closed. A few helper functions have also been added; these can be used to access the header, to inquire whether a field exists, to find the dimensions, etc.
Types of changes
Documentation changes needed?
Tests
Tested both converters using actual raw ASCII data files.
Checklist for merging
Checklist for release
Testing Datasets
ARVOR:
/glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/ARVOR/work/obs_files.txt

SVP:
/glade/derecho/scratch/gharamti/inacawo/DART/observations/obs_converters/SVP/work/obs_files.txt