Conversation

hkershaw-brown
Member

@hkershaw-brown hkershaw-brown commented Sep 24, 2025

Description:

This is an observation converter for CrocoLake, part of the CROCODILE project, contributed by Enrico Milanese, Woods Hole Oceanographic Institution (WHOI).

This converter has been on the CROCODILE-CESM/DART fork. It takes the Parquet-format data of CrocoLake and converts it to observation sequence format.

Note that the commits are flattened into one commit here, because we have a number of MOM6 assimilate.sh scripts for CROCODILE that are not ready for DART. For more detail, see the original closed pull request #603, or #920 for the cesm-hybrid issue. This pull request is only concerned with the CrocoLake converter.

Fixes issue

Fixes #884

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.

Tests

Please describe any tests you ran to verify your changes.
Ran the converter on Derecho and ingested the output with obs_sequence_tool.

Checklist for merging

  • Updated changelog entry
  • Documentation updated
  • Update conf.py

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

Co-authored-by: Marlena Smith <[email protected]>
@mjs2369
Contributor

mjs2369 commented Sep 24, 2025

Need to add Crocolake to this doc page - https://github.com/NCAR/DART/tree/crocolake/observations/obs_converters

Side note, we have two separate places in the docs where we list the available converters

@hkershaw-brown
Member Author

Need to add Crocolake to this doc page - https://github.com/NCAR/DART/tree/crocolake/observations/obs_converters

Side note, we have two separate places in the docs where we list the available converters

Nice catch Marlee, side note: we write this same side note a lot.

@mjs2369
Contributor

mjs2369 commented Sep 24, 2025

I think we should add some information about the arguments that can be passed into ObsSequence either to the examples or the CrocoLake doc page, so our users don't have to go into the source code to get that information

This is the info in the source code:

Arguments:
crocolake_path (str): path to desired crocolake database
selected_vars (list): list of variables to be extracted from the database
db_filters (list): list of db_filters to be applied to the database
fill_na_qc (int): replace value for NA in QC flags
fill_na_error (float): replace value for NA in error variables
obs_seq_out (str): obs_seq file name
loose (bool): if True, store observation values also when
their QC and error are not present (default: False)
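
To illustrate how these arguments fit together, here is a minimal sketch. `ConverterArgs` is a hypothetical helper and not part of the converter. The tuple form of `db_filters`, the example path, and the fill values are assumptions for illustration only. The only default taken from the source docstring is `loose=False`.

```python
from dataclasses import dataclass, asdict


@dataclass
class ConverterArgs:
    """Hypothetical container mirroring the documented ObsSequence arguments."""
    crocolake_path: str    # path to desired crocolake database
    selected_vars: list    # variables to be extracted from the database
    db_filters: list       # filters to be applied to the database
    fill_na_qc: int        # replace value for NA in QC flags
    fill_na_error: float   # replace value for NA in error variables
    obs_seq_out: str       # obs_seq file name
    loose: bool = False    # documented default: False


args = ConverterArgs(
    crocolake_path="/path/to/CrocoLake",       # placeholder path
    selected_vars=["TEMP", "PSAL"],
    db_filters=[("LATITUDE", ">", 5),          # assumed (column, op, value) form
                ("LATITUDE", "<", 30)],
    fill_na_qc=-888888,                        # illustrative value only
    fill_na_error=-999.0,                      # illustrative value only
    obs_seq_out="obs_seq_crocolake.out",
)

# Keyword arguments that could then be forwarded to the converter class.
kwargs = asdict(args)
```

Collecting the arguments in one place like this makes it easy to see which ones are required and which have a default, without reading the source.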

@mjs2369
Contributor

mjs2369 commented Sep 24, 2025

obs_type_vars = {}
obs_type_vars["TEMP"] = "TEMPERATURE"
obs_type_vars["PSAL"] = "SALINITY"
obs_type_vars["DOXY"] = "OXYGEN"
obs_type_vars["TOT_ALKALINITY"] = "ALKALINITY"
obs_type_vars["TCO2"] = "INORGANIC_CARBON"
obs_type_vars["NITRATE"] = "NITRATE"
obs_type_vars["SILICATE"] = "SILICATE"
obs_type_vars["PHOSPHATE"] = "PHOSPHATE"
return obs_type_vars

There are several more variables in the CrocoLake docs:
https://crocolakedocs.readthedocs.io/en/latest/crocolake.html#variables

I'm guessing these are the only obs types that DART can work with?

@enrico-mi
Collaborator

obs_type_vars = {}
obs_type_vars["TEMP"] = "TEMPERATURE"
obs_type_vars["PSAL"] = "SALINITY"
obs_type_vars["DOXY"] = "OXYGEN"
obs_type_vars["TOT_ALKALINITY"] = "ALKALINITY"
obs_type_vars["TCO2"] = "INORGANIC_CARBON"
obs_type_vars["NITRATE"] = "NITRATE"
obs_type_vars["SILICATE"] = "SILICATE"
obs_type_vars["PHOSPHATE"] = "PHOSPHATE"
return obs_type_vars

There are several more variables in the CrocoLake docs: https://crocolakedocs.readthedocs.io/en/latest/crocolake.html#variables

I'm guessing these are the only obs types that DART can work with?

Hi @mjs2369 -- correct, that's the overlap that I found between DART's obs types and CrocoLake's variables.
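
For readers who want to check a variable list against that overlap before running the converter, a small sketch follows. The mapping is copied from the snippet quoted above. `split_supported` and the name `SOME_OTHER_VAR` are hypothetical and used only for illustration.

```python
def crocolake_to_dart_types():
    """CrocoLake variable name -> DART observation type, as quoted above."""
    return {
        "TEMP": "TEMPERATURE",
        "PSAL": "SALINITY",
        "DOXY": "OXYGEN",
        "TOT_ALKALINITY": "ALKALINITY",
        "TCO2": "INORGANIC_CARBON",
        "NITRATE": "NITRATE",
        "SILICATE": "SILICATE",
        "PHOSPHATE": "PHOSPHATE",
    }


def split_supported(selected_vars):
    """Split requested variables into those with a DART type and those without."""
    mapping = crocolake_to_dart_types()
    supported = {v: mapping[v] for v in selected_vars if v in mapping}
    unsupported = [v for v in selected_vars if v not in mapping]
    return supported, unsupported


# SOME_OTHER_VAR stands in for any CrocoLake variable outside the overlap.
supported, unsupported = split_supported(["TEMP", "PSAL", "SOME_OTHER_VAR"])
```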

@hkershaw-brown
Member Author

Need to add Crocolake to this doc page - https://github.com/NCAR/DART/tree/crocolake/observations/obs_converters

Side note, we have two separate places in the docs where we list the available converters

fixed in ff0758e

@hkershaw-brown
Member Author

I think we should add some information about the arguments that can be passed into ObsSequence either to the examples or the CrocoLake doc page, so our users don't have to go into the source code to get that information

This is the info in the source code:

Arguments:
crocolake_path (str): path to desired crocolake database
selected_vars (list): list of variables to be extracted from the database
db_filters (list): list of db_filters to be applied to the database
fill_na_qc (int): replace value for NA in QC flags
fill_na_error (float): replace value for NA in error variables
obs_seq_out (str): obs_seq file name
loose (bool): if True, store observation values also when
their QC and error are not present (default: False)

added in d667a00

Contributor

@mjs2369 mjs2369 left a comment

# Convert lat and lon to radians
ddf = ddf.rename(columns={"LONGITUDE": "LONGITUDE_DEG", "LATITUDE": "LATITUDE_DEG"})

Move this comment down to where the lon and lat are actually converted to radians (lines 167-8)

ddf['LONGITUDE'] = np.deg2rad(ddf['LONGITUDE_DEG'])
ddf['LATITUDE'] = np.deg2rad(ddf['LATITUDE_DEG'])

Also, doesn't line 156 rename this column from LONGITUDE to LONGITUDE_DEG (and the same for lat), so that LONGITUDE and LATITUDE no longer exist as columns in the dataframe? Or does it create duplicate columns in the dataframe with the new names?
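
The rename question can be checked with a small stand-alone sketch. It uses a plain pandas DataFrame with made-up values. The converter's `ddf` is presumably a Dask dataframe, so this assumes Dask follows the same rename semantics as pandas.

```python
import numpy as np
import pandas as pd

# Toy data in degrees; values are made up for illustration.
df = pd.DataFrame({"LONGITUDE": [0.0, 180.0], "LATITUDE": [-90.0, 45.0]})

# rename() relabels the existing columns; it does not duplicate them.
df = df.rename(columns={"LONGITUDE": "LONGITUDE_DEG", "LATITUDE": "LATITUDE_DEG"})
cols_after_rename = list(df.columns)

# Assigning to a name that no longer exists creates a new column.
# Convert lat and lon to radians
df["LONGITUDE"] = np.deg2rad(df["LONGITUDE_DEG"])
df["LATITUDE"] = np.deg2rad(df["LATITUDE_DEG"])
cols_after_assign = list(df.columns)
```

In this sketch the rename leaves only the `_DEG` columns, and the two assignments then add `LONGITUDE` and `LATITUDE` back as new columns holding radians, so the dataframe ends up with both sets.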

Contributor

@mjs2369 mjs2369 left a comment

# Convert pressure from dbar to Pascals
#ddf['PRES'] = ddf['PRES']*1e4

remove if not actually needed?

removed commented out conversion of pressure
@hkershaw-brown
Member Author

# Convert pressure from dbar to Pascals
#ddf['PRES'] = ddf['PRES']*1e4

remove if not actually needed?

removed in 2b7e6dd

@hkershaw-brown
Member Author

# Convert lat and lon to radians
ddf = ddf.rename(columns={"LONGITUDE": "LONGITUDE_DEG", "LATITUDE": "LATITUDE_DEG"})

Move this comment down to where the lon and lat are actually converted to radians (lines 167-8)

ddf['LONGITUDE'] = np.deg2rad(ddf['LONGITUDE_DEG'])
ddf['LATITUDE'] = np.deg2rad(ddf['LATITUDE_DEG'])

Also, doesn't line 156 rename this column from LONGITUDE to LONGITUDE_DEG (and the same for lat), so that LONGITUDE and LATITUDE no longer exist as columns in the dataframe? Or does it create duplicate columns in the dataframe with the new names?

Moved comment in
#974 (review)

Contributor

@mjs2369 mjs2369 left a comment

Ready to merge

@hkershaw-brown hkershaw-brown added the release! bundle with next release label Oct 2, 2025
@hkershaw-brown hkershaw-brown merged commit 1ddc21f into main Oct 2, 2025
4 checks passed
@hkershaw-brown hkershaw-brown deleted the crocolake branch October 2, 2025 19:02
Labels

release! bundle with next release

Development

Successfully merging this pull request may close these issues.

Crocolake observation converter

3 participants