Preserve names in grid roundtrip. #66

groutr · 2021-04-01T01:26:26Z

When a grid is loaded and then saved with gridded, all of the existing names of variables and dimensions get overwritten with generated names.

This PR introduces a strategy to preserve those names by recording them in a dictionary structure. This is still a WIP and grew out of the discussion in #65.

Some simplified but very useful ways to use this mapping:

# The mesh_1d object was generated from a 1d network topology: http://ugrid-conventions.github.io/ugrid-conventions/#1d-network-topology
# If we want to know what an input name maps to in terms of the spec:
mesh_1d.get_attribute("Mesh1_edge_y")   # Returns 'edge_coordinates'
# If we want to look up what an element of the spec maps to in the input netcdf
mesh_1d['edge_coordinates']  # Returns ['Mesh1_edge_x', 'Mesh1_edge_y']

# If we want to get the values of the edge coordinates of the mesh
mesh_1d.get_values("edge_coordinates", grid_1d)  # Returns grid_1d.edge_coordinates
mesh_1d.get_values("Mesh1_edge_y", grid_1d)  # Returns grid_1d.edge_coordinates[:,1]
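To make the lookups above concrete, here is a minimal sketch of a two-way name mapping. It is not the PR's implementation: the `NameMap` class is hypothetical, and plain nested lists stand in for the grid's arrays so the example has no dependencies. Only the method names `get_attribute` and `get_values` and the variable names are taken from the snippet above.

```python
from types import SimpleNamespace


class NameMap:
    """Hypothetical two-way map between UGRID roles and file variable names."""

    def __init__(self, roles):
        # roles: {"edge_coordinates": ["Mesh1_edge_x", "Mesh1_edge_y"], ...}
        self._roles = {role: list(names) for role, names in roles.items()}
        # Reverse index: variable name -> (role, column within that role).
        self._reverse = {
            name: (role, i)
            for role, names in self._roles.items()
            for i, name in enumerate(names)
        }

    def __getitem__(self, role):
        # Spec role -> names used in the input file.
        return list(self._roles[role])

    def get_attribute(self, name):
        # Input file name -> spec role.
        return self._reverse[name][0]

    def get_values(self, key, grid):
        # A role returns the whole attribute; a file name returns one column.
        if key in self._roles:
            return getattr(grid, key)
        role, i = self._reverse[key]
        return [row[i] for row in getattr(grid, role)]


# Stand-in for a grid object (assumption: one row per edge, columns x and y).
grid_1d = SimpleNamespace(edge_coordinates=[[0.0, 1.0], [2.0, 3.0]])
mesh_1d = NameMap({"edge_coordinates": ["Mesh1_edge_x", "Mesh1_edge_y"]})

role = mesh_1d.get_attribute("Mesh1_edge_y")
names = mesh_1d["edge_coordinates"]
y_values = mesh_1d.get_values("Mesh1_edge_y", grid_1d)
```

Keeping the reverse index next to the forward mapping means both lookups are a single dictionary access, and the original file names are still available when the grid is saved.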

ChrisBarker-NOAA · 2021-04-01T17:24:07Z

gridded/pyugrid/ugrid.py

@@ -237,6 +237,8 @@ def num_vertices(self):
    def nodes(self):
        return self._nodes

+    node_coordinates = nodes


I'll look at the rest of the code, but I'd rather not add aliases for existing names -- and we really don't want to change the names.

@ChrisBarker-NOAA see my explanation here: #65 (comment)

ChrisBarker-NOAA · 2021-04-01T20:36:33Z

As this is a WIP, I may be missing something, but my thoughts as to where you are going:

The UGrid object is all about capturing the data model inherent in the UGRID spec. However, it is intended to be independent of netCDF itself -- able to be created, used, and saved with no files, or with other file formats, or ....
But the code as it stands is a bit entangled with netCDF, and I've really been meaning to refactor the netCDF IO code.

I was thinking that this approach was overdoing it a bit -- re-implementing what is in the UGrid object already (or including stuff that is inherent in the data model). But if we think of it as taking everything that is specific to netCDF (dimensions, for instance) and putting that in a separate class (or set of classes) then this does start to make more sense.

So I want to see where this is going -- how do you use this to load or save a UGrid object?

A few goals to keep in mind:

As the PR name says, we want a netCDF file to round-trip through UGrid with few (or no) changes -- i.e. preserving the variable names. So that's one goal.
You should be able to create a UGrid object from "scratch", and then save it out to netCDF, without having to specify anything extra (i.e. all variable and dimension names should be optionally auto-generated).
Remember that there could be more than one mesh in a single netCDF file -- at least in theory. This is not the least bit well tested, but good to keep in mind.
My idea for refactoring of the loading from netcdf code was to make it a two-step process:
a) examine the netCDF file, and figure out what all the variables mean
b) actually load the UGrid from the file

The idea here is that if you have a non-compliant file, you can do step (a) by hand (or some other way). This would require an intermediate representation of the mapping between variable names and UGrid "parts" -- so your approach here might work really well. The trick, however, is that there might need to be some processing in there somehow (if a part of the grid is represented in another way) -- i.e. more needs to be done than to specify the variable names.
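The two-step load described above could look roughly like the sketch below. It is only an illustration under stated assumptions: a plain dict of `{"attrs": ..., "data": ...}` entries stands in for an open netCDF dataset, and `find_meshes` and `load_mesh` are hypothetical names, not gridded functions. The attribute names (`cf_role`, `node_coordinates`, `edge_node_connectivity`, `face_node_connectivity`) come from the UGRID conventions.

```python
# Attributes on a UGRID mesh topology variable that name other variables.
ROLE_ATTRS = (
    "node_coordinates",
    "edge_node_connectivity",
    "face_node_connectivity",
)


def find_meshes(variables):
    """Step (a): look at attributes only; return {mesh_name: {role: [names]}}."""
    meshes = {}
    for name, var in variables.items():
        attrs = var["attrs"]
        if attrs.get("cf_role") != "mesh_topology":
            continue
        # Each role attribute is a space-separated list of variable names.
        meshes[name] = {
            role: attrs[role].split() for role in ROLE_ATTRS if role in attrs
        }
    return meshes


def load_mesh(variables, mapping):
    """Step (b): read the data for each role, given a mapping from step (a)."""
    return {
        role: [variables[var_name]["data"] for var_name in names]
        for role, names in mapping.items()
    }


# Stand-in for a file's contents (assumption: not a real netCDF dataset).
dataset = {
    "Mesh2": {
        "attrs": {
            "cf_role": "mesh_topology",
            "topology_dimension": 2,
            "node_coordinates": "Mesh2_node_x Mesh2_node_y",
            "face_node_connectivity": "Mesh2_face_nodes",
        },
        "data": None,
    },
    "Mesh2_node_x": {"attrs": {}, "data": [0.0, 1.0, 0.0]},
    "Mesh2_node_y": {"attrs": {}, "data": [0.0, 0.0, 1.0]},
    "Mesh2_face_nodes": {"attrs": {}, "data": [[0, 1, 2]]},
}

meshes = find_meshes(dataset)                    # step (a), automatic
parts = load_mesh(dataset, meshes["Mesh2"])      # step (b)

# For a non-compliant file, step (a) can be replaced by a hand-written mapping.
manual_mapping = {"node_coordinates": ["Mesh2_node_x", "Mesh2_node_y"]}
manual_parts = load_mesh(dataset, manual_mapping)
```

Because `find_meshes` returns one entry per mesh topology variable, the same mapping structure covers files that hold more than one mesh. It does not handle the case where a grid part needs processing beyond a name lookup.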

Overall design philosophy: I agree with the Zen of Python's axiom "flat is better than nested" -- so keep that in mind. For instance, there may not be a need for a Dimension class -- it really doesn't hold much -- just a thought to keep in mind.

Side note: We may want to, sooner rather than later, use xarray as the interface to netCDF and other file formats. xarray matches the netCDF data model, but there may be some differences to keep in mind. If you want, you could go to xarray first. (That might actually make it less disruptive -- it would be all in the "xarray" loader/saver, leaving netCDF untouched :-)

Final point -- I think we can go all Python 3 at this point; >= 3.8 seems reasonable.

groutr added 5 commits March 30, 2021 08:43

Add mesh schema objects.

52c1809

Reuse dimension and variable mappings.

8070935

Init dictionary with values.

35f6990

Updates to Mesh classes.

e1e277d

Add node_coordinates alias

da8111e

ChrisBarker-NOAA reviewed Apr 1, 2021

factor out key filter

a6a5e02
