Preserve names in grid roundtrip. #66

Draft
wants to merge 6 commits into
base: main


groutr
Contributor

@groutr groutr commented Apr 1, 2021

When a grid is loaded and then saved with gridded, all of the existing names of variables and dimensions get overwritten with generated names.

This PR introduces a strategy to preserve those names by recording them in a dictionary structure. This is still a WIP, but it grew out of the discussion in #65.

Some simplified, but very useful, ways to use this mapping:

# The mesh_1d object was generated from the 1D network topology example:
# http://ugrid-conventions.github.io/ugrid-conventions/#1d-network-topology
# If we want to know what an input name maps to in terms of the spec:
mesh_1d.get_attribute("Mesh1_edge_y")   # Returns 'edge_coordinates'
# If we want to look up what an element of the spec maps to in the input netCDF file:
mesh_1d['edge_coordinates']  # Returns ['Mesh1_edge_x', 'Mesh1_edge_y']

# If we want to get the values of the edge coordinates of the mesh
mesh_1d.get_values("edge_coordinates", grid_1d)  # Returns grid_1d.edge_coordinates
mesh_1d.get_values("Mesh1_edge_y", grid_1d)  # Returns grid_1d.edge_coordinates[:,1]
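
To make the two lookup directions above concrete, here is a minimal sketch of a two-way name map. It is not the code in this PR: the class name `NameMap` and its `add` method are hypothetical, and only `get_attribute` and the `[]` lookup mirror the calls shown above.

```python
class NameMap:
    """Hypothetical two-way map between spec roles and file variable names."""

    def __init__(self):
        self._role_to_names = {}  # spec role -> list of variable names
        self._name_to_role = {}   # variable name -> spec role

    def add(self, role, name):
        """Record that the file variable `name` fills the spec role `role`."""
        self._role_to_names.setdefault(role, []).append(name)
        self._name_to_role[name] = role

    def get_attribute(self, name):
        """Return the spec role that a file variable name belongs to."""
        return self._name_to_role[name]

    def __getitem__(self, role):
        """Return a copy of the variable names recorded for a spec role."""
        return list(self._role_to_names[role])


mesh_1d = NameMap()
mesh_1d.add("edge_coordinates", "Mesh1_edge_x")
mesh_1d.add("edge_coordinates", "Mesh1_edge_y")

print(mesh_1d.get_attribute("Mesh1_edge_y"))
print(mesh_1d["edge_coordinates"])
```

Keeping both dictionaries in sync inside `add` means either lookup is a single dictionary access, at the cost of storing each name twice.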

@@ -237,6 +237,8 @@ def num_vertices(self):
    def nodes(self):
        return self._nodes

    node_coordinates = nodes
Contributor


I'll look at the rest of the code, but I'd rather not add aliases for existing names -- and we really don't want to change the names.

Contributor Author


@ChrisBarker-NOAA see my explanation here: #65 (comment)

@ChrisBarker-NOAA
Contributor

As this is a WIP, I may be missing something, but here are my thoughts on where you are going:

The UGrid object is all about capturing the data model inherent in the UGRID spec. However, it is intended to be independent of netCDF itself -- able to be created, used, and saved with no files at all, or with other file formats, and so on.
But the code as it stands is a bit entangled with netCDF, and I've really been meaning to refactor the netCDF IO code.

I was thinking that this approach was overdoing it a bit -- re-implementing what is in the UGrid object already (or including stuff that is inherent in the data model). But if we think of it as taking everything that is specific to netCDF (dimensions, for instance) and putting that in a separate class (or set of classes) then this does start to make more sense.

So I want to see where this is going -- how do you use this to load or save a UGrid object?

A few goals to keep in mind:

  1. As the PR name says, we want a netCDF file to round-trip through UGrid with little (or no) change, i.e. preserving the variable names. So that's one goal.

  2. You should be able to create a UGrid object from "scratch", and then save it out to netCDF, without having to specify anything extra (i.e. all variable and dimension names should be optionally auto-generated).

  3. Remember that there could be more than one mesh in a single netCDF file -- at least in theory. This is not the least bit well tested, but good to keep in mind.

  4. My idea for refactoring the netCDF loading code was to make it a two-step process:
    a) examine the netCDF file, and figure out what all the variables mean
    b) actually load the UGrid from the file
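
Goals 1 to 3 above could be met by one small name-resolution step at save time. The following is only a sketch under assumptions: the function `resolve_names`, the role names, and the `"<mesh_name>_<role>"` default pattern are all made up for illustration and are not part of gridded.

```python
def resolve_names(roles, recorded=None, mesh_name="mesh"):
    """Pick an output variable name for each role.

    Names recorded at load time win (goal 1). Any role without a recorded
    name gets a generated default of the form "<mesh_name>_<role>" (goal 2).
    Prefixing with the mesh name keeps the generated names of several
    meshes in one file distinct (goal 3). The pattern is an arbitrary
    choice for this sketch.
    """
    recorded = recorded or {}
    return {role: recorded.get(role, f"{mesh_name}_{role}") for role in roles}


# Illustrative role names only.
roles = ["node_x", "node_y", "face_node_connectivity"]

# Grid built from scratch: every name is generated.
from_scratch = resolve_names(roles)

# Grid loaded from a file: recorded names survive the round trip,
# and only the missing one is generated.
loaded_names = {"node_x": "Mesh2_node_x", "node_y": "Mesh2_node_y"}
round_trip = resolve_names(roles, recorded=loaded_names, mesh_name="Mesh2")

print(from_scratch)
print(round_trip)
```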

The idea here is that if you have a non-compliant file, you can do step (a) by hand (or some other way). This would require an intermediate representation of the mapping between variable names and UGrid "parts" -- so your approach here might work really well. The trick, however, is that there might need to be some processing in there somehow (if a part of the grid is represented in another way, i.e. more needs to be done than to specify the variable names).
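
The two-step idea can be sketched as follows, with plain dictionaries standing in for a netCDF file so the example runs without any IO library. The function names `inspect_variables` and `load_grid`, the role names, and the use of `standard_name` as the only detection rule are assumptions made for illustration.

```python
def inspect_variables(variables):
    """Step (a): guess which variable fills which role.

    `variables` maps a variable name to {"attrs": {...}, "values": [...]}.
    Only the `standard_name` attribute is examined in this sketch.
    """
    mapping = {}
    for name, var in variables.items():
        standard_name = var["attrs"].get("standard_name")
        if standard_name == "longitude":
            mapping["node_x"] = name
        elif standard_name == "latitude":
            mapping["node_y"] = name
    return mapping


def load_grid(variables, mapping):
    """Step (b): pull the values for each role, using the given mapping."""
    return {role: variables[name]["values"] for role, name in mapping.items()}


# A file with usable metadata: step (a) can run automatically.
compliant = {
    "Mesh2_node_x": {"attrs": {"standard_name": "longitude"}, "values": [0.0, 1.0]},
    "Mesh2_node_y": {"attrs": {"standard_name": "latitude"}, "values": [5.0, 6.0]},
}
auto_mapping = inspect_variables(compliant)
grid = load_grid(compliant, auto_mapping)

# A file without metadata: step (a) finds nothing, so the mapping is
# written by hand and step (b) is reused unchanged.
non_compliant = {
    "lon": {"attrs": {}, "values": [0.0, 1.0]},
    "lat": {"attrs": {}, "values": [5.0, 6.0]},
}
hand_mapping = {"node_x": "lon", "node_y": "lat"}
grid_by_hand = load_grid(non_compliant, hand_mapping)
```

The mapping returned by step (a) is also exactly what would need to be kept around to preserve names on save. For the cases that need processing rather than a plain rename, one option might be to let a mapping value be a callable instead of a name, but that is left out here.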

Overall design philosophy: I agree with the Zen of Python axiom "flat is better than nested", so keep that in mind. For instance, there may not be a need for a Dimension class -- it really doesn't hold much -- just a thought.

Side note: We may want to, sooner rather than later, use xarray as the interface to netCDF and other file formats. xarray matches the netCDF data model, but there may be some differences to keep in mind. If you want, you could go to xarray first. (That might actually make it less disruptive -- it would all be in the "xarray" loader/saver, leaving netCDF untouched :-)

Final point -- I think we can go all Python 3 at this point; >= 3.8 seems reasonable.
