Xarray dataset for particledata #2079
Conversation
Using temporary TestParticle class for now
This is a pragmatic/temporary fix of #2078 so that Advection kernels can be tested, while we consider the API for `particle.dt`.
and fixing setting `lonlatdepth_dtype`
and adding unit test
And moving from temporary TestParticle to original Particle class
Since we don't know on particleset initialisation whether the execute will be forward or backward in time, we can't yet decide whether the time should be time_interval.left or time_interval.right. Hence, setting to "NaT" until we know the sign of dt
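The logic described above can be sketched with plain numpy; the names `initial_time`, `interval_left`, and `interval_right` are hypothetical stand-ins for `time_interval.left`/`time_interval.right`, not the PR's actual code:

```python
import numpy as np

def initial_time(interval_left, interval_right, dt):
    """Pick the particle start time once the sign of dt is known.

    interval_left/interval_right are numpy.datetime64 bounds of the
    fieldset time interval; dt is a numpy.timedelta64, or None if the
    execution direction is not yet known at particleset initialisation.
    """
    if dt is None:
        # sign of dt unknown at init, so the time cannot be decided yet
        return np.datetime64("NaT", "ns")
    if dt > np.timedelta64(0, "ns"):
        return interval_left   # forward in time: start at the left bound
    return interval_right      # backward in time: start at the right bound

t = initial_time(np.datetime64("2000-01-01"), np.datetime64("2000-01-10"), None)
# t is NaT until execute() reveals the sign of dt
```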
Only ptype is needed for Kernel
For indexing (to be implemented)
I've now finished the draft for using xarray as the particleset data structure under the hood. Note that I've also implemented @VeckoTheGecko, could you help with adding type annotations/checking where relevant on the new functions/methods?
A few things that we need to work out at a conceptual level:

- `Particle` is a bit overloaded in responsibility at the moment. It acts both as a thin wrapper (hooking into the underlying `data` via the index in order to make modifications), as well as handling* the adding of variables to create custom particles. This overloading of responsibility is a problem because the former is concerned with the particle itself in a loop, while the latter is about types of particles (which are declared before the particleset execution loop, and are then used to actually generate the underlying particle data class).
- I think these should be split apart - not sure on naming though. Perhaps `ParticleAccessor` and `DefaultParticle`.
- Not sure if the usage of `._index` in the `Particle` at the moment is robust to the deletion of particles.

\*This functionality looks to be untested at the moment, and the implementations of `add_variable` aren't yet updated.
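One way the proposed split could look, purely as an illustrative sketch (the class name `ParticleAccessor` is the tentative one floated above, and this is not the PR's implementation):

```python
import numpy as np
import xarray as xr

class ParticleAccessor:
    """Thin per-particle view into an underlying xarray.Dataset.

    Only concerned with reading/writing one particle's values in the
    execution loop; defining *types* of particles (custom variables)
    would live in a separate class.
    """

    def __init__(self, data: xr.Dataset, index: int):
        # bypass our own __setattr__ for the two bookkeeping attributes
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        # read this particle's value straight from the dataset
        return self._data[name].values[self._index]

    def __setattr__(self, name, value):
        # write through to the underlying dataset in place
        self._data[name].values[self._index] = value

ds = xr.Dataset(
    {"lon": (["trajectory"], np.array([0.0, 1.0]))},
    coords={"trajectory": [0, 1]},
)
p = ParticleAccessor(ds, 1)
p.lon = 5.0  # mutates ds["lon"] at position 1
```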
```python
        return p_string + f"time={time_string})"

    def __init__(self, data, index=None):
        self._data = data
        self._index = index
```
What is `._index`? A particle ID, or only an index in the dataset? (If the former, are we dealing with deleted particles in the ParticleData class? Do we know how that would work?)
The `._index` is needed for the `getattr` and `setattr` below. If you can think of a better way to do this, I'd be keen to hear!
I've now expanded the unit test to check how robust indexing is to particle deletion, and it appears to work well. See f06e024
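A sketch of what such a deletion-robustness check can look like in isolation (hypothetical test, using only xarray; see the linked commit for the actual unit test):

```python
import numpy as np
import xarray as xr

def test_indexing_survives_deletion():
    ds = xr.Dataset(
        {"lon": (["trajectory"], np.array([0.0, 1.0, 2.0, 3.0]))},
        coords={"trajectory": [0, 1, 2, 3]},
    )
    # delete the particle with trajectory ID 1
    ds = ds.drop_sel(trajectory=1)
    # trajectory IDs are preserved; only the positional index shifts
    assert list(ds["trajectory"].values) == [0, 2, 3]
    # label-based lookup still finds the right particle after deletion
    assert ds["lon"].sel(trajectory=2).item() == 2.0

test_indexing_survives_deletion()
```

The key property is that the `trajectory` coordinate keeps its labels through a `drop_sel`, so ID-based lookups remain valid even though positional indices change.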
```python
self._data = xr.Dataset(
    {
        "lon": (["trajectory"], lon.astype(lonlatdepth_dtype)),
        "lat": (["trajectory"], lat.astype(lonlatdepth_dtype)),
        "depth": (["trajectory"], depth.astype(lonlatdepth_dtype)),
        "time": (["trajectory"], time),
        "dt": (["trajectory"], np.timedelta64(1, "ns") * np.ones(len(trajectory_ids))),
        "ei": (["trajectory", "ngrid"], np.zeros((len(trajectory_ids), len(fieldset.gridset)), dtype=np.int32)),
        "state": (["trajectory"], np.zeros((len(trajectory_ids)), dtype=np.int32)),
        "lon_nextloop": (["trajectory"], lon.astype(lonlatdepth_dtype)),
        "lat_nextloop": (["trajectory"], lat.astype(lonlatdepth_dtype)),
        "depth_nextloop": (["trajectory"], depth.astype(lonlatdepth_dtype)),
        "time_nextloop": (["trajectory"], time),
    },
    coords={
        "trajectory": ("trajectory", trajectory_ids),
    },
    attrs={
        "ngrid": len(fieldset.gridset),
        "ptype": pclass.getPType(),
    },
)
# add extra fields from the custom Particle class
for v in pclass.__dict__.values():
    if isinstance(v, Variable):
        if isinstance(v.initial, attrgetter):
            initial = v.initial(self).values
        else:
            initial = v.initial * np.ones(len(trajectory_ids), dtype=v.dtype)
        self._data[v.name] = (["trajectory"], initial)
```
I think that this dataset generation should be wrapped up into a method on the Particle class (so that this part of the code does not have to worry about custom particle classes).
I propose we (you?) do that in a new PR, once this one is merged?
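For reference, such a refactor could take roughly this shape (a minimal sketch; the method name `create_dataset` and the `variables` mapping are hypothetical, and the real follow-up PR may look quite different):

```python
import numpy as np
import xarray as xr

class Particle:
    # hypothetical registry of custom variables: name -> (dtype, initial value)
    variables = {"sample_var": (np.float32, 0.0)}

    @classmethod
    def create_dataset(cls, lon, lat, trajectory_ids):
        """Build the underlying dataset, including any custom variables.

        Callers (e.g. ParticleSet.__init__) then need not know anything
        about custom particle classes.
        """
        ds = xr.Dataset(
            {
                "lon": (["trajectory"], np.asarray(lon, dtype=np.float32)),
                "lat": (["trajectory"], np.asarray(lat, dtype=np.float32)),
            },
            coords={"trajectory": ("trajectory", trajectory_ids)},
        )
        # custom variables are added here, inside the class that declared them
        for name, (dtype, initial) in cls.variables.items():
            ds[name] = (["trajectory"], np.full(len(trajectory_ids), initial, dtype=dtype))
        return ds

ds = Particle.create_dataset([0.0, 1.0], [2.0, 3.0], [0, 1])
```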
Following reviewer comment
@VeckoTheGecko, you were right that this if-statement wasn't needed
This PR swaps the custom ParticleData class to an `xarray.Dataset`. Note that it also changed the time-handling in `pset.execute()` to fully use `numpy.datetime64`/`numpy.timedelta64`. Also note that this PR will not change the `ParticleFile` class, which will be a next PR.

Other aspects that still need to be done:

- `particle.delete`
- `Particle` class `Variables`
- `particle.ei` into the DataSet
- `dt` for backward-in-time simulations
- `main` for v3 changes, `v4-dev` for v4 changes
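The `numpy.datetime64`/`numpy.timedelta64` time-handling mentioned above works along these lines (a standalone sketch, not the PR's code):

```python
import numpy as np

# particle times as datetime64[ns], matching the dataset's "time" variable
time = np.array(["2000-01-02T00:00"], dtype="datetime64[ns]")

# a negative timedelta64 steps backward in time
dt = np.timedelta64(-3600, "s")

# datetime64 + timedelta64 stays in datetime64, with units promoted to ns
time = time + dt
```

Keeping everything in numpy's datetime types means unit conversions and NaT propagation are handled by numpy itself rather than by ad-hoc float-seconds arithmetic.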