
Xarray dataset for particledata #2079


Merged
erikvansebille merged 48 commits into v4-dev from xarray_dataset_for_particle_data on Jul 16, 2025

Conversation

@erikvansebille (Member) commented Jul 10, 2025

This PR replaces the custom ParticleData class with an xarray.Dataset.

Note that it also changes the time handling in pset.execute() to fully use numpy.datetime64/numpy.timedelta64. Also note that this PR does not change the ParticleFile class; that will follow in a later PR.
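
For readers less familiar with numpy.datetime64/numpy.timedelta64 arithmetic, here is a minimal sketch of the kind of operations such time handling relies on. The start time, step and runtime are made-up values for illustration; this is not code from the PR.

```python
import numpy as np

# Hypothetical values, chosen only for illustration.
start = np.datetime64("2025-01-01T00:00:00", "ns")
dt = np.timedelta64(30, "m")        # a 30-minute time step
runtime = np.timedelta64(2, "h")    # total integration time

# datetime64 + timedelta64 gives a datetime64.
end = start + runtime

# timedelta64 / timedelta64 gives a plain float, even with mixed units.
n_steps = int(runtime / dt)

# The same arithmetic broadcasts over an array of per-particle times.
times = np.full(3, start)
times = times + dt
```

Mixed units (hours, minutes, nanoseconds) are converted automatically, so a step and a runtime can be given in whatever unit is convenient.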

Other aspects that still need to be done

This is a pragmatic, temporary fix for #2078 so that Advection kernels can be tested while we consider the API for particle.dt
erikvansebille and others added 9 commits July 14, 2025 08:38
Since we don't know on particleset initialisation whether the execute will be forward or backward in time, we can't yet decide whether the time should be time_interval.left or time_interval.right. Hence, setting to "NaT" until we know the sign of dt
Only ptype is needed for Kernel
For indexing (to be implemented)
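
The commit message above about setting the time to "NaT" until the sign of dt is known can be illustrated with a small sketch. The helper name `resolve_start_times` and its arguments are hypothetical and do not come from the PR. The sketch only shows the idea of filling NaT entries with the left or right edge of the time interval once the direction is known.

```python
import numpy as np


def resolve_start_times(times, dt, interval_left, interval_right):
    """Fill NaT entries once the sign of dt is known (hypothetical helper)."""
    # Forward runs start at the left edge, backward runs at the right edge.
    start = interval_left if dt > np.timedelta64(0, "s") else interval_right
    resolved = times.copy()
    resolved[np.isnat(resolved)] = start
    return resolved


left = np.datetime64("2025-01-01", "ns")
right = np.datetime64("2025-01-10", "ns")

# Two particles without a release time yet, one with an explicit time.
times = np.array(["NaT", "2025-01-03", "NaT"], dtype="datetime64[ns]")

forward = resolve_start_times(times, np.timedelta64(1, "h"), left, right)
backward = resolve_start_times(times, -np.timedelta64(1, "h"), left, right)
```

Particles that already have an explicit time keep it; only the NaT placeholders are replaced.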
@erikvansebille erikvansebille marked this pull request as ready for review July 15, 2025 08:34
@erikvansebille (Member, Author) commented

I've now finished the draft for using xarray as the particleset data structure under the hood.

Note that I've also implemented numpy.datetime64 support inside the kernels, which is nice for user flexibility but likely also very slow, and thus a performance drain. I propose we keep this for now and then make another PR where we go back to floats inside the kernels, to explore how much that improves performance.
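
To make the "floats inside the kernel" alternative concrete, here is a minimal sketch of converting datetime64 values to float seconds relative to a reference time and back. The function names `to_seconds` and `to_datetime` are assumptions for illustration and are not part of Parcels.

```python
import numpy as np


def to_seconds(times, reference):
    """Convert datetime64 values to float seconds since `reference`."""
    return (times - reference) / np.timedelta64(1, "s")


def to_datetime(seconds, reference):
    """Convert float seconds since `reference` back to datetime64[ns]."""
    nanoseconds = np.round(np.asarray(seconds) * 1e9).astype(np.int64)
    return reference + nanoseconds.astype("timedelta64[ns]")


reference = np.datetime64("2025-01-01", "ns")
times = reference + np.array([0, 90, 3600]) * np.timedelta64(1, "s")

seconds = to_seconds(times, reference)      # plain float64 array
roundtrip = to_datetime(seconds, reference)
```

Under this approach the conversion happens once at the boundary, and the kernels only see a float64 array. Whether that is actually faster is the open question raised in the comment above.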

@VeckoTheGecko, could you help with adding type annotations and type checking where relevant on the new functions/methods?

@VeckoTheGecko (Contributor) left a comment

A few things that we need to work out at a conceptual level:

  • Particle is a bit overloaded in responsibility at the moment. It acts both as a thin wrapper (hooking into the underlying _data via the index in order to make modifications) and as the place that handles* adding variables to create custom particles.
    • This overloading is a problem because the former concerns an individual particle inside a loop, while the latter concerns types of particles (which are declared before the particleset execution loop and are then used to generate the underlying particle data class).
    • I think these should be split apart, though I'm not sure about naming. Perhaps ParticleAccessor and DefaultParticle.
  • I'm not sure whether the current use of ._index in Particle is robust to deleting particles.

*this functionality looks to be untested at the moment, and the implementations of add_variable aren't yet updated
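
The split proposed above could look roughly like the following sketch. `ParticleAccessor` follows the reviewer's suggested name. `ParticleType` and its `add_variable` signature are hypothetical and only stand in for the "types of particles" side. Neither class is the PR's implementation.

```python
import numpy as np
import xarray as xr


class ParticleAccessor:
    """Thin per-particle view onto one row of the dataset (sketch only)."""

    def __init__(self, data, index):
        # Bypass our own __setattr__ for the two internal fields.
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        data = object.__getattribute__(self, "_data")
        if name not in data:
            raise AttributeError(name)
        return data[name].values[self._index]

    def __setattr__(self, name, value):
        if name not in self._data:
            raise AttributeError(name)
        # Write through to the shared dataset at this particle's position.
        self._data[name][self._index] = value


class ParticleType:
    """Declares which variables a kind of particle carries (hypothetical)."""

    def __init__(self, variables):
        self.variables = dict(variables)  # name -> (dtype, initial value)

    def add_variable(self, name, dtype, initial):
        # Return a new type rather than mutating the existing one.
        return ParticleType({**self.variables, name: (dtype, initial)})


data = xr.Dataset(
    {"lon": (["trajectory"], np.zeros(3)), "lat": (["trajectory"], np.zeros(3))},
    coords={"trajectory": ("trajectory", np.arange(3))},
)
p = ParticleAccessor(data, 1)
p.lon = 4.5  # modifies row 1 of the shared dataset

base = ParticleType({"lon": (np.float64, 0.0), "lat": (np.float64, 0.0)})
with_age = base.add_variable("age", np.float32, 0.0)
```

The accessor knows nothing about how variables are declared, and the type object never touches particle data, which is the separation the comment asks for.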

        return p_string + f"time={time_string})"

    def __init__(self, data, index=None):
        self._data = data
        self._index = index

Contributor

What is ._index? A particle ID, or only an index into the dataset? (If the former, are we handling deleted particles in the ParticleData class? Do we know how that would work?)

Member Author

The ._index is needed for the getattr and setattr below. If you can think of a better way to do this, I'd be keen to hear!

Member Author

I've now expanded the unit test to check how robust indexing is to particle deletion, and it appears to work well. See f06e024
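
The distinction behind this thread, positional index versus particle ID, can be shown with a small xarray sketch. This is not the unit test from f06e024; the dataset and values are made up for illustration.

```python
import numpy as np
import xarray as xr

data = xr.Dataset(
    {"lon": (["trajectory"], np.array([10.0, 11.0, 12.0, 13.0]))},
    coords={"trajectory": ("trajectory", np.array([0, 1, 2, 3]))},
)

# Before deletion, position 1 holds the particle with trajectory ID 1.
lon_before = data["lon"].isel(trajectory=1).item()

# Delete the particle with trajectory ID 1.
remaining = data.drop_sel(trajectory=[1])

# Position 1 now refers to a different particle (trajectory ID 2) ...
lon_by_position = remaining["lon"].isel(trajectory=1).item()
id_at_position = remaining["trajectory"].values[1].item()

# ... whereas a label-based lookup by trajectory ID is unaffected.
lon_by_id = remaining["lon"].sel(trajectory=2).item()
```

A stored positional index is therefore only safe if it is assigned after any deletion, or if lookups go through the trajectory coordinate instead.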

Comment on lines +138 to +167
        self._data = xr.Dataset(
            {
                "lon": (["trajectory"], lon.astype(lonlatdepth_dtype)),
                "lat": (["trajectory"], lat.astype(lonlatdepth_dtype)),
                "depth": (["trajectory"], depth.astype(lonlatdepth_dtype)),
                "time": (["trajectory"], time),
                "dt": (["trajectory"], np.timedelta64(1, "ns") * np.ones(len(trajectory_ids))),
                "ei": (["trajectory", "ngrid"], np.zeros((len(trajectory_ids), len(fieldset.gridset)), dtype=np.int32)),
                "state": (["trajectory"], np.zeros((len(trajectory_ids)), dtype=np.int32)),
                "lon_nextloop": (["trajectory"], lon.astype(lonlatdepth_dtype)),
                "lat_nextloop": (["trajectory"], lat.astype(lonlatdepth_dtype)),
                "depth_nextloop": (["trajectory"], depth.astype(lonlatdepth_dtype)),
                "time_nextloop": (["trajectory"], time),
            },
            coords={
                "trajectory": ("trajectory", trajectory_ids),
            },
            attrs={
                "ngrid": len(fieldset.gridset),
                "ptype": pclass.getPType(),
            },
        )
        # add extra fields from the custom Particle class
        for v in pclass.__dict__.values():
            if isinstance(v, Variable):
                if isinstance(v.initial, attrgetter):
                    initial = v.initial(self).values
                else:
                    initial = v.initial * np.ones(len(trajectory_ids), dtype=v.dtype)
                self._data[v.name] = (["trajectory"], initial)

Contributor

I think that this dataset generation should be wrapped up into a method on the Particle class (so that this part of the code does not have to worry about custom particle classes).

Member Author

I propose we (you?) do that in a new PR, once this one is merged?
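
As a rough illustration of the suggestion above to move dataset generation out of the ParticleSet code, here is a sketch of a standalone builder. The function name `build_particle_dataset` and the `(dtype, initial)` declaration format are assumptions for illustration. The follow-up PR may well look different.

```python
import numpy as np
import xarray as xr


def build_particle_dataset(trajectory_ids, base, extra):
    """Build a per-particle dataset (hypothetical helper, not Parcels API).

    base:  name -> array of per-particle values (e.g. lon, lat)
    extra: name -> (dtype, initial value) for custom particle variables
    """
    n = len(trajectory_ids)
    data_vars = {
        name: (["trajectory"], np.asarray(values)) for name, values in base.items()
    }
    for name, (dtype, initial) in extra.items():
        # Every particle starts with the same initial value.
        data_vars[name] = (["trajectory"], np.full(n, initial, dtype=dtype))
    return xr.Dataset(
        data_vars,
        coords={"trajectory": ("trajectory", np.asarray(trajectory_ids))},
    )


ds = build_particle_dataset(
    trajectory_ids=[0, 1, 2],
    base={"lon": [0.0, 1.0, 2.0], "lat": [5.0, 5.0, 5.0]},
    extra={"age": (np.float32, 0.0), "temperature": (np.float64, 20.0)},
)
```

With a builder like this, the calling code passes in declarations and never needs to inspect a custom particle class itself.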

@github-project-automation github-project-automation bot moved this from Backlog to Ready in Parcels development Jul 15, 2025
@erikvansebille erikvansebille merged commit b6cada7 into v4-dev Jul 16, 2025
9 checks passed
@erikvansebille erikvansebille deleted the xarray_dataset_for_particle_data branch July 16, 2025 11:59
@github-project-automation github-project-automation bot moved this from Ready to Done in Parcels development Jul 16, 2025