Xarray dataset for particledata #2079

erikvansebille · 2025-07-10T06:35:54Z

This PR replaces the custom ParticleData class with an xarray.Dataset.

Note that it also changes the time handling in pset.execute() to fully use numpy.datetime64/numpy.timedelta64. Also note that this PR does not change the ParticleFile class; that will be handled in a follow-up PR.
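
As a rough illustration of time handling with numpy.datetime64/numpy.timedelta64, the sketch below steps a clock from a start time over a given runtime. The variable names (start, runtime, dt) and the step-shortening logic are illustrative assumptions; they are not taken from the Parcels execute loop.

```python
import numpy as np

# Illustrative values; not taken from the Parcels code base.
start = np.datetime64("2025-01-01T00:00:00", "ns")
runtime = np.timedelta64(2, "h")
dt = np.timedelta64(30, "m")

# datetime64 + timedelta64 gives a datetime64.
endtime = start + runtime

time = start
n_steps = 0
while time < endtime:
    # Shorten the last step if needed so the clock lands exactly on endtime.
    step = min(dt, endtime - time)
    time = time + step
    n_steps += 1
```

Comparisons and arithmetic work across units here (minutes, hours, nanoseconds) because numpy converts to the finest unit involved.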

Other aspects that still need to be done

implement/test particle.delete
clean up Particle class
add support for custom Variables
fix the particles as time delta issue (Using our own timedelta class to support float*timedelta64 multiplication in Kernels? #2078)
bring in unit tests from test_kernel_language
implement particle.ei into the Dataset
Support negative dt for backward-in-time simulations
Implement Type-annotations and -checking in new functions and methods

Chose the correct base branch (main for v3 changes, v4-dev for v4 changes)
Fixes Refactor ParticleData internals to use an xr.Dataset #1822
Added tests

Since we don't know on particleset initialisation whether the execute will be forward or backward in time, we can't yet decide whether the time should be time_interval.left or time_interval.right. Hence, setting to "NaT" until we know the sign of dt
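
A minimal sketch of the idea described above, assuming the time interval bounds are available as numpy.datetime64 values. The names interval_left, interval_right and resolve_start_times are hypothetical and only stand in for time_interval.left, time_interval.right and whatever the execute step does internally.

```python
import numpy as np

# Hypothetical stand-ins for time_interval.left and time_interval.right.
interval_left = np.datetime64("2025-01-01T00:00:00", "ns")
interval_right = np.datetime64("2025-01-02T00:00:00", "ns")

# At construction the direction is unknown, so start times are left as NaT.
time = np.full(3, np.datetime64("NaT", "ns"))


def resolve_start_times(time, dt):
    """Fill NaT entries once the sign of dt is known (hypothetical helper)."""
    forward = dt > np.timedelta64(0, "ns")
    start = interval_left if forward else interval_right
    # Only entries that are still NaT are replaced.
    return np.where(np.isnat(time), start, time)


forward_times = resolve_start_times(time, np.timedelta64(60, "s"))
backward_times = resolve_start_times(time, np.timedelta64(-60, "s"))
```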

erikvansebille · 2025-07-15T08:37:37Z

I've now finished the draft for using xarray as particleset-data-structure under the hood.

Note that I've also implemented numpy.datetime64 support inside the kernels, which is nice for user flexibility but likely also slow and thus a drain on performance. I'd propose we keep this for now, and then make another PR where we go back to floats inside the kernels, to explore how much that improves performance.

@VeckoTheGecko, could you help with adding Type-annotation/checking where relevant on the new functions/methods?

VeckoTheGecko

A few things that we need to work out at a conceptual level:

Particle is a bit overloaded in responsibility at the moment. It acts both as a thin wrapper (hooking into the underlying _data via the index in order to make modifications) and as the place that handles* adding variables to create custom particles.
- This overloading is a problem because the former concerns an individual particle inside a loop, while the latter concerns types of particles (which are declared before the particleset execution loop and are then used to generate the underlying particle data).
- I think these should be split apart, though I'm not sure about naming. Perhaps ParticleAccessor and DefaultParticle.
I'm not sure whether the current use of ._index in Particle is robust to the deletion of particles.

*this functionality looks to be untested at the moment, and the implementations of add_variable aren't yet updated
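
To make the "thin wrapper" half of this concrete, here is a minimal sketch of an accessor that reads and writes one row of a shared xarray.Dataset. The class name ParticleAccessor comes from the naming suggestion above; everything else (the guard in __getattr__, the example dataset) is an assumption and not the implementation in this PR.

```python
import numpy as np
import xarray as xr


class ParticleAccessor:
    """Thin view onto one row of a shared dataset (hypothetical sketch)."""

    def __init__(self, data, index):
        # Bypass our own __setattr__ for the two bookkeeping attributes.
        object.__setattr__(self, "_data", data)
        object.__setattr__(self, "_index", index)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for dataset variables.
        if name.startswith("_") or name not in self._data:
            raise AttributeError(name)
        return self._data[name].values[self._index]

    def __setattr__(self, name, value):
        # Write through to the shared dataset rather than to the instance.
        self._data[name][self._index] = value


data = xr.Dataset(
    {"lon": (["trajectory"], np.array([0.0, 1.0, 2.0]))},
    coords={"trajectory": ("trajectory", np.array([10, 11, 12]))},
)

p = ParticleAccessor(data, index=1)
p.lon = p.lon + 0.5  # reads and writes row 1 of data["lon"]
```

In this sketch the accessor holds no particle state of its own, so declaring which variables exist (the second responsibility) could live in a separate class.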

VeckoTheGecko · 2025-07-15T09:15:27Z

parcels/particle.py

-        return p_string + f"time={time_string})"
+    def __init__(self, data, index=None):
+        self._data = data
+        self._index = index


What is ._index? A particle ID or only an index into the dataset? (If the former, are we dealing with deleted Particles in the ParticleData class? Do we know how that would work?)

The ._index is needed for the __getattr__ and __setattr__ below. If you can think of a better way to do this, I'd be keen to hear!

I've now expanded the unit test to check how robust indexing is to particle deletion, and it appears to work well. See f06e024
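
The unit test in f06e024 is not reproduced here. The sketch below only illustrates the general concern raised in this thread: after a deletion, a positional index points at a different row, whereas a lookup by trajectory ID still finds the same particle. The dataset contents are made up for the example.

```python
import numpy as np
import xarray as xr

data = xr.Dataset(
    {"lon": (["trajectory"], np.array([0.0, 1.0, 2.0, 3.0]))},
    coords={"trajectory": ("trajectory", np.array([10, 11, 12, 13]))},
)

# Delete the particle with trajectory ID 11 by keeping every other row.
keep = np.flatnonzero(data["trajectory"].values != 11)
data = data.isel(trajectory=keep)

# Positional index 1 now refers to the row that used to sit at index 2 ...
lon_at_index_1 = float(data["lon"].values[1])
# ... while a lookup by trajectory ID is unaffected by the deletion.
lon_of_id_12 = float(data["lon"].sel(trajectory=12).values)
```

Whether this matters in practice depends on when accessors are created relative to deletions; an index that is only valid for the duration of one loop iteration would not be affected.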

parcels/particle.py

parcels/application_kernels/interpolation.py

parcels/kernel.py

VeckoTheGecko · 2025-07-15T11:08:11Z

parcels/particleset.py

+        self._data = xr.Dataset(
+            {
+                "lon": (["trajectory"], lon.astype(lonlatdepth_dtype)),
+                "lat": (["trajectory"], lat.astype(lonlatdepth_dtype)),
+                "depth": (["trajectory"], depth.astype(lonlatdepth_dtype)),
+                "time": (["trajectory"], time),
+                "dt": (["trajectory"], np.timedelta64(1, "ns") * np.ones(len(trajectory_ids))),
+                "ei": (["trajectory", "ngrid"], np.zeros((len(trajectory_ids), len(fieldset.gridset)), dtype=np.int32)),
+                "state": (["trajectory"], np.zeros((len(trajectory_ids)), dtype=np.int32)),
+                "lon_nextloop": (["trajectory"], lon.astype(lonlatdepth_dtype)),
+                "lat_nextloop": (["trajectory"], lat.astype(lonlatdepth_dtype)),
+                "depth_nextloop": (["trajectory"], depth.astype(lonlatdepth_dtype)),
+                "time_nextloop": (["trajectory"], time),
+            },
+            coords={
+                "trajectory": ("trajectory", trajectory_ids),
+            },
+            attrs={
+                "ngrid": len(fieldset.gridset),
+                "ptype": pclass.getPType(),
+            },
        )
+        # add extra fields from the custom Particle class
+        for v in pclass.__dict__.values():
+            if isinstance(v, Variable):
+                if isinstance(v.initial, attrgetter):
+                    initial = v.initial(self).values
+                else:
+                    initial = v.initial * np.ones(len(trajectory_ids), dtype=v.dtype)
+                self._data[v.name] = (["trajectory"], initial)


I think that this dataset generation should be wrapped up into a method on the Particle class (so that this part of the code does not have to worry about custom particle classes).

I propose we (you?) do that in a new PR, once this one is merged?
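
One way the suggestion above could look is sketched here, with the dataset construction moved into a classmethod on the particle type. The Variable dataclass, the create_dataset name and the add_variable behaviour are all hypothetical stand-ins for illustration; they are not the Parcels API.

```python
from dataclasses import dataclass

import numpy as np
import xarray as xr


@dataclass
class Variable:
    """Hypothetical stand-in for the Parcels Variable class."""

    name: str
    dtype: type = np.float32
    initial: float = 0.0


class Particle:
    """Hypothetical particle type that builds its own dataset."""

    variables = [Variable("lon"), Variable("lat")]

    @classmethod
    def add_variable(cls, variable):
        # Return a new subclass so the base class is left unchanged.
        extended = [*cls.variables, variable]
        return type(cls.__name__, (cls,), {"variables": extended})

    @classmethod
    def create_dataset(cls, trajectory_ids, **initial_values):
        n = len(trajectory_ids)
        data_vars = {}
        for v in cls.variables:
            # Use caller-supplied values when given, else the default.
            values = initial_values.get(v.name, v.initial)
            data_vars[v.name] = (["trajectory"], np.full(n, values, dtype=v.dtype))
        coords = {"trajectory": ("trajectory", trajectory_ids)}
        return xr.Dataset(data_vars, coords=coords)


AgeParticle = Particle.add_variable(Variable("age", np.float32, 0.0))
ds = AgeParticle.create_dataset(np.arange(3), lon=np.array([0.0, 1.0, 2.0]))
```

With this split, the ParticleSet constructor would only call create_dataset and would not need to inspect the particle class for custom variables.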

parcels/particleset.py

tests/v4/test_kernel.py

erikvansebille added 13 commits July 7, 2025 18:17

Very first attempt of using an xarray dataset to hold particle data

1ad419a

First run through execute for xarray pset

1fdd767

Simplifying Kernel time comparison

b3c5d49

Support progressbar with datetime, and clean up docstring

aee0a49

Merge branch 'remove_time_origin' into xarray_dataset_for_particle_data

51ec275

use numpy.datetime/numpy.timedelta in test_particleset

6a9304a

Adding unit tests for particle.time and execute dt, runtime and endtime

2b0ec4a

Delete particledata.py

5459e0c

Support adding particlesets

53483b8

Changing public particleset.data to private _data

e67eb82

Adding an iterator for xarray particleset

cbd9732

Implement getattr for Particles

0e4eb65

Using temporary TestParticle class for now

small fixes

e277a55

github-project-automation bot added this to Parcels development Jul 10, 2025

github-project-automation bot moved this to Backlog in Parcels development Jul 10, 2025

erikvansebille marked this pull request as draft July 10, 2025 06:36

erikvansebille added 2 commits July 11, 2025 08:34

Fixing that UXPiecewiseConstantFace returns float instead of list

b6f650d

Converting timedelta64 to float for dt in Advection Kernels

be87206

This is a pragmatic/temporary fix of #2078 so that Advection kernels can be tested; while we consider the API for particle.dt
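
For context, a minimal sketch of what converting a timedelta64 to a float can look like. Dividing by a one-second timedelta is standard numpy behaviour; the velocity value and variable names are illustrative, and the actual kernel code in be87206 may differ.

```python
import numpy as np

dt = np.timedelta64(300, "s")

# Dividing by a one-second timedelta yields a plain float number of seconds.
dt_seconds = dt / np.timedelta64(1, "s")

u = 0.2  # illustrative velocity in m/s
displacement = u * dt_seconds  # metres, an ordinary float
```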

erikvansebille mentioned this pull request Jul 11, 2025

Using our own timedelta class to support float*timedelta64 multiplication in Kernels? #2078

Open

erikvansebille added 11 commits July 11, 2025 08:54

Fixing bug when creating empty pset

eb64fc6

Direct import numpy as np for all Kernels

c002294

Adding test for stopping simulation

79c20db

Adding import numpy as np to kernels

c96997c

Fixing c002294

Fixing error_particles test

532f912

Adding tests for particleset creation

be14654

and fixing setting lonlatdepth_dtype

Adding test for creating particles outside time

6aaf85b

Fixing execute loop for irregular dt

fff19b3

and adding unit test

Adding Support for Custom Variables

c33b049

And moving from temporary TestParticle to original Particle class

Removing lastID variable, and adding more tests

f2d2ea4

Supporting particle.delete in Kernel

0dfb97f

erikvansebille and others added 9 commits July 14, 2025 08:38

Using particle.dt in advection kernel Field evaluations

7c47912

Adding support for negative dt (for backward tracking)

97adb3b

Changing pid_orig to trajectory_id in pset init

6318821

Merge branch 'v4-dev' into xarray_dataset_for_particle_data

26abb3a

Removing pclass from ParticleSet object

5a3d0e1

Only ptype is needed for Kernel

Adding default ParticleSet.ei Variable

ff78b3a

For indexing (to be implemented)

Cleaning up legacy methods in Particle class

b6fa426

Adding more tests

180e9a5

erikvansebille marked this pull request as ready for review July 15, 2025 08:34

erikvansebille requested review from VeckoTheGecko and fluidnumerics-joe July 15, 2025 08:34

VeckoTheGecko requested changes Jul 15, 2025

github-project-automation bot moved this from Backlog to Ready in Parcels development Jul 15, 2025

erikvansebille added 6 commits July 16, 2025 08:11

Removing tests from v3 that have been moved to v4

00569f1

Moving more tests from v3 to v4

4a46f24

Implementing reviewer comments

d0c5537

Updating return value of unknown variables

bb41d42

Following reviewer comment

Adding typechecking for Particle

52e954f

Forcing Particle.add_variable to expect a Variable or list of Variables

708b825

erikvansebille requested a review from VeckoTheGecko July 16, 2025 08:54

erikvansebille and others added 4 commits July 16, 2025 11:07

Expanding unit test to check for Particle._index behaviour

f06e024

Simplifying Particle.__getattr__

0d5c085

@VeckoTheGecko, you were right that this if-statement wasn't needed

Further simplifying particle.__getattr__

8fc2b06

Merge branch 'v4-dev' into xarray_dataset_for_particle_data

7e3b9ad

erikvansebille merged commit b6cada7 into v4-dev Jul 16, 2025
9 checks passed

erikvansebille deleted the xarray_dataset_for_particle_data branch July 16, 2025 11:59

github-project-automation bot moved this from Ready to Done in Parcels development Jul 16, 2025

VeckoTheGecko mentioned this pull request Jul 16, 2025

Bugfix unstructured grid search face index #2087

Merged

3 tasks
