
Performance: speeding up pickling of cftime arrays #253

@aulemahal

Description

I used cftime 1.4.1 and 1.5.0 when exploring this.

My workflows involve large datasets and complex functions. I use xarray, backed by dask. In one of the more complex processing steps, I use xarray's map_blocks and a handful of other dask-lazy methods on a large dataset that uses the NoLeap calendar: 950 chunks and a 55114-element time coordinate. It seems a lot of time is spent pickling the latter.

More precisely, this line of dask: https://github.com/dask/dask/blob/1c4a84225d1bd26e58d716d2844190cc23ebcfec/dask/base.py#L1028 calls pickle.dumps on the object-dtype numpy array that stores the cftime.datetime objects.
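A minimal sketch of how I understand that path (assuming tokenize's fallback for object-dtype arrays is what ends up pickling the coordinate; the calendar and date range below are just examples):

import numpy as np
import xarray as xr
from dask.base import tokenize

# Object-dtype arrays have no fast hashing path, so tokenize()
# falls back to pickling the array to get deterministic bytes.
time = xr.cftime_range('1950-01-01', '2100-01-01', freq='D', calendar='noleap')
token = tokenize(np.asarray(time))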

When profiling the graph creation (no computation triggered yet), I can see that this pickling step is the one that takes the most time, slightly more than another function involved in xarray's CFTimeIndex creation.
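To show what I mean (a hypothetical, scaled-down stand-in for the real workflow; the identity function and chunk size are arbitrary):

import cProfile
import numpy as np
import xarray as xr

# Build a small cftime-indexed dataset and profile only the graph
# construction of map_blocks; nothing is computed here.
time = xr.cftime_range('1950-01-01', '2100-01-01', freq='D', calendar='noleap')
ds = xr.Dataset({'x': ('time', np.arange(time.size))}, coords={'time': time}).chunk({'time': 100})
cProfile.run('ds.map_blocks(lambda d: d)', sort='cumtime')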

MWE:

import pickle
import numpy as np
import pandas as pd
import xarray as xr


cft = xr.cftime_range('1950-01-01', '2100-01-01', freq='D')  # CFTimeIndex backed by an object array of cftime datetimes
npy = pd.date_range('1950-01-01', '2100-01-01', freq='D')  # same shape, but numpy datetime64 under the hood
oar = np.array([1] * npy.size, dtype='O')  # sanity check: object-dtype array with a builtin element type

timeit calls in a notebook:

[Screenshot (2021-08-10): %timeit results for pickling each of the three arrays]
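The calls were along these lines (a sketch; the exact timings are only in the screenshot, the relative ordering is described below):

%timeit pickle.dumps(np.array(cft))  # object array of cftime datetimes: slowest by far
%timeit pickle.dumps(np.array(npy))  # datetime64 array: fast
%timeit pickle.dumps(oar)            # object array of builtin ints: slow, but far less so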

So even if it is normal that pickling an object array is slower, the cftime array is still two orders of magnitude slower than a basic object array. I am not very knowledgeable about how pickle works, but I believe something could be done to speed this up.
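For what it's worth, encoding the dates to numbers before pickling is much cheaper on my side. This is only a user-side workaround sketch, not necessarily the right fix inside cftime; the units string and calendar are arbitrary choices:

import pickle
import numpy as np
import cftime

units, cal = 'days since 1950-01-01', 'noleap'
times = cftime.num2date(np.arange(55114), units, calendar=cal)  # object array of cftime datetimes

# Encode to a numeric array before pickling; decode after unpickling.
payload = pickle.dumps((cftime.date2num(times, units, calendar=cal), units, cal))
nums, units2, cal2 = pickle.loads(payload)
times2 = cftime.num2date(nums, units2, calendar=cal2)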

Any ideas?
