Description
First of all, I would like to thank you for your awesome contributions. During my development I came across the following issue.
When trying to append to an existing DeltaTable, the following error occurs:
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 3 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x1384dcc0ad0>\n 0. 1341334790528\n 1. finalize-02082eb4-e53c-4b1a-83dc-fb753d3f60dc\n 2. _commit-94f97f6c-675a-47c6-88a7-82b1a5234034\n>')
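The wrapped exception in the stacktrace below is TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object, so the graph sent to the distributed scheduler appears to contain the handle of the already-existing table. If that reading is correct, the underlying failure should be reproducible without Dask at all (hypothetical minimal check, reusing the table path from the example below):

import pickle
from deltalake import DeltaTable

# Assumption: the DeltaTable handle itself is not picklable, which would explain
# why the HighLevelGraph that contains it cannot be serialized for the scheduler.
dt = DeltaTable("./animals")
pickle.dumps(dt)  # expected: TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object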
Reproducible Example
import pandas as pd
import dask.dataframe as dd
import dask_deltatable as ddt
from distributed import Client
from deltalake import DeltaTable
output_table = "./animals"
if __name__ == "__main__":
    client = Client()
    print(f"Dask Client: {client}")

    animals_df = pd.DataFrame(
        {
            "name": ["dog", "cat", "whale", "elephant"],
            "life_span": [13, 15, 90, 70],
        },
    )
    animals_ddf = dd.from_pandas(animals_df)
    animals_ddf["high_longevity"] = animals_ddf["life_span"] > 40

    ddt.to_deltalake(
        table_or_uri=output_table,
        df=animals_ddf,
        compute=True,
        mode="append",
    )

    delta_table = DeltaTable(output_table)
    delta_table_df = delta_table.to_pandas()
    print("Created DeltaTable:")
    print(delta_table_df)

    more_animals_df = pd.DataFrame(
        {
            "name": ["shark", "parrot"],
            "life_span": [20, 50],
        },
    )
    more_animals_ddf = dd.from_pandas(more_animals_df)
    more_animals_ddf["high_longevity"] = more_animals_ddf["life_span"] > 40

    ddt.to_deltalake(
        table_or_uri=output_table,
        df=more_animals_ddf,
        compute=True,
        mode="append",
    )
Stacktrace
Dask Client: <Client: 'tcp://127.0.0.1:60355' processes=4 threads=12, memory=31.90 GiB>
Created DeltaTable:
name life_span high_longevity
0 dog 13 False
1 cat 15 False
2 whale 90 True
3 elephant 70 True
2024-12-16 13:58:42,783 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 3 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x1e7f9b8b530>
0. 2095838501424
1. finalize-54be334c-3207-4a95-8908-1aac80f5edb6
2. _commit-2c7fae99-a722-4a11-8b99-ca2120ebbb4d
>.
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 60, in dumps
result = pickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 65, in dumps
pickler.dump(x)
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 77, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1529, in dumps
cp.dump(obj)
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1295, in dump
return super().dump(obj)
^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 60, in dumps
result = pickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 65, in dumps
pickler.dump(x)
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 366, in serialize
header, frames = dumps(x, context=context) if wants_context else dumps(x)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 78, in pickle_dumps
frames[0] = pickle.dumps(
^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\pickle.py", line 77, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1529, in dumps
cp.dump(obj)
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\cloudpickle\cloudpickle.py", line 1295, in dump
return super().dump(obj)
^^^^^^^^^^^^^^^^^
TypeError: cannot pickle 'deltalake._internal.RawDeltaTable' object
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\thodo\Documents\libra\myenv\append_issue.py", line 45, in <module>
ddt.to_deltalake(
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask_deltatable\write.py", line 239, in to_deltalake
result = result.compute()
^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask\base.py", line 372, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\dask\base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thodo\miniconda3\envs\myenv\Lib\site-packages\distributed\protocol\serialize.py", line 392, in serialize
raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 3 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x1e7f9b8b530>\n 0. 2095838501424\n 1. finalize-54be334c-3207-4a95-8908-1aac80f5edb6\n 2. _commit-2c7fae99-a722-4a11-8b99-ca2120ebbb4d\n>')
Library Versions
dask==2024.11.2
dask-deltatable==0.3.3
deltalake==0.22.3
distributed==2024.11.2
pandas==2.2.3