Open
Description
Describe the issue:
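After appending to an existing file with append=True and reading it back with fastparquet, the categorical column is incorrect: the rows written first come back with the wrong values. Reading the same file with pyarrow returns the expected values.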
Minimal Complete Verifiable Example:
import pandas as pd
import numpy as np
import pyarrow
import fastparquet
import sys
print(np.__version__)
# 1.26.4
print(pd.__version__)
# 2.2.2
print(pyarrow.__version__)
# 15.0.2
print(fastparquet.__version__)
# 2024.5.0
print(sys.version)
# 3.9.19 (main, Jun 11 2024, 10:17:45)
# [GCC 11.4.0]
data = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["first", "second", "thirst"])
data["thirst"] = data["thirst"].astype("category")
data.to_parquet("data.parquet", engine="fastparquet", index=False)
print(data)
#    first  second  thirst
# 0      0       1       2
# 1      3       4       5
# 2      6       7       8
data_new = pd.read_parquet("data.parquet", engine="fastparquet")
print(data_new)
#    first  second  thirst
# 0      0       1       2
# 1      3       4       5
# 2      6       7       8
data2 = pd.DataFrame(
np.arange(3, 12).reshape(3, 3), columns=["first", "second", "thirst"]
)
data2["thirst"] = data2["thirst"].astype("category")
data2.to_parquet("data.parquet", engine="fastparquet", append=True, index=False)
print(data2)
#    first  second  thirst
# 0      3       4       5
# 1      6       7       8
# 2      9      10      11
data2_new = pd.read_parquet("data.parquet", engine="fastparquet")
print(data2_new)
#    first  second  thirst
# 0      0       1       5
# 1      3       4       8
# 2      6       7      11
# 3      3       4       5
# 4      6       7       8
# 5      9      10      11
data2_new["thirst"] = data2_new["thirst"].astype("object")
print(data2_new)
#    first  second  thirst
# 0      0       1       5
# 1      3       4       8
# 2      6       7      11
# 3      3       4       5
# 4      6       7       8
# 5      9      10      11
data2_pyarrow = pd.read_parquet("data.parquet", engine="pyarrow")
print(data2_pyarrow)
#    first  second  thirst
# 0      0       1       2
# 1      3       4       5
# 2      6       7       8
# 3      3       4       5
# 4      6       7       8
# 5      9      10      11
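The fastparquet output above is consistent with one possible explanation. This is an unverified assumption and has not been checked against the fastparquet source. The idea is that each write stores its own category dictionary plus integer codes, and on read the dictionary from the appended data is applied to the codes of the first write. The sketch below is a toy model in plain Python that makes no fastparquet calls. The helper names `encode` and `decode` are hypothetical.

```python
# Toy model of a dictionary-encoded categorical column.
# Assumption (not verified): each write keeps its own dictionary and codes.

def encode(values):
    """Return (dictionary, codes): sorted unique values and each value's position."""
    dictionary = sorted(set(values))
    codes = [dictionary.index(v) for v in values]
    return dictionary, codes


def decode(dictionary, codes):
    """Map integer codes back to values using the given dictionary."""
    return [dictionary[c] for c in codes]


# The "thirst" column of the first write and of the appended write.
first_dict, first_codes = encode([2, 5, 8])
second_dict, second_codes = encode([5, 8, 11])

# Each chunk decoded with its own dictionary (matches the pyarrow read).
correct = decode(first_dict, first_codes) + decode(second_dict, second_codes)

# Hypothetical faulty read: the second dictionary is applied to both chunks.
suspect = decode(second_dict, first_codes) + decode(second_dict, second_codes)

print(correct)  # [2, 5, 8, 5, 8, 11]
print(suspect)  # [5, 8, 11, 5, 8, 11]
```

In this model `suspect` reproduces the values fastparquet returns for the column, and `correct` reproduces the pyarrow result. That only shows the hypothesis fits the symptom. It does not confirm the cause.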
Environment:
- Dask version:
- Python version: 3.9.19
- Operating System: Ubuntu 22.04