Pandas integration does not symmetrically store and load with feather format #44

Open
EternalDeiwos opened this issue Mar 7, 2024 · 2 comments


@EternalDeiwos

EternalDeiwos commented Mar 7, 2024

I am playing around with the geoarrow.pandas integration and found something odd: if I load a data frame containing a geometry column, it loads successfully and displays the geometry correctly, but I am unable to do anything with it. Anything I try (e.g. df.geometry.geoarrow.*) produces the following error:

TypeError: Can't create geoarrow.array from Arrow array of type None

I created the file like this:

import geoarrow.pyarrow as ga
import geoarrow.pandas as _
import pandas as pd
import numpy as np

# np.random.rand takes the dimensions as separate arguments, not as a tuple
points = np.random.rand(1 << 20, 2)

df = pd.DataFrame({
    "geometry": ga.point().from_geobuffers(
        None,
        points[:, 0],
        points[:, 1]
    )
})

df.to_feather('points.feather')

And I load the file like this:

import geoarrow.pyarrow as ga
import geoarrow.pandas as _
import pandas as pd

df = pd.read_feather("points.feather")

# Example operations that produce the above error
df.astype({ 'geometry': 'geoarrow.wkt' })
x, y = df.geometry.geoarrow.point_coords()
# etc.
EternalDeiwos changed the title from "Pandas integration does not symetrically store and load with feather format" to "Pandas integration does not symmetrically store and load with feather format" on Mar 7, 2024
@paleolimbot
Contributor

Good catch! I haven't opened up the pandas integration project for a while and it may be that some of my assumptions when I wrote the initial version are no longer valid! Other than general time constraints, one of the reasons I haven't put much effort into this part of the repo is that GeoPandas is considering allowing a GeoArrow storage type along these lines, and if that's the case, I'd want geoarrow-pyarrow to just return GeoPandas objects.

(In the meantime, I should definitely fix this, though!)

@EternalDeiwos
Author

Thanks. As I said, I am just playing with it, so there is no pressure from my side if this is going to change substantially in the future.

My first impression is that it is a lot easier to understand at a glance what geoarrow.pandas is doing under the hood than what the equivalent GeoPandas code does. Wherever this lands, I hope it will be just as easy to access the underlying buffers directly.
