Draft for the Dataframe interchange protocol #1509

Merged on Oct 13, 2021
40 commits
b3b5477
🐛 handle offset for categories
maartenbreddels Oct 11, 2021
d91f1c4
Draft for the Dataframe interchange protocol
AlenkaF Aug 12, 2021
a86a1df
Adding test for virtual column plus typo.
AlenkaF Aug 12, 2021
c74d6dd
Roundtrip test change plus some corrections in functions parameters
AlenkaF Aug 16, 2021
4da374d
Apply suggestions from code review
AlenkaF Aug 16, 2021
c76477d
Dtype for arrow dict plus use of arrow dict in convert_categorical_co…
AlenkaF Aug 19, 2021
24e9774
Add missing value handling
AlenkaF Aug 24, 2021
53b291c
Added chunk handling and tests
AlenkaF Aug 31, 2021
e4f39cd
Corrected usage of metadata for categories
AlenkaF Sep 2, 2021
6d73730
Applying changes from general dataframe protocol
AlenkaF Sep 2, 2021
6a7d5d2
Delete copy error
AlenkaF Sep 3, 2021
1d36a6e
Change sentinel value handling in convert_categorical_column
AlenkaF Sep 3, 2021
3b858d9
Add select_columns() and test
AlenkaF Sep 3, 2021
e764622
Update to _get_data_buffer() for Arrow Dictionary
AlenkaF Sep 3, 2021
615545a
Minor commenting changes
AlenkaF Sep 3, 2021
88166ad
Correct typo error
AlenkaF Sep 6, 2021
0f4813a
Add _VaexBuffer test
AlenkaF Sep 6, 2021
04d8b6c
Add tests and correction for _VaexColumn
AlenkaF Sep 6, 2021
1ca7277
Added tests for _VaexDataFrame
AlenkaF Sep 6, 2021
811a952
Added more tests and one correction for format_str
AlenkaF Sep 7, 2021
6726d55
format to LF and black
maartenbreddels Sep 16, 2021
77650a2
support passing in allow_copy
maartenbreddels Sep 16, 2021
64f12a4
correct describe_null for arrow and numpy
maartenbreddels Sep 16, 2021
bf7ebb0
correct _get_validity_buffer to match describe_null
AlenkaF Sep 17, 2021
5126959
correct describe_null, convert_categorical_column and test_categorica…
AlenkaF Sep 17, 2021
e2ad32b
Apply suggestions from code review
AlenkaF Sep 17, 2021
14d69cc
correct get_chunks for _VaexDataFrame
AlenkaF Sep 17, 2021
a074474
Replace return with yield in get_chunks
AlenkaF Sep 17, 2021
e53a390
Check for LF and run black with -l 220
AlenkaF Sep 17, 2021
a61ad1e
Black with line length 220
AlenkaF Sep 20, 2021
43980a8
Add string dtype support
AlenkaF Sep 20, 2021
f95692b
Add Arrow Dict check to describe_categorical
AlenkaF Sep 20, 2021
f5bf82c
avoid copying data for strings
maartenbreddels Sep 27, 2021
1879472
small fix
maartenbreddels Sep 27, 2021
7060266
also test sliced dataframe
maartenbreddels Sep 27, 2021
8e988f8
test that we do not copy data
maartenbreddels Sep 27, 2021
bc7ed73
Apply string no-mem copy suggestions
Oct 5, 2021
e14fba5
fix and test get_chunks
maartenbreddels Oct 6, 2021
51b2c06
use future ordinal encoding feature
maartenbreddels Oct 11, 2021
0df4b34
make test work with dict encoded
maartenbreddels Oct 12, 2021
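Several commits above rework chunking ("Added chunk handling and tests", "Replace return with yield in get_chunks", "fix and test get_chunks") and slicing ("also test sliced dataframe"). In the interchange protocol, `get_chunks(n_chunks=None)` returns an iterable of DataFrame chunks, and writing it as a generator means chunks are produced lazily. The sketch below illustrates that idea in plain Python. `ToyFrame` is a hypothetical stand-in, not vaex's `_VaexDataFrame`, and splitting rows into equal parts with ceiling division is this sketch's own choice, not necessarily how vaex divides a frame.

```python
from typing import Dict, Iterator, List, Optional, Sequence


class ToyFrame:
    """Hypothetical stand-in for a protocol DataFrame (not vaex's _VaexDataFrame)."""

    def __init__(self, columns: Dict[str, Sequence], offset: int = 0, length: Optional[int] = None):
        self._columns = columns
        self._offset = offset  # start row, so a slice shares the parent's data
        total = len(next(iter(columns.values()))) if columns else 0
        self._length = total - offset if length is None else length

    def num_rows(self) -> int:
        return self._length

    def column_values(self, name: str) -> List:
        # Materialize only the rows covered by this (possibly sliced) frame.
        col = self._columns[name]
        return list(col[self._offset:self._offset + self._length])

    def get_chunks(self, n_chunks: Optional[int] = None) -> Iterator["ToyFrame"]:
        # A generator: each chunk is created only when the consumer asks for it.
        if n_chunks is None or n_chunks <= 1 or self._length == 0:
            yield self
            return
        size = -(-self._length // n_chunks)  # ceiling division
        for start in range(0, self._length, size):
            yield ToyFrame(self._columns, self._offset + start, min(size, self._length - start))


frame = ToyFrame({"x": list(range(10))})
print([chunk.num_rows() for chunk in frame.get_chunks(3)])  # -> [4, 4, 2]
```

Because each chunk only records an offset and a length into the shared columns, chunking a sliced frame does not copy data until `column_values` is called. With ceiling division, asking for more chunks than there are rows yields fewer chunks than requested.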
10 changes: 10 additions & 0 deletions packages/vaex-core/vaex/dataframe.py
```diff
@@ -41,6 +41,7 @@
 from .datatype import DataType
 from .docstrings import docsubst
 
+
 astropy = vaex.utils.optional_import("astropy.units")
 
 # py2/p3 compatibility
```
```diff
@@ -246,6 +247,12 @@ def dep_filter(d : dict):
         fp = vaex.cache.fingerprint(state, df.dataset.fingerprint)
         return f'dataframe-{fp}'
 
+    def __dataframe__(self, nan_as_null : bool = False, allow_copy : bool = True):
+        """
+        """
+        import vaex.dataframe_protocol
+        return vaex.dataframe_protocol._VaexDataFrame(self, nan_as_null=nan_as_null, allow_copy=allow_copy)
+
     def _future(self, version=5, inplace=False):
         '''Act like a Vaex dataframe version 5.
 
```
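The new `__dataframe__` method is the protocol's entry point: a consumer calls it on an object and receives an interchange object (for vaex, `vaex.dataframe_protocol._VaexDataFrame`), with `nan_as_null` and `allow_copy` forwarded unchanged. A minimal consumer-side sketch follows, assuming only that the producer defines `__dataframe__` with these two keyword arguments. `get_interchange_object` and `ToyProducer` are hypothetical names used for illustration; they are not part of vaex.

```python
from typing import Any


def get_interchange_object(obj: Any, nan_as_null: bool = False, allow_copy: bool = True) -> Any:
    """Hypothetical consumer helper: fetch the interchange object from any producer."""
    # Look the hook up by name, so the consumer never imports the producer library.
    method = getattr(obj, "__dataframe__", None)
    if method is None:
        raise TypeError(f"{type(obj).__name__} does not support the dataframe interchange protocol")
    return method(nan_as_null=nan_as_null, allow_copy=allow_copy)


class ToyProducer:
    """Hypothetical producer standing in for vaex.DataFrame in this sketch."""

    def __init__(self, data: dict):
        self.data = data

    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        # vaex returns a _VaexDataFrame here; this toy only records the arguments
        # to show that they reach the producer unchanged.
        return {"source": self, "nan_as_null": nan_as_null, "allow_copy": allow_copy}


proto = get_interchange_object(ToyProducer({"x": [1, 2, 3]}), allow_copy=False)
print(proto["allow_copy"])  # -> False
```

The `import vaex.dataframe_protocol` inside vaex's `__dataframe__` similarly defers loading the protocol module until the method is first called.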
```diff
@@ -5488,6 +5495,9 @@ def _auto_encode_data(self, expression, values):
             return values
         if self.is_category(expression):
             dictionary = vaex.array_types.to_arrow(self.category_labels(expression))
+            offset = self.category_offset(expression)
+            if offset != 0:
+                values = values - offset
             values = vaex.array_types.to_arrow(values)
             to_type = None
             if values.type in self._dict_mapping:
```
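The three lines added to `_auto_encode_data` handle categorical columns whose codes do not start at 0. `category_offset` returns that offset, and subtracting it turns the stored codes into 0-based positions in the labels list, which is what dictionary indices must be before they are paired with `dictionary`. The sketch below shows the same arithmetic with plain Python lists instead of NumPy or Arrow arrays. `to_zero_based_codes`, `decode`, and the example codes and labels are hypothetical, not vaex functions or data.

```python
from typing import List, Sequence


def to_zero_based_codes(codes: Sequence[int], offset: int) -> List[int]:
    """Hypothetical helper: shift categorical codes so they index a 0-based labels list."""
    if offset == 0:
        # Codes already index the labels directly.
        return list(codes)
    return [code - offset for code in codes]


def decode(codes: Sequence[int], labels: Sequence[str]) -> List[str]:
    # A dictionary-encoded column is decoded by looking each index up in the dictionary.
    return [labels[i] for i in codes]


labels = ["low", "mid", "high"]
codes = [1, 3, 2, 1]  # hypothetical codes stored with a category offset of 1
fixed = to_zero_based_codes(codes, offset=1)
print(fixed)                  # -> [0, 2, 1, 0]
print(decode(fixed, labels))  # -> ['low', 'high', 'mid', 'low']
```

Without the shift, code 1 would decode as "mid" instead of "low", and code 3 would fall outside the three-entry labels list.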