✨ support Dataframe interchange protocol (#1509)
* 🐛 handle offset for categories

* Draft for the Dataframe interchange protocol

* Adding test for virtual column plus typo.

* Roundtrip test change plus some corrections in functions parameters

* Apply suggestions from code review

* Dtype for arrow dict plus use of arrow dict in convert_categorical_column

* Add missing value handling

* Added chunk handling and tests

* Corrected usage of metadata for categories

* Applying changes from general dataframe protocol

* Delete copy error

* Change sentinel value handling in convert_categorical_column

* Add select_columns() and test

* Update to _get_data_buffer() for Arrow Dictionary

* Minor commenting changes

* Correct typo error

* Add _VaexBuffer test

* Add tests and correction for _VaexColumn

* Added tests for _VaexDataFrame

* Added more tests and one correction for format_str

* format to LF and black

* support passing in allow_copy

* correct describe_null for arrow and numpy

* correct _get_validity_buffer to match describe_null

* correct describe_null, convert_categorical_column and test_categorical_ordinal for categorical dtypes

* Apply suggestions from code review

* correct get_chunks for _VaexDataFrame

* Replace return with yield in get_chunks

* Check for LF and run black with -l 220

* Black with line length 220

* Add string dtype support

* Add Arrow Dict check to describe_categorical

* avoid copying data for strings

* small fix

* also test sliced dataframe

* test that we do not copy data

* Apply string no-mem copy suggestions

* fix and test get_chunks

* use future ordinal encoding feature

* make test work with dict encoded

Co-authored-by: Maarten A. Breddels <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
3 people authored Oct 13, 2021
1 parent b98ea66 commit d5410f8
Showing 3 changed files with 1,100 additions and 0 deletions.
7 changes: 7 additions & 0 deletions packages/vaex-core/vaex/dataframe.py
@@ -41,6 +41,7 @@
from .datatype import DataType
from .docstrings import docsubst


astropy = vaex.utils.optional_import("astropy.units")

# py2/p3 compatibility
@@ -246,6 +247,12 @@ def dep_filter(d : dict):
        fp = vaex.cache.fingerprint(state, df.dataset.fingerprint)
        return f'dataframe-{fp}'

    def __dataframe__(self, nan_as_null : bool = False, allow_copy : bool = True):
        """
        """
        import vaex.dataframe_protocol
        return vaex.dataframe_protocol._VaexDataFrame(self, nan_as_null=nan_as_null, allow_copy=allow_copy)

    def _future(self, version=5, inplace=False):
        '''Act like a Vaex dataframe version 5.
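The `__dataframe__` method added in this hunk is only an entry point: it imports `vaex.dataframe_protocol` lazily and wraps the dataframe in `_VaexDataFrame`, forwarding `nan_as_null` and `allow_copy`. The following is a minimal sketch of that delegation pattern and of how a consumer might rely on it. It does not use vaex itself: `ToyDataFrame`, `_ToyInterchangeFrame`, and `summarize` are hypothetical names made up for illustration, and only the `__dataframe__` signature and the `num_columns()` / `column_names()` method names are taken from the diff and the interchange protocol.

```python
class _ToyInterchangeFrame:
    """Hypothetical stand-in for vaex.dataframe_protocol._VaexDataFrame."""

    def __init__(self, df, nan_as_null=False, allow_copy=True):
        self._df = df
        self._nan_as_null = nan_as_null
        self._allow_copy = allow_copy

    def num_columns(self):
        return len(self._df.columns)

    def column_names(self):
        return list(self._df.columns)


class ToyDataFrame:
    """Hypothetical producer; only the __dataframe__ signature mirrors the diff."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # Same delegation as in the diff: wrap self and forward both flags.
        return _ToyInterchangeFrame(self, nan_as_null=nan_as_null, allow_copy=allow_copy)


def summarize(obj, allow_copy=True):
    """Consumer side: depend on the protocol entry point, not the concrete type."""
    if not hasattr(obj, "__dataframe__"):
        raise TypeError("object does not support the dataframe interchange protocol")
    xdf = obj.__dataframe__(allow_copy=allow_copy)
    return xdf.num_columns(), xdf.column_names()


df = ToyDataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
print(summarize(df))
```

Keeping the wrapper in a separate module and importing it inside the method, as the diff does, means the protocol code is only loaded when a consumer actually asks for it.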
