
[Python] Add a utility function to create Arrow table instead of pandas df #47172

@AlenkaF


Describe the enhancement requested

It would be good to have a utility function that creates an Arrow table directly, so that some of our pyarrow tests do not have to go through pandas. The existing utility function, which uses pandas, is:

```python
def _test_dataframe(size=10000, seed=0):
    import pandas as pd
    np.random.seed(seed)
    df = pd.DataFrame({
        'uint8': _random_integers(size, np.uint8),
        'uint16': _random_integers(size, np.uint16),
        'uint32': _random_integers(size, np.uint32),
        'uint64': _random_integers(size, np.uint64),
        'int8': _random_integers(size, np.int8),
        'int16': _random_integers(size, np.int16),
        'int32': _random_integers(size, np.int32),
        'int64': _random_integers(size, np.int64),
        'float32': np.random.randn(size).astype(np.float32),
        'float64': np.arange(size, dtype=np.float64),
        'bool': np.random.randn(size) > 0,
        'strings': [util.rands(10) for i in range(size)],
        'all_none': [None] * size,
        'all_none_category': [None] * size
    })
    # TODO(PARQUET-1015)
    # df['all_none_category'] = df['all_none_category'].astype('category')
    return df
```

This issue would move some of the tests that use `_test_dataframe` to a new utility function and remove the `@pytest.mark.pandas` marker in those cases.

Component(s)

Python
