Releases: alteryx/woodwork

v0.6.0 (Aug 4, 2021, commit b58eb3f)

  • Fixes
    • Fix bug in _infer_datetime_format with all np.nan input (#1089)
  • Changes
    • The criteria for categorical type inference have changed (#1065)
    • The meanings of the categorical_threshold and numeric_categorical_threshold settings have changed (#1065)
    • Make sampling for type inference more consistent (#1083)
    • Accessor logic checking if Woodwork has been initialized moved to decorator (#1093)
  • Documentation Changes
    • Fix some release notes that ended up under the wrong release (#1082)
    • Add BooleanNullable and IntegerNullable types to the docs (#1085)
    • Add guide for saving and loading Woodwork DataFrames (#1066)
    • Add in-depth guide on logical types and semantic tags (#1086)
  • Testing Changes
    • Add additional reviewers to minimum and latest dependency checkers (#1070, #1073, #1077)
    • Update the sample_df fixture to have more logical_type coverage (#1058)

Thanks to the following people for contributing to this release:
@davesque, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

  • #1065: The criteria for categorical type inference have changed. Relatedly, the meanings of the categorical_threshold and numeric_categorical_threshold settings have changed. Now, a categorical match is signaled when a series either has the "categorical" pandas dtype or when the ratio of its unique value count to its total value count (NaNs excluded from both counts) is at or below a threshold fraction. That fraction is set by the categorical_threshold setting, which now has a default value of 0.2. If a fraction is set for the numeric_categorical_threshold setting, then series with either a float or integer dtype may be inferred as categorical by applying the same logic with the numeric_categorical_threshold fraction. Otherwise, the numeric_categorical_threshold setting defaults to None, which indicates that series with a numeric dtype should never be inferred as categorical. Users who have overridden either the categorical_threshold or numeric_categorical_threshold settings will need to adjust their values accordingly.
  • #1083: The process of sampling series for logical type inference was updated to be more consistent. Previously, initial sampling differed depending on collection type (pandas, dask, or koalas), and further randomized subsampling was performed in some cases during categorical inference and in every case during email inference, regardless of collection type. Overall, sampling was inconsistent and unpredictable. Now, the first 100,000 records of a column are sampled for logical type inference regardless of collection type, although only records from the first partition of a dask dataset will be used. Subsampling performed by the inference functions of individual types has been removed. As a result of these changes, inferred types may now differ, although in many cases they will be more accurate.
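
The unique-ratio criterion from #1065 can be sketched without Woodwork itself. This is a minimal stand-alone illustration; the function and parameter names below are made up for the example and are not Woodwork's internal API:

```python
def infer_categorical(values, threshold=0.2, is_category_dtype=False):
    # Illustrative sketch of the criterion above. A column matches if it
    # already carries the pandas "category" dtype, or if the ratio of its
    # unique value count to its total value count (nulls excluded from
    # both counts) is at or below the threshold fraction.
    if is_category_dtype:
        return True
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False
    return len(set(non_null)) / len(non_null) <= threshold

print(infer_categorical(["red", "blue", "red", "blue"] * 25))  # True (2/100 unique)
print(infer_categorical(list(range(100))))                     # False (all unique)
```

With the default categorical_threshold of 0.2, the same check would apply to float or integer columns only when numeric_categorical_threshold is set to a fraction rather than left at its None default.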

v0.5.1 (Jul 22, 2021, commit 92d2971)

  • Enhancements
    • Store inferred datetime format on Datetime logical type instance (#1025)
    • Add support for automatically inferring the EmailAddress logical type (#1047)
    • Add feature origin attribute to schema (#1056)
    • Add ability to calculate outliers and the statistical info required for box and whisker plots to WoodworkColumnAccessor (#1048)
    • Add ability to change config settings in a with block with ww.config.with_options (#1062)
    • Raise a warning and remove tags when a user adds a column with index tags to a DataFrame (#1035)
  • Changes
    • Entirely null columns are now inferred as the Unknown logical type (#1043)
    • Add helper functions that check for whether an object is a koalas/dask series or dataframe (#1055)
    • TableAccessor.select method will now maintain dataframe column ordering in TableSchema columns (#1052)
  • Documentation Changes
    • Add supported types to metadata docstring (#1049)

Thanks to the following people for contributing to this release:
@davesque, @frances-h, @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd, @tuethan1999

v0.5.0 (Jul 7, 2021, commit 0d5c5c1)

  • Enhancements
    • Add support for numpy array inputs to Woodwork (#1023)
    • Add support for pandas.api.extensions.ExtensionArray inputs to Woodwork (#1026)
  • Fixes
    • Add input validation to ww.init_series (#1015)
  • Changes
    • Remove lines in LogicalType.transform that raise error if dtype conflicts (#1012)
    • Add infer_datetime_format param to speed up to_datetime calls (#1016)
    • The default logical type is now the Unknown type instead of the NaturalLanguage type (#992)
    • Add pandas 1.3.0 compatibility (#987)

Thanks to the following people for contributing to this release:
@jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd, @tuethan1999

v0.4.2 (Jun 23, 2021, commit 6485525)

  • Enhancements
    • Pass additional progress information in callback functions (#979)
    • Add the ability to generate optional extra stats with DataFrame.ww.describe_dict (#988)
    • Add option to read and write orc files (#997)
    • Retain schema when calling series.ww.to_frame() (#1004)
  • Fixes
    • Raise type conversion error in Datetime logical type (#1001)
    • Try collections.abc to avoid deprecation warning (#1010)
  • Changes
    • Remove make_index parameter from DataFrame.ww.init (#1000)
    • Remove version restriction for dask requirements (#998)
  • Documentation Changes
    • Add instructions for installing the update checker (#993)
    • Disable pdf format with documentation build (#1002)
    • Silence deprecation warnings in documentation build (#1008)
    • Temporarily remove update checker to fix docs warnings (#1011)
  • Testing Changes
    • Add env setting to update checker (#978, #994)

Breaking Changes

  • Progress callback function parameters have changed, and progress is now reported in the units
    specified by the unit of measurement parameter instead of as a percentage of the total. Progress
    callback functions are now expected to accept the following five parameters:

    • progress increment since last call
    • progress units complete so far
    • total units to complete
    • the progress unit of measurement
    • time elapsed since start of calculation
  • DataFrame.ww.init no longer accepts the make_index parameter
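
The five-parameter contract above can be sketched as a plain function. The driver below is hypothetical and only shows the shape of the calls; in Woodwork, the callback is passed to methods that accept one, such as the mutual information functions:

```python
import time

def progress_callback(update, progress, total, unit, time_elapsed):
    # The five parameters, in order: increment since the last call, units
    # complete so far, total units to complete, the unit of measurement,
    # and time elapsed since the start of the calculation.
    print(f"{progress}/{total} {unit} (+{update}, {time_elapsed:.2f}s elapsed)")

# Hypothetical driver illustrating how such a callback might be invoked
def run_with_progress(items, callback, unit="calculations"):
    start = time.time()
    for done, _ in enumerate(items, start=1):
        callback(1, done, len(items), unit, time.time() - start)

run_with_progress(["a", "b", "c"], progress_callback)
```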

Thanks to the following people for contributing to this release:
@frances-h, @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd, @tuethan1999

v0.4.1 (Jun 9, 2021, commit cbba41a)

  • Enhancements
    • Add concat_columns util function to concatenate multiple Woodwork objects into one, retaining typing information (#932)
    • Add option to pass progress callback function to mutual information functions (#958)
    • Add optional automatic update checker (#959, #970)
  • Fixes
    • Fix issue related to serialization/deserialization of data with whitespace and newline characters (#957)
    • Update to allow initializing a ColumnSchema object with an Ordinal logical type without order values (#972)
  • Changes
    • Change write_dataframe to only copy dataframe if it contains LatLong (#955)
  • Testing Changes
    • Fix bug in test_list_logical_types_default (#954)
    • Update minimum unit tests to run on all pull requests (#952)
    • Pass token to authorize uploading of codecov reports (#969)

Thanks to the following people for contributing to this release:
@frances-h, @gsheni, @tamargrey, @thehomebrewnerd

v0.4.0 (May 26, 2021, commit 6763b0a)

  • Enhancements
    • Add option to return TableSchema instead of DataFrame from table accessor select method (#916)
    • Add option to pass progress callback function to mutual information functions (#943)
  • Fixes
    • Fix bug when setting table name and metadata through accessor (#942)
    • Fix bug in which the dtype of category values were not restored properly on deserialization (#949)
  • Changes
    • Add logical type method to transform data (#915)
  • Testing Changes
    • Update when minimum unit tests will run to include minimum text files (#917)
    • Create separate workflows for each CI job (#919)

Thanks to the following people for contributing to this release:
@gsheni, @jeff-hernandez, @thehomebrewnerd, @tuethan1999

v0.3.1 (May 12, 2021, commit 0c21225)

  • Enhancements
    • Add deep parameter to Woodwork Accessor and Schema equality checks (#889)
    • Add support for reading from parquet files to woodwork.read_file (#909)
  • Changes
    • Remove command line functions for list logical and semantic tags (#891)
    • Keep index and time index tags for single column when selecting from a table (#888)
    • Update accessors to store weak reference to data (#894)
  • Documentation Changes
    • Update nbsphinx version to fix docs build issue (#911, #913)
  • Testing Changes
    • Use Minimum Dependency Generator GitHub Action and remove tools folder (#897)
    • Move all latest and minimum dependencies into 1 folder (#912)

Breaking Changes

  • The command line functions python -m woodwork list-logical-types and python -m woodwork list-semantic-tags
    no longer exist. Please call the underlying Python functions ww.list_logical_types() and
    ww.list_semantic_tags().

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

v0.3.0 (May 3, 2021, commit 24be7cc)

  • Enhancements
    • Add is_schema_valid and get_invalid_schema_message functions for checking schema validity (#834)
    • Add logical type for Age and AgeNullable (#849)
    • Add logical type for Address (#858)
    • Add generic to_disk function to save Woodwork schema and data (#872)
    • Add generic read_file function to read file as Woodwork DataFrame (#878)
  • Fixes
    • Raise error when a column is set as the index and time index (#859)
    • Allow NaNs in index for schema validation check (#862)
    • Fix bug where invalid casting to Boolean would not raise error (#863)
  • Changes
    • Consistently use ColumnNotPresentError for mismatches between user input and dataframe/schema columns (#837)
    • Raise custom WoodworkNotInitError when accessing Woodwork attributes before initialization (#838)
    • Remove check requiring Ordinal instance for initializing a ColumnSchema object (#870)
    • Increase koalas min version to 1.8.0 (#885)
  • Documentation Changes
    • Improve formatting of release notes (#874)
  • Testing Changes
    • Remove unnecessary argument in codecov upload job (#853)
    • Change from GitHub Token to regenerated GitHub PAT dependency checkers (#855)
    • Update README.md with non-nullable dtypes in code example (#856)

Breaking Changes

  • Woodwork tables can no longer be saved to disk using df.ww.to_csv, df.ww.to_pickle, or
    df.ww.to_parquet. Use df.ww.to_disk instead.
  • The read_csv function has been replaced by read_file.

Thanks to the following people for contributing to this release:
@frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

v0.2.0 (April 20, 2021, commit 9c2e3a9)

  • Enhancements
    • Add validation control to WoodworkTableAccessor (#736)
    • Store make_index value on WoodworkTableAccessor (#780)
    • Add optional exclude parameter to WoodworkTableAccessor select method (#783)
    • Add validation control to deserialize.read_woodwork_table and ww.read_csv (#788)
    • Add WoodworkColumnAccessor.schema and handle copying column schema (#799)
    • Allow initializing a WoodworkColumnAccessor with a ColumnSchema (#814)
    • Add __repr__ to ColumnSchema (#817)
    • Add BooleanNullable and IntegerNullable logical types (#830)
    • Add validation control to WoodworkColumnAccessor (#833)
  • Changes
    • Rename FullName logical type to PersonFullName (#740)
    • Rename ZIPCode logical type to PostalCode (#741)
    • Fix issue with smart-open version 5.0.0 (#750, #758)
    • Update minimum scikit-learn version to 0.22 (#763)
    • Drop support for Python version 3.6 (#768)
    • Remove ColumnNameMismatchWarning (#777)
    • get_column_dict does not use standard tags by default (#782)
    • Make logical_type and name params to _get_column_dict optional (#786)
    • Rename Schema object and files to match new table-column schema structure (#789)
    • Store column typing information in a ColumnSchema object instead of a dictionary (#791)
    • TableSchema does not use standard tags by default (#806)
    • Store use_standard_tags on the ColumnSchema instead of the TableSchema (#809)
    • Move functions in column_schema.py to be methods on ColumnSchema (#829)
  • Documentation Changes
    • Update Pygments version requirement (#751)
    • Update spark config for docs build (#787, #801, #810)
  • Testing Changes
    • Add unit tests against minimum dependencies for python 3.6 on PRs and main (#743, #753, #763)
    • Update spark config for test fixtures (#787)
    • Separate latest unit tests into pandas, dask, koalas (#813)
    • Update latest dependency checker to generate separate core, koalas, and dask dependencies (#815, #825)
    • Ignore latest dependency branch when checking for updates to the release notes (#827)
    • Change from GitHub PAT to auto generated GitHub Token for dependency checker (#831)
    • Expand ColumnSchema semantic tag testing coverage and null logical_type testing coverage (#832)

Thanks to the following people for contributing to this release:
@gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The ZIPCode logical type has been renamed to PostalCode
  • The FullName logical type has been renamed to PersonFullName
  • The Schema object has been renamed to TableSchema
  • With the ColumnSchema object, typing information for a column can no longer be accessed
    with df.ww.columns[col_name]['logical_type']. Instead, use df.ww.columns[col_name].logical_type.
  • The Boolean and Integer logical types will no longer work with data that contains null
    values. The new BooleanNullable and IntegerNullable logical types should be used if
    null values are present.

v0.1.0 (March 22, 2021, commit 2a28bfb)

  • Enhancements
    • Implement Schema and Accessor API (#497)
    • Add Schema class that holds typing info (#499)
    • Add WoodworkTableAccessor class that performs type inference and stores Schema (#514)
    • Allow initializing Accessor schema with a valid Schema object (#522)
    • Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (#534)
    • Add ability to call pandas methods from Accessor (#538, #589)
    • Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (#553)
    • Add ability to load demo retail dataset with a Woodwork Accessor (#556)
    • Add select to WoodworkTableAccessor (#548)
    • Add mutual_information to WoodworkTableAccessor (#571)
    • Add WoodworkColumnAccessor class (#562)
    • Add semantic tag update methods to column accessor (#573)
    • Add describe and describe_dict to WoodworkTableAccessor (#579)
    • Add init_series util function for initializing a series with dtype change (#581)
    • Add set_logical_type method to WoodworkColumnAccessor (#590)
    • Add semantic tag update methods to table schema (#591)
    • Add warning if additional parameters are passed along with schema (#593)
    • Better warning when accessing column properties before init (#596)
    • Update column accessor to work with LatLong columns (#598)
    • Add set_index to WoodworkTableAccessor (#603)
    • Implement loc and iloc for WoodworkColumnAccessor (#613)
    • Add set_time_index to WoodworkTableAccessor (#612)
    • Implement loc and iloc for WoodworkTableAccessor (#618)
    • Allow updating logical types with set_types and make relevant DataFrame changes (#619)
    • Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (#624)
    • Add DaskColumnAccessor (#625)
    • Allow deserialization from csv, pickle, and parquet to Woodwork table (#626)
    • Add value_counts to WoodworkTableAccessor (#632)
    • Add KoalasColumnAccessor (#634)
    • Add pop to WoodworkTableAccessor (#636)
    • Add drop to WoodworkTableAccessor (#640)
    • Add rename to WoodworkTableAccessor (#646)
    • Add DaskTableAccessor (#648)
    • Add Schema properties to WoodworkTableAccessor (#651)
    • Add KoalasTableAccessor (#652)
    • Add __getitem__ to WoodworkTableAccessor (#633)
    • Update Koalas min version and add support for more new pandas dtypes with Koalas (#678)
    • Add __setitem__ to WoodworkTableAccessor (#669)
  • Fixes
    • Create new Schema object when performing pandas operation on Accessors (#595)
    • Fix bug in _reset_semantic_tags causing columns to share same semantic tags set (#666)
    • Maintain column order in DataFrame and Woodwork repr (#677)
  • Changes
    • Move mutual information logic to statistics utils file (#584)
    • Bump min Koalas version to 1.4.0 (#638)
    • Preserve pandas underlying index when not creating a Woodwork index (#664)
    • Restrict Koalas version to <1.7.0 due to breaking changes (#674)
    • Clean up dtype usage across Woodwork (#682)
    • Improve error when calling accessor properties or methods before init (#683)
    • Remove dtype from Schema dictionary (#685)
    • Add include_index param and allow unique columns in Accessor mutual information (#699)
    • Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (#700)
    • Remove DataTable and DataColumn classes to migrate towards the accessor approach (#713)
    • Change sample_series dtype to not need conversion and remove convert_series util (#720)
    • Rename Accessor methods since DataTable has been removed (#723)
  • Documentation Changes
    • Update README.md and Get Started guide to use accessor (#655, #717)
    • Update Understanding Types and Tags guide to use accessor (#657)
    • Update docstrings and API Reference page (#660)
    • Update statistical insights guide to use accessor (#693)
    • Update Customizing Type Inference guide to use accessor (#696)
    • Update Dask and Koalas guide to use accessor (#701)
    • Update index notebook and install guide to use accessor (#715)
    • Add section to documentation about schema validity (#729)
    • Update README.md and Get Started guide to use pd.read_csv (#730)
    • Make small fixes to documentation formatting (#731)
  • Testing Changes
    • Add tests to Accessor/Schema that weren't previously covered (#712, #716)
    • Update release branch name in notes update check (#719)

Thanks to the following people for contributing to this release:
@gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd