Releases: unionai-oss/pandera
v0.26.1: Multi-index, `@check_types` Bugfixes
What's Changed
- fix MultiIndex check regression by @amerberg in #2116
- implement multiindex_strict and multiindex_unique add test cases by @amerberg in #2114
- Bugfix: #2058 Check_types for callable by @ybressler in #2069
New Contributors
- @ybressler made their first contribution in #2069
Full Changelog: v0.26.0...v0.26.1
v0.26.0: Add support for Python 3.13
⭐️ Highlight
📣 Pandera now supports Python 3.13! Now go forth and use bare forward reference types to your heart's content 🤗
What's Changed
- Enh/future annotations py3.13 by @cosmicBboy in #1980
- fix pyspark check registration by @cosmicBboy in #2087
- remove top-level pandera init import warning by @cosmicBboy in #2088
- Bugfix 2075: Polar dataframe default values - fill_nan AND fill_null for float columns by @cmsommerville in #2076
- Remove pylint by @cosmicBboy in #2086
- Upgrade `pyupgrade` hook and target Python version by @deepyaman in #2093
- Fix passing an empty column list to check duplicates by @rush4ratio in #2092
- Replace `Literal` imports from `typing_extensions` by @deepyaman in #2100
- Add `.git-blame-ignore-revs` to avoid bulk changes by @deepyaman in #2101
- limit polars version on Mac OS by @amerberg in #2105
- delete monthly downloads, not available by @cosmicBboy in #2112
- Implement parser machinery and the `strict` parser by @deepyaman in #2096
- Support checking joint uniqueness of table columns by @deepyaman in #2097
- Reimplement pandas MultiIndex backend without inheriting from DataFrame backend by @amerberg in #2103
- fix(doc): clarify check_fn signature by @Farley-Chen in #2107
- Fix missing tests core directory by @rush4ratio in #2102
- fix polars Categorical bug by @cosmicBboy in #2113
New Contributors
- @cmsommerville made their first contribution in #2076
- @rush4ratio made their first contribution in #2092
- @Farley-Chen made their first contribution in #2107
Full Changelog: v0.25.0...v0.26.0
v0.25.0: 🦩 Support Ibis table validation
⭐️ Highlight
Pandera now supports Ibis 🦩! You can now validate data on all available ibis backends using the `pandera.ibis` module.
In-memory table example:
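import ibis
import pandera.ibis as pa


# Schema for a table of city-level prices; price must fall within [5, 20].
class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


# Build an in-memory ibis table to validate.
t = ibis.memtable(
    {
        'state': ['FL', 'FL', 'FL', 'CA', 'CA', 'CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)

# Validate the table expression, then execute it to materialize the result.
Schema.validate(t).execute()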
Sqlite example:
# Reuses `ibis` and `Schema` from the in-memory example above.
con = ibis.sqlite.connect()

# Create an empty table with an explicit schema, then insert rows into it.
t = con.create_table(
    "table",
    schema=ibis.schema(dict(state="string", city="string", price="int64")),
)
con.insert(
    "table",
    obj=[
        ("FL", "Orlando", 8),
        ("FL", "Miami", 12),
        ("FL", "Tampa", 10),
        ("CA", "San Francisco", 16),
        ("CA", "Los Angeles", 20),
        ("CA", "San Diego", 18),
    ],
)

Schema.validate(t).execute()
What does this mean?
This release unlocks in-database validation on some of the most widely used data platforms, including PostgreSQL, Snowflake, BigQuery, MySQL, and more ✨. It means that you can validate data at scale, on the database or data framework of your choice, before fetching it for downstream analysis or modeling work.
Naturally, this also means that you can develop your schemas locally on a `duckdb` or `sqlite` backend and then use the same schemas in production on a remote database like `postgres`.
Learn more about the integration here.
What's Changed
- Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
- exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
- Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
- Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
- Temporarily pin `polars` due to test failure in CI by @deepyaman in #2011
- Replace `event_loop` removed in pytest-asyncio 1.0 by @deepyaman in #2014
- Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
- fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
- bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
- Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
- Ibis dev by @deepyaman in #2040
- handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
- bugfix/1927 by @Jarek-Rolski in #2019
- [🐻❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
- [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
- Add link to the documentation about Ibis datatypes by @deepyaman in #2057
- Test column presence, mark other features not impl by @deepyaman in #2060
- Run `pre-commit` on all files to fix linter issues by @deepyaman in #2063
- Implement `regex` option and add additional checks by @deepyaman in #2061
- Implement binary and boolean types (and test them) by @deepyaman in #2064
- Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
- bugfix: fix `format_vectorized_error_message` to properly format nested pyarrow failed cases by @AndrejIring in #2036
- handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
- bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
- Set validation scope for pandas run_checks methods by @amerberg in #2003
- DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
- [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
- [ibis 🦩] check backend: use positional join for duckdb and polars, fix ibis DataFrameModel.validate types by @cosmicBboy in #2071
New Contributors
- @halicki made their first contribution in #1979
- @AhmetZamanis made their first contribution in #2015
- @AndrejIring made their first contribution in #2036
- @gfilaci made their first contribution in #2032
- @amerberg made their first contribution in #2003
Full Changelog: v0.24.0...v0.25.0
v0.25.0rc0: Support ibis table validation
What's Changed
- Add Polars pydantic integration with format support and native JSON schema generation by @halicki in #1979
- exclude python 3.12 and pyspark combo in ci by @cosmicBboy in #2005
- Delete previously-added foo.txt and new_example.py by @deepyaman in #2013
- Pin PySpark due to test failures/incompatibilities by @deepyaman in #2010
- Temporarily pin `polars` due to test failure in CI by @deepyaman in #2011
- Replace `event_loop` removed in pytest-asyncio 1.0 by @deepyaman in #2014
- Fix typehint in unique_values_eq (issue #1492) by @AhmetZamanis in #2015
- fix pyarrow string issue, fix docs failing issues by @cosmicBboy in #2026
- bugfix: PANDERA_VALIDATION_ENABLED=False should disable validation by @cosmicBboy in #2028
- Expect Python slice index errors after Python 3.10 by @deepyaman in #2033
- Ibis dev by @deepyaman in #2040
- handle dataframe-level failure cases: convert row to dict by @cosmicBboy in #2050
- bugfix/1927 by @Jarek-Rolski in #2019
- [🐻❄️ polars] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2055
- [🦩 ibis] Limit reported failure cases if Check.n_failure_cases is defined. by @cosmicBboy in #2056
- Add link to the documentation about Ibis datatypes by @deepyaman in #2057
- Test column presence, mark other features not impl by @deepyaman in #2060
- Run `pre-commit` on all files to fix linter issues by @deepyaman in #2063
- Implement `regex` option and add additional checks by @deepyaman in #2061
- Implement binary and boolean types (and test them) by @deepyaman in #2064
- Add unit test suite for Ibis components, fix a bug by @deepyaman in #2065
- bugfix: fix `format_vectorized_error_message` to properly format nested pyarrow failed cases by @AndrejIring in #2036
- handle empty dataframes with PydanticModel: show warning by @cosmicBboy in #2066
- bugfix/2031: Allow strict='filter' and coerce='True' at the same time for PySpark schemas by @gfilaci in #2032
- Set validation scope for pandas run_checks methods by @amerberg in #2003
- DataFrameSchema.update_index correctly sets title, description, and metadata by @cosmicBboy in #2067
- [ibis 🦩] remove inplace=True in column validate call by @cosmicBboy in #2068
New Contributors
- @halicki made their first contribution in #1979
- @AhmetZamanis made their first contribution in #2015
- @AndrejIring made their first contribution in #2036
- @gfilaci made their first contribution in #2032
- @amerberg made their first contribution in #2003
Full Changelog: v0.24.0...v0.25.0rc0
v0.24.0
✨ Highlights ✨
Import `pandera.pandas` to define schemas for `pandas` objects
🚨 Breaking Change
`pandera==0.24.0` has dropped the dependency on `pandas` and `numpy` and has introduced a `pandas` extra. This breaks any installation that relied on `pandera` to pull in `pandas` as a transitive dependency. To remediate this, do the following:
Install `pandas` explicitly or use the `pandas` extra
pip install 'pandera[pandas]' # recommended
# or
pip install pandas pandera
Change your import to `pandera.pandas`
All pandas-specific symbols that were exposed by the top-level `pandera` module are now defined in the `pandera.pandas` module.
# old import
import pandera as pa
# new import
import pandera.pandas as pa
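For code that has to run against pandera releases on both sides of this change, one option is to detect which module is available before importing it. The following is a minimal sketch using only the standard library; the helper name `find_pandas_schema_module` is hypothetical and not part of pandera.

```python
import importlib.util


def find_pandas_schema_module():
    """Return the module name to import pandas schemas from, or None.

    Prefers ``pandera.pandas`` (available from 0.24.0) and falls back to the
    top-level ``pandera`` module for older releases.
    """
    for name in ("pandera.pandas", "pandera"):
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # find_spec imports the parent package for dotted names, so this
            # is raised when pandera (or something it imports) is missing.
            continue
        if spec is not None:
            return name
    return None


module_name = find_pandas_schema_module()
if module_name is None:
    print("pandera not found; try: pip install 'pandera[pandas]'")
else:
    print(f"define pandas schemas with: import {module_name} as pa")
```

The lookup only locates the module; it does not check that `pandas` itself is installed, so the actual import can still fail if the `pandas` extra is missing.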
Using `import pandera as pa` to define pandas schemas still works but raises a warning. The top-level import will raise an `ImportError` in 5 minor releases (`0.29.0`).
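Because the top-level import is scheduled to stop working, it can help to make the warning fail loudly during a test run. Below is a minimal sketch using only the standard library. `legacy_import` is a hypothetical stand-in that emits a `FutureWarning` (the category named in this release's changelog entries); it is not pandera's actual code.

```python
import warnings


def legacy_import():
    # Hypothetical stand-in for `import pandera as pa`; it only mimics the
    # deprecation warning and is not pandera's real implementation.
    warnings.warn(
        "use `import pandera.pandas as pa` instead",
        FutureWarning,
        stacklevel=2,
    )


def emits_future_warning(fn):
    """Return True if calling fn() triggers a FutureWarning."""
    with warnings.catch_warnings():
        # Escalate FutureWarning to an exception inside this block only.
        warnings.simplefilter("error", FutureWarning)
        try:
            fn()
        except FutureWarning:
            return True
    return False


print(emits_future_warning(legacy_import))  # True
print(emits_future_warning(lambda: None))   # False
```

With a real import, a module-level warning would normally fire only the first time the module is loaded in a process, so a check like this needs to run before anything else has imported the module.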
What's Changed
- Bugfix/1908: handle multidimensional polars Array type by @hsuominen in #1909
- Bugfix/fix pandera compatibility with fastapi by @Jarek-Rolski in #1943
- update pylint version by @cosmicBboy in #1945
- Import for _version to get generated version by @thomasjpfan in #1951
- BUG: fix validate(sample=x) for pl.DataFrame by @m-richards in #1923
- Delete dev requirements that aren't generated/used by @deepyaman in #1953
- Drop dev dependencies to support Python before 3.8 by @deepyaman in #1956
- add UV to dev dependencies by @lundybernard in #1949
- feature(992): Create empty polars DataFrame by @khrapovs in #1950
- Pandas dependency deprecation future warning, add `pandera[pandas]` extra by @cosmicBboy in #1926
- Update imports from pandera to pandera.pandas by @cosmicBboy in #1965
- Bugfix/1938 Improve Pandera DataFrame - Pydantic compatibility by @Jarek-Rolski in #1963
- Blacken a few files that aren't properly formatted by @deepyaman in #1970
- add import and future warning for top-level pandera module by @cosmicBboy in #1969
- Fix inconsistent column filtering in Polars backend with `add_missing_columns` by @ksolarski in #1962
- bugfix: parser is applied before type coercion by @cosmicBboy in #1974
- bugfix: correctly support dataframe-level polars checks by @cosmicBboy in #1972
- enh: enable mypy in more polars places (#1976) by @cosmicBboy in #1977
- bugfix: custom parser runs before getting column_info by @cosmicBboy in #1978
- Add spellchecker and Fix typos by @nathanjmcdougall in #1975
- bugfix: make DataFrameModel MODEL_CACHE thread-aware by @cosmicBboy in #1981
- replace discord with a slack by @cosmicBboy in #1982
- update docs and readme for pandera.pandas FutureWarning by @cosmicBboy in #1983
- bugfix: fix none -> * handling in PolarsData refactor by @m-richards in #1984
- bugfix: polars supports regex for non-required columns by @cosmicBboy in #1992
- bugfix: polars categorical try coerce by @m-richards in #1985
- Fix CI: pin uv<0.7.0 by @cosmicBboy in #1995
- feat(dataframe): add `update_index`, `update_indexes` and `rename_indexes` by @kdheepak in #1989
- Bugfix/1937 Serialization of checks does only support one check per check function by @Jarek-Rolski in #1987
- Minor documentation fix: `polars_version` docstring referenced `modin` not `polars` by @TobiasDummschat in #1998
- enh: run mypy over all of the polars code by @m-richards in #1986
- drop_invalid_rows still raises schema-level errors by @cosmicBboy in #2000
New Contributors
- @hsuominen made their first contribution in #1909
- @khrapovs made their first contribution in #1950
- @kdheepak made their first contribution in #1989
- @TobiasDummschat made their first contribution in #1998
Full Changelog: v0.23.1...v0.24.0
v0.24.0rc0: Drop pandas and numpy dependency, introduce pandas extra
✨ Highlights ✨
Import `pandera.pandas` to define schemas for `pandas` objects
🚨 Breaking Change
`pandera==0.24.0` has dropped the dependency on `pandas` and `numpy` and has introduced a `pandas` extra. This breaks any installation that relied on `pandera` to pull in `pandas` as a transitive dependency. To remediate this, do the following:
Install `pandas` explicitly or use the `pandas` extra
pip install pandas pandera
# or
pip install 'pandera[pandas]'
Change your import to `pandera.pandas`
All pandas-specific symbols that were exposed by the top-level `pandera` module are now defined in the `pandera.pandas` module.
# old import
import pandera as pa
# new import
import pandera.pandas as pa
What's Changed
- Bugfix/1908: handle multidimensional polars Array type by @hsuominen in #1909
- Bugfix/fix pandera compatibility with fastapi by @Jarek-Rolski in #1943
- update pylint version by @cosmicBboy in #1945
- Import for _version to get generated version by @thomasjpfan in #1951
- BUG: fix validate(sample=x) for pl.DataFrame by @m-richards in #1923
- Delete dev requirements that aren't generated/used by @deepyaman in #1953
- Drop dev dependencies to support Python before 3.8 by @deepyaman in #1956
- add UV to dev dependencies by @lundybernard in #1949
- feature(992): Create empty polars DataFrame by @khrapovs in #1950
- Pandas dependency deprecation future warning, add `pandera[pandas]` extra by @cosmicBboy in #1926
- Update imports from pandera to pandera.pandas by @cosmicBboy in #1965
- Bugfix/1938 Improve Pandera DataFrame - Pydantic compatibility by @Jarek-Rolski in #1963
- Blacken a few files that aren't properly formatted by @deepyaman in #1970
- add import and future warning for top-level pandera module by @cosmicBboy in #1969
- Fix inconsistent column filtering in Polars backend with `add_missing_columns` by @ksolarski in #1962
- bugfix: parser is applied before type coercion by @cosmicBboy in #1974
- bugfix: correctly support dataframe-level polars checks by @cosmicBboy in #1972
- enh: enable mypy in more polars places (#1976) by @cosmicBboy in #1977
- bugfix: custom parser runs before getting column_info by @cosmicBboy in #1978
- Add spellchecker and Fix typos by @nathanjmcdougall in #1975
- bugfix: make DataFrameModel MODEL_CACHE thread-aware by @cosmicBboy in #1981
- replace discord with a slack by @cosmicBboy in #1982
- update docs and readme for pandera.pandas FutureWarning by @cosmicBboy in #1983
New Contributors
- @hsuominen made their first contribution in #1909
- @khrapovs made their first contribution in #1950
Full Changelog: v0.23.1...v0.24.0
v0.23.1
What's Changed
- handle None dtype when constructing json schema from DataFrameModel by @cosmicBboy in #1931
- Fix: 1933 - @pa.dataframe_check pass the check_args to the class method by @danield-catalyst in #1934
- 🐛 add field types for ExtensionDtype by @mauro-dribia in #1929
- fix the reversion of schema component mutations by @cosmicBboy in #1936
New Contributors
Special shoutout to the new contributors!
- @danield-catalyst made their first contribution in #1934
- @mauro-dribia made their first contribution in #1929
Full Changelog: v0.23.0...v0.23.1
v0.23.0: Improve pydantic compatibility, add `json_normalize`, bugfixes
What's Changed
- Create empty dataframe from Pandas DataFrame Model by @mamo3gr in #1880
- bugfix/1835: Keep nulls in polars when dropping invalid rows and nullable=True by @baldwinj30 in #1890
- Enhancment/1886 Add json_normalize to pandas read formats by @Jarek-Rolski in #1892
- only call parsers once by @cosmicBboy in #1898
- Fix type hints of pa.Field so Iterable and dict arguments actually contain type information by @dolfandringa in #1901
- Bugfix/1677 Fix Pandera DataFrame - Pydantic compatibility by @Jarek-Rolski in #1904
- Enhancement: Add support for timezone-flexible DateTime (#1352) by @max-raphael in #1902
- Bugfix/763 improve type annotations for DataFrameModel.validate by @m-richards in #1905
- Declare support for Python3.12 by @g-as in #1897
- Use uv in noxfile and ci-tests, migrate to pyproject.toml by @cosmicBboy in #1916
- Update modin.md. pd import called twice by @theorashid in #1918
- Update publish ci by @cosmicBboy in #1921
- remove commented code in readthedocs yaml by @cosmicBboy in #1920
- update publish ci to use pypi publisher by @cosmicBboy in #1922
New Contributors
- @mamo3gr made their first contribution in #1880
- @dolfandringa made their first contribution in #1901
- @max-raphael made their first contribution in #1902
- @g-as made their first contribution in #1897
- @theorashid made their first contribution in #1918
Full Changelog: v0.22.1...v0.23.0
v0.23.0b2: Testing new pypi publishing system
What's Changed
- Create empty dataframe from Pandas DataFrame Model by @mamo3gr in #1880
- bugfix/1835: Keep nulls in polars when dropping invalid rows and nullable=True by @baldwinj30 in #1890
- Enhancment/1886 Add json_normalize to pandas read formats by @Jarek-Rolski in #1892
- only call parsers once by @cosmicBboy in #1898
- Fix type hints of pa.Field so Iterable and dict arguments actually contain type information by @dolfandringa in #1901
- Bugfix/1677 Fix Pandera DataFrame - Pydantic compatibility by @Jarek-Rolski in #1904
- Enhancement: Add support for timezone-flexible DateTime (#1352) by @max-raphael in #1902
- Bugfix/763 improve type annotations for DataFrameModel.validate by @m-richards in #1905
- Declare support for Python3.12 by @g-as in #1897
- Use uv in noxfile and ci-tests, migrate to pyproject.toml by @cosmicBboy in #1916
- Update modin.md. pd import called twice by @theorashid in #1918
- Update publish ci by @cosmicBboy in #1921
- remove commented code in readthedocs yaml by @cosmicBboy in #1920
- update publish ci to use pypi publisher by @cosmicBboy in #1922
New Contributors
- @mamo3gr made their first contribution in #1880
- @dolfandringa made their first contribution in #1901
- @max-raphael made their first contribution in #1902
- @g-as made their first contribution in #1897
- @theorashid made their first contribution in #1918
Full Changelog: v0.22.1...v0.23.0b2
Release v0.22.1: Fix `check_input` decorator regression
What's Changed
- bugfix: check_input decorator handles functions with kwargs by @cosmicBboy in #1888
Full Changelog: v0.22.0...v0.22.1