Releases: snowflakedb/snowpark-python
Release
1.25.0 (2024-11-13)
Snowpark Python API Updates
New Features
- Added the following new functions in snowflake.snowpark.dataframe:
  - map
- Added support for passing the parameter include_error to Session.query_history to record queries that fail during execution.
Improvements
- When the target stage is not set in the profiler, a default stage from Session.get_session_stage is used instead of raising SnowparkSQLException.
- Allowed lower case or mixed case input when calling Session.stored_procedure_profiler.set_active_profiler.
- Added distributed tracing using open telemetry APIs for action function in DataFrame:
  - cache_result
- Removed opentelemetry warning from logging.
Bug Fixes
- Fixed the pre-action and post-action query propagation when In expressions were used in selects.
- Fixed a bug that raised AttributeError when calling Session.stored_procedure_profiler.get_output while Session.stored_procedure_profiler is disabled.
Dependency Updates
- Added a dependency on protobuf>=5.28 and tzlocal at runtime.
- Added a dependency on protoc-wheel-0 for the development profile.
- Require snowflake-connector-python>=3.12.0, <4.0.0 (was >=3.10.0).
Snowpark pandas API Updates
Dependency Updates
- Updated modin from 0.28.1 to 0.30.1.
- Added support for all pandas 2.2.x versions.
New Features
- Added support for Index.to_numpy.
- Added support for DataFrame.align and Series.align for axis=0.
- Added support for size in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.
- Added support for snowflake.snowpark.functions.window.
- Added support for pd.read_pickle (uses native pandas for processing).
- Added support for pd.read_html (uses native pandas for processing).
- Added support for pd.read_xml (uses native pandas for processing).
- Added support for aggregation functions "size" and len in GroupBy.aggregate, DataFrame.aggregate, and Series.aggregate.
- Added support for list values in Series.str.len.
Bug Fixes
- Fixed a bug where aggregating a single-column dataframe with a single callable function (e.g. pd.DataFrame([0]).agg(np.mean)) would fail to transpose the result.
- Fixed bugs where DataFrame.dropna() would:
  - Treat an empty subset (e.g. []) as if it specified all columns instead of no columns.
  - Raise a TypeError for a scalar subset instead of filtering on just that column.
  - Raise a ValueError for a subset of type pandas.Index instead of filtering on the columns in the index.
- Disabled creation of scoped read-only tables to mitigate TableNotFoundError when using dynamic pivot in a notebook environment.
- Fixed a bug when concatenating DataFrame or Series objects that come from the same DataFrame with axis = 1.
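The corrected dropna subset rules above can be modelled in plain Python. This is only an illustrative sketch over a list of dict rows, not the Snowpark pandas implementation; the helper names normalize_subset and dropna are hypothetical, and None stands in for a missing value.

```python
def normalize_subset(subset, all_columns):
    """Return the columns a dropna-style filter should inspect."""
    if subset is None:
        return list(all_columns)  # no subset given: check every column
    if isinstance(subset, str):
        return [subset]           # scalar label: check just that column
    return list(subset)           # list or Index-like; an empty one means "no columns"


def dropna(rows, subset=None):
    """Keep the rows that have no missing value (None) in the inspected columns."""
    columns = list(rows[0].keys()) if rows else []
    checked = normalize_subset(subset, columns)
    return [row for row in rows if all(row[c] is not None for c in checked)]


rows = [{"a": 1, "b": None}, {"a": None, "b": 2}, {"a": 3, "b": 4}]
print(len(dropna(rows)))              # 1: only the last row is complete
print(len(dropna(rows, subset=[])))   # 3: an empty subset checks nothing
print(len(dropna(rows, subset="a")))  # 2: only column "a" is checked
```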
Improvements
- Improve np.where with scalar x value by eliminating unnecessary join and temp table creation.
- Improve get_dummies performance by flattening the pivot with join.
Snowpark Local Testing Updates
New Features
- Added support for patching functions that are unavailable in the snowflake.snowpark.functions module.
- Added support for snowflake.snowpark.functions.any_value.
Bug Fixes
- Fixed a bug where Table.update could not handle VariantType, MapType, and ArrayType data types.
- Fixed a bug where column aliases were incorrectly resolved in DataFrame.join, causing errors when selecting columns from a joined DataFrame.
- Fixed a bug where Table.update and Table.merge could fail if the target table's index was not the default RangeIndex.
Release
1.24.0 (2024-10-28)
Snowpark Python API Updates
New Features
- Updated the Session class to be thread-safe. This allows concurrent DataFrame transformations, DataFrame actions, UDF and stored procedure registration, and concurrent file uploads when using the same Session object.
  - The feature is disabled by default and can be enabled by setting FEATURE_THREAD_SAFE_PYTHON_SESSION to True for the account.
  - Updating session configurations, like changing database or schema, when multiple threads are using the session may lead to unexpected behavior.
  - When enabled, some internally created temporary table names returned from the DataFrame.queries API are not deterministic, and may be different when DataFrame actions are executed. This does not affect explicit user-created temporary tables.
- Added support for the 'Service' domain to the session.lineage.trace API.
- Added support for the copy_grants parameter when registering UDxF and stored procedures.
- Added support for the following methods in DataFrameWriter to support daisy-chaining:
  - option
  - options
  - partition_by
- Added support for snowflake_cortex_summarize.
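As a rough sketch of what the thread-safe Session item above enables, the helper below shares one session object across worker threads. It assumes the account-level FEATURE_THREAD_SAFE_PYTHON_SESSION setting is already enabled; the function name count_rows_concurrently and the max_workers default are illustrative choices, not part of Snowpark.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rows_concurrently(session, table_names, max_workers=4):
    """Run one row count per table on worker threads that share `session`."""

    def count_rows(name):
        # session.table(...) builds a DataFrame; count() is the action that runs a query.
        return name, session.table(name).count()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so the dict follows table_names.
        return dict(pool.map(count_rows, table_names))
```

Per the notes above, changing the database or schema on the shared session while these threads are running may lead to unexpected behavior, so such configuration is best done before the threads start.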
Improvements
- Improved the function snowflake.snowpark.functions.array_remove: it is now possible to use it in Python.
- Disabled SQL simplification when a sort is performed after a limit.
  - Previously, df.sort().limit() and df.limit().sort() generated the same query, with the sort in front of the limit. Now, df.limit().sort() generates a query that reads df.limit().sort().
  - Improved performance of the generated query for df.limit().sort(), because limit stops table scanning as soon as the number of records is satisfied.
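The two orderings are not interchangeable, which is why the simplification is now skipped. A plain-Python analogy with lists (not Snowpark code; taking the first n items stands in for whichever rows a LIMIT returns):

```python
rows = [5, 3, 9, 1, 7]
n = 3

# sort-then-limit: the n smallest values of the whole input.
sort_then_limit = sorted(rows)[:n]

# limit-then-sort: take n rows first, then order only those rows.
limit_then_sort = sorted(rows[:n])

print(sort_then_limit)  # [1, 3, 5]
print(limit_then_sort)  # [3, 5, 9]
```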
Bug Fixes
- Fixed a bug where the automatic cleanup of temporary tables could interfere with the results of async query execution.
- Fixed a bug in the DataFrame.analytics.time_series_agg function to handle multiple data points in the same sliding interval.
- Fixed a bug that created inconsistent casing in field names of structured objects in iceberg schemas.
Deprecations
- Deprecation warnings will be triggered when using snowpark-python with Python 3.8. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
Snowpark pandas API Updates
New Features
- Added support for np.subtract, np.multiply, np.divide, and np.true_divide.
- Added support for tracking usages of __array_ufunc__.
- Added numpy compatibility support for np.float_power, np.mod, np.remainder, np.greater, np.greater_equal, np.less, np.less_equal, np.not_equal, and np.equal.
- Added numpy compatibility support for np.log, np.log2, and np.log10.
- Added support for DataFrameGroupBy.bfill, SeriesGroupBy.bfill, DataFrameGroupBy.ffill, and SeriesGroupBy.ffill.
- Added support for the on parameter with Resampler.
- Added support for timedelta inputs in value_counts().
- Added support for applying Snowpark Python function snowflake_cortex_summarize.
- Added support for DataFrame.attrs and Series.attrs.
- Added support for DataFrame.style.
Improvements
- Improved generated SQL query for head and iloc when the row key is a slice.
- Improved error message when passing an unknown timezone to tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex.
- Improved documentation for tz_convert and tz_localize in Series, DataFrame, Series.dt, and DatetimeIndex to specify the supported timezone formats.
- Added additional kwargs support for df.apply and series.apply (as well as map and applymap) when using Snowpark functions. This allows for some position-independent compatibility between apply and functions where the first argument is not a pandas object.
- Improved generated SQL query for iloc and iat when the row key is a scalar.
- Removed all joins in iterrows.
- Improved documentation for Series.map to reflect the unsupported features.
- Added support for np.may_share_memory, which is used internally by many scikit-learn functions. This method will always return false when called with a Snowpark pandas object.
Bug Fixes
- Fixed a bug where DataFrame and Series pct_change() would raise TypeError when input contained timedelta columns.
- Fixed a bug where replace() would sometimes propagate Timedelta types incorrectly through replace(). Instead raise NotImplementedError for replace() on Timedelta.
- Fixed a bug where DataFrame and Series round() would raise AssertionError for Timedelta columns. Instead raise NotImplementedError for round() on Timedelta.
- Fixed a bug where reindex fails when the new index is a Series with non-overlapping types from the original index.
- Fixed a bug where calling __getitem__ on a DataFrameGroupBy object always returned a DataFrameGroupBy object if as_index=False.
- Fixed a bug where inserting timedelta values into an existing column would silently convert the values to integers instead of raising NotImplementedError.
- Fixed a bug where DataFrame.shift() on axis=0 and axis=1 would fail to propagate timedelta types.
- DataFrame.abs(), DataFrame.__neg__(), DataFrame.stack(), and DataFrame.unstack() now raise NotImplementedError for timedelta inputs instead of failing to propagate timedelta types.
Snowpark Local Testing Updates
Bug Fixes
- Fixed a bug where DataFrame.alias raises KeyError for input column name.
- Fixed a bug where to_csv on Snowflake stage fails when data contains empty strings.
Release
1.23.0 (2024-10-09)
Snowpark Python API Updates
New Features
- Added the following new functions in snowflake.snowpark.functions:
  - make_interval
- Added support for using Snowflake Interval constants with Window.range_between() when the order by column is TIMESTAMP or DATE type.
- Added support for file writes. This feature is currently in private preview.
- Added thread_id to QueryRecord to track the thread id submitting the query history.
- Added support for Session.stored_procedure_profiler.
Improvements
Bug Fixes
- Fixed a bug where registering a stored procedure or UDxF with type hints would give a warning 'NoneType' has no len() when trying to read default values from function.
Snowpark pandas API Updates
New Features
- Added support for TimedeltaIndex.mean method.
- Added support for some cases of aggregating Timedelta columns on axis=0 with agg or aggregate.
- Added support for by, left_by, right_by, left_index, and right_index for pd.merge_asof.
- Added support for passing parameter include_describe to Session.query_history.
- Added support for DatetimeIndex.mean and DatetimeIndex.std methods.
- Added support for Resampler.asfreq, Resampler.indices, Resampler.nunique, and Resampler.quantile.
- Added support for resample frequency W, ME, YE with closed = "left".
- Added support for DataFrame.rolling.corr and Series.rolling.corr for pairwise = False and int window.
- Added support for string time-based window and min_periods = None for Rolling.
- Added support for DataFrameGroupBy.fillna and SeriesGroupBy.fillna.
- Added support for constructing Series and DataFrame objects with the lazy Index object as data, index, and columns arguments.
- Added support for constructing Series and DataFrame objects with index and column values not present in DataFrame/Series data.
- Added support for pd.read_sas (uses native pandas for processing).
- Added support for applying rolling().count() and expanding().count() to Timedelta series and columns.
- Added support for tz in both pd.date_range and pd.bdate_range.
- Added support for Series.items.
- Added support for errors="ignore" in pd.to_datetime.
- Added support for DataFrame.tz_localize and Series.tz_localize.
- Added support for DataFrame.tz_convert and Series.tz_convert.
- Added support for applying Snowpark Python functions (e.g., sin) in Series.map, Series.apply, DataFrame.apply and DataFrame.applymap.
Improvements
- Improved to_pandas to persist the original timezone offset for TIMESTAMP_TZ type.
- Improved dtype results for TIMESTAMP_TZ type to show correct timezone offset.
- Improved dtype results for TIMESTAMP_LTZ type to show correct timezone.
- Improved error message when passing non-bool value to numeric_only for groupby aggregations.
- Removed unnecessary warning about sort algorithm in sort_values.
- Used SCOPED objects for internally created temp tables. A SCOPED object is scoped to the stored procedure if it is created within one and is session scoped otherwise; it is cleaned up automatically at the end of its scope.
- Improved warning messages for operations that lead to materialization with inadvertent slowness.
- Removed unnecessary warning message about convert_dtype in Series.apply.
Bug Fixes
- Fixed a bug where an Index object created from a Series/DataFrame incorrectly updates the Series/DataFrame's index name after an inplace update has been applied to the original Series/DataFrame.
- Suppressed an unhelpful SettingWithCopyWarning that sometimes appeared when printing Timedelta columns.
- Fixed inplace argument for Series objects derived from other Series objects.
- Fixed a bug where Series.sort_values failed if series name overlapped with index column name.
- Fixed a bug where transposing a dataframe would map Timedelta index levels to integer column levels.
- Fixed a bug where Resampler methods on timedelta columns would produce integer results.
- Fixed a bug where pd.to_numeric() would leave Timedelta inputs as Timedelta instead of converting them to integers.
- Fixed loc set when setting a single row, or multiple rows, of a DataFrame with a Series value.
Release
1.22.1 (2024-09-11)
This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.
1.22.0 (2024-09-10)
Snowpark Python API Updates
New Features
- Added the following new functions in snowflake.snowpark.functions:
  - array_remove
  - ln
Improvements
- Improved documentation for Session.write_pandas by making the use_logical_type option more explicit.
- Added support for specifying the following to DataFrameWriter.save_as_table:
  - enable_schema_evolution
  - data_retention_time
  - max_data_extension_time
  - change_tracking
  - copy_grants
  - iceberg_config: a dictionary that can hold the following iceberg configuration options:
    - external_volume
    - catalog
    - base_location
    - catalog_sync
    - storage_serialization_policy
- Added support for specifying the following to DataFrameWriter.copy_into_table:
  - iceberg_config: a dictionary that can hold the following iceberg configuration options:
    - external_volume
    - catalog
    - base_location
    - catalog_sync
    - storage_serialization_policy
- Added support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table:
  - mode
  - refresh_mode
  - initialize
  - clustering_keys
  - is_transient
  - data_retention_time
  - max_data_extension_time
Bug Fixes
- Fixed a bug in session.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.
- Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
- Fixed a bug in session.get_session_stage that referenced a non-existing stage after switching database or schema.
- Fixed a bug where calling DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.
- Fixed a bug where using the explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.
Snowpark Local Testing Updates
New Features
- Added support for type coercion when passing columns as input to UDF calls.
- Added support for Index.identical.
Bug Fixes
- Fixed a bug where the truncate mode in DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.
- Fixed a bug where function to_timestamp does not set the default timezone of the column datatype.
Snowpark pandas API Updates
New Features
- Added limited support for the Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.
  - supporting tracking the Timedelta type through copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.
  - converting non-timedelta to timedelta via astype.
  - NotImplementedError will be raised for the rest of methods that do not support Timedelta.
  - support for subtracting two timestamps to get a Timedelta.
  - support indexing with Timedelta data columns.
  - support for adding or subtracting timestamps and Timedelta.
  - support for binary arithmetic between two Timedelta values.
  - support for binary arithmetic and comparisons between Timedelta values and numeric values.
  - support for lazy TimedeltaIndex.
  - support for pd.to_timedelta.
  - support for GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.
  - support for GroupBy filtrations first and last.
  - support for TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.
  - support for diff with timestamp columns on axis=0 and axis=1.
  - support for TimedeltaIndex methods: ceil, floor and round.
  - support for TimedeltaIndex.total_seconds method.
- Added support for index's arithmetic and comparison operators.
- Added support for Series.dt.round.
- Added documentation pages for DatetimeIndex.
- Added support for Index.name, Index.names, Index.rename, and Index.set_names.
- Added support for Index.__repr__.
- Added support for DatetimeIndex.month_name and DatetimeIndex.day_name.
- Added support for Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.
- Added support for Index.min and Index.max.
- Added support for pd.merge_asof.
- Added support for Series.dt.normalize and DatetimeIndex.normalize.
- Added support for Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.
- Added support for DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.
- Added support for Series.dt.days_in_month and Series.dt.daysinmonth.
- Added support for DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.
- Added support for Series.is_monotonic_increasing and Series.is_monotonic_decreasing.
- Added support for Index.is_monotonic_increasing and Index.is_monotonic_decreasing.
- Added support for pd.crosstab.
- Added support for pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.
- Added support for lazy Index objects as labels in DataFrame.reindex and Series.reindex.
- Added support for Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.
- Added support for creating a DatetimeIndex from an Index of numeric or string type.
- Added support for string indexing with Timedelta objects.
- Added support for Series.dt.total_seconds method.
Improvements
- Improve concat, join performance when operations are performed on series coming from the same dataframe by avoiding unnecessary joins.
- Refactored quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.
- Improved pd.to_datetime to handle all local input cases.
- Create a lazy index from another lazy index without pulling data to client.
- Raised NotImplementedError for Index bitwise operators.
- Display a clearer error message when Index.names is set to a non-list-like object.
- Raise a warning whenever MultiIndex values are pulled in locally.
- Improve warning message for pd.read_snowflake to include the creation reason when temp table creation is triggered.
- Improve performance for DataFrame.set_index, or setting DataFrame.index or Series.index, by avoiding checks that require eager evaluation. As a consequence, when the new index does not match the current Series/DataFrame object length, a ValueError is no longer raised. Instead, when the Series/DataFrame object is longer than the provided index, the Series/DataFrame's new index is filled with NaN values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
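The length-mismatch behavior described for set_index can be modelled in a few lines of plain Python. This is a sketch of the stated rule only, not the Snowpark pandas code path; align_index is a hypothetical helper name.

```python
import math


def align_index(new_index, n_rows):
    """Model the described rule: pad a short index with NaN, drop surplus labels."""
    labels = list(new_index)[:n_rows]               # extra provided labels are ignored
    labels += [math.nan] * (n_rows - len(labels))   # rows without a label get NaN
    return labels


print(align_index(["a", "b", "c"], 2))  # ['a', 'b']
print(align_index(["a", "b"], 3))       # ['a', 'b', nan]
```

Because no ValueError is raised any more, code that relied on that error to detect a wrong-length index needs its own length check.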
Bug Fixes
- Stopped ignoring nanoseconds in pd.Timedelta scalars.
- Fixed AssertionError in tree of binary operations.
- Fixed bug in Series.dt.isocalendar using a named Series.
- Fixed inplace argument for Series objects derived from DataFrame columns.
- Fixed a bug where Series.reindex and DataFrame.reindex did not update the result index's name correctly.
- Fixed a bug where Series.take did not error when axis=1 was specified.
Release
1.21.1 (2024-09-05)
Snowpark Python API Updates
Bug Fixes
- Fixed a bug where using to_pandas_batches with async jobs caused an error due to improper handling of waiting for asynchronous query completion.
Release
1.21.0 (2024-08-19)
Snowpark Python API Updates
New Features
- Added support for snowflake.snowpark.testing.assert_dataframe_equal, a utility function to check the equality of two Snowpark DataFrames.
Improvements
- Added support for server-side string size limitations.
- Added support to create and invoke stored procedures, UDFs and UDTFs with optional arguments.
- Added support for column lineage in the DataFrame.lineage.trace API.
- Added support for passing INFER_SCHEMA options to DataFrameReader via INFER_SCHEMA_OPTIONS.
- Added support for passing the parameters parameter to Column.rlike and Column.regexp.
- Added support for automatically cleaning up temporary tables created by df.cache_result() in the current session when the DataFrame is no longer referenced (i.e., gets garbage collected). It is still an experimental feature not enabled by default, and can be enabled by setting session.auto_clean_up_temp_table_enabled to True.
- Added support for string literals to the fmt parameter of snowflake.snowpark.functions.to_date.
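The idea of cleaning up a temporary table once its DataFrame is garbage collected can be illustrated with the standard library's weakref.finalize. This is a conceptual sketch only and makes no claim about how Snowpark implements the feature; CachedResult and dropped_tables are hypothetical stand-ins.

```python
import gc
import weakref

dropped_tables = []  # stands in for issuing a DROP TABLE statement


class CachedResult:
    """Hypothetical stand-in for a DataFrame backed by a temporary table."""

    def __init__(self, table_name):
        self.table_name = table_name
        # Run the cleanup once this object is garbage collected. The callback
        # must not reference `self`, or the object would never be collected.
        weakref.finalize(self, dropped_tables.append, table_name)


cached = CachedResult("TEMP_TABLE_1")
del cached    # drop the last reference
gc.collect()  # CPython finalizes on `del`; collect() covers other interpreters
print(dropped_tables)  # ['TEMP_TABLE_1']
```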
Bug Fixes
- Fixed a bug where SQL generated for selecting * column has an incorrect subquery.
- Fixed a bug in DataFrame.to_pandas_batches where the iterator could throw an error if certain transformation is made to the pandas dataframe due to wrong isolation level.
- Fixed a bug in DataFrame.lineage.trace to split the quoted feature view's name and version correctly.
- Fixed a bug in Column.isin that caused invalid sql generation when passed an empty list.
- Fixed a bug that fails to raise NotImplementedError while setting cell with list like item.
Snowpark Local Testing Updates
New Features
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - rank
    - dense_rank
    - percent_rank
    - cume_dist
    - ntile
    - datediff
    - array_agg
  - snowflake.snowpark.column.Column.within_group
- Added support for parsing flags in regex statements for mocked plans. This maintains parity with the rlike and regexp changes above.
Bug Fixes
- Fixed a bug where Window Functions LEAD and LAG do not handle option ignore_nulls properly.
- Fixed a bug where values were not populated into the result DataFrame during the insertion of table merge operation.
Improvements
- Fix pandas FutureWarning about integer indexing.
Snowpark pandas API Updates
New Features
- Added support for DataFrame.backfill, DataFrame.bfill, Series.backfill, and Series.bfill.
- Added support for DataFrame.compare and Series.compare with default parameters.
- Added support for Series.dt.microsecond and Series.dt.nanosecond.
- Added support for Index.is_unique and Index.has_duplicates.
- Added support for Index.equals.
- Added support for Index.value_counts.
- Added support for Series.dt.day_name and Series.dt.month_name.
- Added support for indexing on Index, e.g., df.index[:10].
- Added support for DataFrame.unstack and Series.unstack.
- Added support for DataFrame.asfreq and Series.asfreq.
- Added support for Series.dt.is_month_start and Series.dt.is_month_end.
- Added support for Index.all and Index.any.
- Added support for Series.dt.is_year_start and Series.dt.is_year_end.
- Added support for Series.dt.is_quarter_start and Series.dt.is_quarter_end.
- Added support for lazy DatetimeIndex.
- Added support for Series.argmax and Series.argmin.
- Added support for Series.dt.is_leap_year.
- Added support for DataFrame.items.
- Added support for Series.dt.floor and Series.dt.ceil.
- Added support for Index.reindex.
- Added support for DatetimeIndex properties: year, month, day, hour, minute, second, microsecond, nanosecond, date, dayofyear, day_of_year, dayofweek, day_of_week, weekday, quarter, is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end and is_leap_year.
- Added support for Resampler.fillna and Resampler.bfill.
- Added limited support for the Timedelta type, including creating Timedelta columns and to_pandas.
- Added support for Index.argmax and Index.argmin.
Improvements
- Removed the public preview warning message when importing Snowpark pandas.
- Removed unnecessary count query from SnowflakeQueryCompiler.is_series_like method.
- Dataframe.columns now returns native pandas Index object instead of Snowpark Index object.
- Refactor and introduce query_compiler argument in Index constructor to create Index from query compiler.
- pd.to_datetime now returns a DatetimeIndex object instead of a Series object.
- pd.date_range now returns a DatetimeIndex object instead of a Series object.
Bug Fixes
- Made passing an unsupported aggregation function to pivot_table raise NotImplementedError instead of KeyError.
- Removed axis labels and callable names from error messages and telemetry about unsupported aggregations.
- Fixed AssertionError in Series.drop_duplicates and DataFrame.drop_duplicates when called after sort_values.
- Fixed a bug in Index.to_frame where the result frame's column name may be wrong where name is unspecified.
- Fixed a bug where some Index docstrings are ignored.
- Fixed a bug in Series.reset_index(drop=True) where the result name may be wrong.
- Fixed a bug in Groupby.first/last ordering by the correct columns in the underlying window expression.
Release
1.20.0 (2024-07-17)
Snowpark Python API Updates
Improvements
- Added distributed tracing using open telemetry APIs for table stored procedure function in DataFrame:
  - _execute_and_get_query_id
- Added support for the arrays_zip function.
- Improves performance for binary column expression and df._in by avoiding unnecessary cast for numeric values. You can enable this optimization by setting session.eliminate_numeric_sql_value_cast_enabled = True.
- Improved error message for write_pandas when the target table does not exist and auto_create_table=False.
- Added open telemetry tracing on UDxF functions in Snowpark.
- Added open telemetry tracing on stored procedure registration in Snowpark.
- Added a new optional parameter called format_json to the Session.SessionBuilder.app_name function that sets the app name in the Session.query_tag in JSON format. By default, this parameter is set to False.
Bug Fixes
- Fixed a bug where SQL generated for lag(x, 0) was incorrect and failed with error message argument 1 to function LAG needs to be constant, found 'SYSTEM$NULL_TO_FIXED(null)'.
Snowpark Local Testing Updates
New Features
- Added support for the following APIs:
  - snowflake.snowpark.functions
    - random
- Added new parameters to patch function when registering a mocked function:
  - distinct allows an alternate function to be specified for when a sql function should be distinct.
  - pass_column_index passes a named parameter column_index to the mocked function that contains the pandas.Index for the input data.
  - pass_row_index passes a named parameter row_index to the mocked function that is the 0 indexed row number the function is currently operating on.
  - pass_input_data passes a named parameter input_data to the mocked function that contains the entire input dataframe for the current expression.
- Added support for the column_order parameter to method DataFrameWriter.save_as_table.
Bug Fixes
- Fixed a bug that caused DecimalType columns to be incorrectly truncated to integer precision when used in BinaryExpressions.
Snowpark pandas API Updates
New Features
- Added support for DataFrameGroupBy.all, SeriesGroupBy.all, DataFrameGroupBy.any, and SeriesGroupBy.any.
- Added support for DataFrame.nlargest, DataFrame.nsmallest, Series.nlargest and Series.nsmallest.
- Added support for replace and frac > 1 in DataFrame.sample and Series.sample.
- Added support for read_excel (uses local pandas for processing).
- Added support for Series.at, Series.iat, DataFrame.at, and DataFrame.iat.
- Added support for Series.dt.isocalendar.
- Added support for Series.case_when except when condition or replacement is callable.
- Added documentation pages for Index and its APIs.
- Added support for DataFrame.assign.
- Added support for DataFrame.stack.
- Added support for DataFrame.pivot and pd.pivot.
- Added support for DataFrame.to_csv and Series.to_csv.
- Added partial support for Series.str.translate where the values in the table are single-codepoint strings.
- Added support for DataFrame.corr.
- Allow df.plot() and series.plot() to be called, materializing the data into the local client.
- Added support for DataFrameGroupBy and SeriesGroupBy aggregations first and last.
- Added support for DataFrameGroupBy.get_group.
- Added support for limit parameter when method parameter is used in fillna.
- Added support for DataFrame.equals and Series.equals.
- Added support for DataFrame.reindex and Series.reindex.
- Added support for Index.astype.
- Added support for Index.unique and Index.nunique.
Bug Fixes
- Fixed an issue when using np.where and df.where when the scalar 'other' is the literal 0.
- Fixed a bug regarding precision loss when converting to Snowpark pandas DataFrame or Series with dtype=np.uint64.
- Fixed bug where values is set to index when index and columns contain all columns in DataFrame during pivot_table.
Improvements
- Added support for Index.copy().
- Added support for Index APIs: dtype, values, item(), tolist(), to_series() and to_frame().
- Expand support for DataFrames with no rows in pd.pivot_table and DataFrame.pivot_table.
- Added support for inplace parameter in DataFrame.sort_index and Series.sort_index.
Release
1.19.0 (2024-06-25)
Snowpark Python API Updates
Improvements
New Features
- Added support for to_boolean function.
- Added documentation pages for Index and its APIs.
Bug Fixes
- Fixed a bug where python stored procedure with table return type fails when run in a task.
- Fixed a bug where df.dropna fails due to RecursionError: maximum recursion depth exceeded when the DataFrame has more than 500 columns.
- Fixed a bug where AsyncJob.result("no_result") doesn't wait for the query to finish execution.
Snowpark Local Testing Updates
New Features
- Added support for the strict parameter when registering UDFs and Stored Procedures.
Bug Fixes
- Fixed a bug in convert_timezone that made setting the source_timezone parameter return an error.
- Fixed a bug where creating DataFrame with empty data of type DateType raises AttributeError.
- Fixed a bug where table merge fails when an update clause exists but no update takes place.
- Fixed a bug in mock implementation of to_char that raises IndexError when incoming column has nonconsecutive row index.
- Fixed a bug in handling of CaseExpr expressions that raises IndexError when incoming column has nonconsecutive row index.
- Fixed a bug in implementation of Column.like that raises IndexError when incoming column has nonconsecutive row index.
Improvements
- Added support for type coercion in the implementation of DataFrame.replace, DataFrame.dropna and the mock function iff.
Snowpark pandas API Updates
New Features
- Added partial support for DataFrame.pct_change and Series.pct_change without the freq and limit parameters.
- Added support for Series.str.get.
- Added support for Series.dt.dayofweek, Series.dt.day_of_week, Series.dt.dayofyear, and Series.dt.day_of_year.
- Added support for Series.str.__getitem__ (Series.str[...]).
- Added support for Series.str.lstrip and Series.str.rstrip.
- Added support for DataFrameGroupby.size and SeriesGroupby.size.
- Added support for DataFrame.expanding and Series.expanding for aggregations count, sum, min, max, mean, std, and var with axis=0.
- Added support for DataFrame.rolling and Series.rolling for aggregation count with axis=0.
- Added support for Series.str.match.
- Added support for DataFrame.resample and Series.resample for aggregation size.
Bug Fixes
- Fixed a bug that causes output of GroupBy.aggregate's columns to be ordered incorrectly.
- Fixed a bug where DataFrame.describe on a frame with duplicate columns of differing dtypes could cause an error or incorrect results.
- Fixed a bug in DataFrame.rolling and Series.rolling so window=0 now throws NotImplementedError instead of ValueError.
Improvements
- Added support for named aggregations in DataFrame.aggregate and Series.aggregate with axis=0.
- pd.read_csv reads using the native pandas CSV parser, then uploads data to Snowflake using parquet. This enables most of the parameters supported by read_csv, including date parsing and numeric conversions. Uploading via parquet is roughly twice as fast as uploading via CSV.
- Initial work to support a pd.Index directly in Snowpark pandas. Support for pd.Index as a first-class component of Snowpark pandas is coming soon.
- Added a lazy index constructor and support for len, shape, size, empty, to_pandas() and names. For df.index, Snowpark pandas creates a lazy index object.
- For df.columns, Snowpark pandas supports a non-lazy version of an Index since the data is already stored locally.
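The named aggregations mentioned in the list above can be sketched as follows. This is a minimal example using native pandas as a stand-in, assuming Snowpark pandas accepts the same keyword form. The column names and data are hypothetical.

```python
import pandas as pd  # stand-in; Snowpark pandas would use `import modin.pandas as pd`

# Hypothetical sample data.
df = pd.DataFrame({"price": [10.0, 12.5, 9.0], "qty": [3, 1, 4]})

# Each keyword names an output row and maps it to a (column, aggregation) pair.
summary = df.agg(
    max_price=("price", "max"),
    total_qty=("qty", "sum"),
)
```

In native pandas the result is a DataFrame with one row per keyword. Cells for columns that a given keyword does not aggregate are NaN.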
Release
1.18.0 (2024-05-28)
Snowpark pandas API Updates
New Features
- Added DataFrame.cache_result and Series.cache_result methods for users to persist DataFrames and Series to a temporary table lasting the duration of the session to improve latency of subsequent operations.
Improvements
- Added partial support for DataFrame.pivot_table with no index parameter, as well as for margins parameter.
- Updated the signature of DataFrame.shift/Series.shift/DataFrameGroupBy.shift/SeriesGroupBy.shift to match pandas 2.2.1. Snowpark pandas does not yet support the newly-added suffix argument, or sequence values of periods.
- Re-added support for Series.str.split.
Bug Fixes
- Fixed how we support mixed columns for string methods (Series.str.*).
Snowpark Local Testing Updates
New Features
- Added support for the following DataFrameReader read options to file formats csv and json:
  - PURGE
  - PATTERN
  - INFER_SCHEMA with value being False
  - ENCODING with value being UTF8
- Added support for DataFrame.analytics.moving_agg and DataFrame.analytics.cumulative_agg.
- Added support for if_not_exists parameter during UDF and stored procedure registration.
Bug Fixes
- Fixed a bug where the fractional second part was not handled properly when processing time formats.
- Fixed a bug that caused function calls on * to fail.
- Fixed a bug that prevented creation of map and struct type objects.
- Fixed a bug where the function date_add was unable to handle some numeric types.
- Fixed a bug where TimestampType casting resulted in incorrect data.
- Fixed a bug that caused DecimalType data to have incorrect precision in some cases.
- Fixed a bug where referencing a missing table or view raised a confusing IndexError.
- Fixed a bug where the mocked function to_timestamp_ntz could not handle None data.
- Fixed a bug where mocked UDFs handled output data of None improperly.
- Fixed a bug where DataFrame.with_column_renamed ignored attributes from parent DataFrames after join operations.
- Fixed a bug where the integer precision of large values was lost when converting to a pandas DataFrame.
- Fixed a bug where the schema of datetime objects was wrong when creating a DataFrame from a pandas DataFrame.
- Fixed a bug in the implementation of Column.equal_nan where null data was handled incorrectly.
- Fixed a bug where DataFrame.drop ignored attributes from parent DataFrames after join operations.
- Fixed a bug in the mocked function date_part where the Column type was set incorrectly.
- Fixed a bug where DataFrameWriter.save_as_table did not raise exceptions when inserting null data into non-nullable columns.
- Fixed a bug in the implementation of DataFrameWriter.save_as_table where:
  - Append or Truncate failed when incoming data had a different schema than the existing table.
  - Truncate failed when incoming data did not specify columns that are nullable.
Improvements
- Removed dependency check for pyarrow as it is not used.
- Improved target type coverage of Column.cast, adding support for casting to boolean and all integral types.
- Aligned error experience when calling UDFs and stored procedures.
- Added appropriate error messages for is_permanent and anonymous options in UDFs and stored procedures registration to make it more clear that those features are not yet supported.
- File read operation with unsupported options and values now raises NotImplementedError instead of warnings and unclear error information.
Release
1.17.0 (2024-05-21)
Snowpark Python API Updates
New Features
- Added support to add a comment on tables and views using the functions listed below:
  - DataFrameWriter.save_as_table
  - DataFrame.create_or_replace_view
  - DataFrame.create_or_replace_temp_view
  - DataFrame.create_or_replace_dynamic_table
Improvements
- Improved error message to remind users to set {"infer_schema": True} when reading CSV file without specifying its schema.
Snowpark pandas API Updates
New Features
- Start of Public Preview of Snowpark pandas API. Refer to the Snowpark pandas API Docs for more details.
Snowpark Local Testing Updates
New Features
- Added support for NumericType and VariantType data conversion in the mocked function to_timestamp_ltz, to_timestamp_ntz, to_timestamp_tz and to_timestamp.
- Added support for DecimalType, BinaryType, ArrayType, MapType, TimestampType, DateType and TimeType data conversion in the mocked function to_char.
- Added support for the following APIs:
  - snowflake.snowpark.functions:
    - to_varchar
  - snowflake.snowpark.DataFrame:
    - pivot
  - snowflake.snowpark.Session:
    - cancel_all
- Introduced a new exception class snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException.
- Added support for casting to FloatType.
Bug Fixes
- Fixed a bug where stored procedures and UDFs removed imports already in sys.path during the clean-up step.
- Fixed a bug where the fractional second part was not handled properly when processing datetime formats.
- Fixed a bug on Windows where file operations were unable to properly handle the file separator in directory names.
- Fixed a bug on Windows where an IntervalType column with integer data could not be processed when reading a pandas DataFrame.
- Fixed a bug that prevented users from being able to select multiple columns with the same alias.
- Fixed a bug where Session.get_current_[schema|database|role|user|account|warehouse] returned upper-cased identifiers when identifiers are quoted.
- Fixed a bug where the functions substr and substring could not handle a 0-based start_expr.
Improvements
- Standardized the error experience by raising SnowparkLocalTestingException in error cases, which is on par with SnowparkSQLException raised in non-local execution.
- Improved error experience of the Session.write_pandas method: NotImplementedError is now raised when it is called.
- Aligned error experience with reusing a closed session in non-local execution.