# Releases · snowflakedb/snowpark-python
## 1.41.0 (2025-10-23)
### Snowpark Python API Updates

#### New Features
- Added a new function `service` in `snowflake.snowpark.functions` that allows users to create a callable representing a Snowpark Container Services (SPCS) service.
- Added a `connection_parameters` parameter to the `DataFrameReader.dbapi()` (PuPr) method to allow passing keyword arguments to the `create_connection` callable.
- Added support for `Session.begin_transaction`, `Session.commit`, and `Session.rollback`.
- Added support for the following functions in `functions.py`:
  - Geospatial functions: `st_interpolate`, `st_intersection`, `st_intersection_agg`, `st_intersects`, `st_isvalid`, `st_length`, `st_makegeompoint`, `st_makeline`, `st_makepolygon`, `st_makepolygonoriented`, `st_disjoint`, `st_distance`, `st_dwithin`, `st_endpoint`, `st_envelope`, `st_geohash`, `st_geomfromgeohash`, `st_geompointfromgeohash`, `st_hausdorffdistance`, `st_makepoint`, `st_npoints`, `st_perimeter`, `st_pointn`, `st_setsrid`, `st_simplify`, `st_srid`, `st_startpoint`, `st_symdifference`, `st_transform`, `st_union`, `st_union_agg`, `st_within`, `st_x`, `st_xmax`, `st_xmin`, `st_y`, `st_ymax`, `st_ymin`, `st_geogfromgeohash`, `st_geogpointfromgeohash`, `st_geographyfromwkb`, `st_geographyfromwkt`, `st_geometryfromwkb`, `st_geometryfromwkt`, `try_to_geography`, `try_to_geometry`
- Added a parameter to enable and disable automatic column name aliasing for the `interval_day_time_from_parts` and `interval_year_month_from_parts` functions.
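The new transaction methods compose into the usual commit-or-rollback pattern. A minimal sketch, assuming a live `Session`; the `run_in_transaction` helper and the `work` callback are illustrative, not part of the API:

```python
def run_in_transaction(session, work):
    """Run `work(session)` inside an explicit transaction: commit on
    success, roll back on any error. `session` is expected to expose
    the new Session.begin_transaction/commit/rollback methods."""
    session.begin_transaction()
    try:
        result = work(session)
        session.commit()
        return result
    except Exception:
        session.rollback()
        raise
```

With this shape, partially applied DML from a failed `work` callback is rolled back before the exception propagates.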
#### Bug Fixes

- Fixed a bug where `DataFrameReader.xml` failed to parse XML files with undeclared namespaces when `ignoreNamespace` was `True`.
- Added a fix for floating point precision discrepancies in `interval_day_time_from_parts`.
- Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with `to_snowflake` would raise `KeyError`.
- Fixed a bug where `DataFrameReader.dbapi` (PuPr) was not compatible with oracledb 3.4.0.
- Fixed a bug where `modin` would unintentionally be imported during session initialization in some scenarios.
- Fixed a bug where `session.udf|udtf|udaf|sproc.register` failed when an extra session argument was passed. These methods do not expect a session argument; please remove it if provided.
#### Improvements

- The default maximum length for inferred `StringType` columns during schema inference in `DataFrameReader.dbapi` has been increased from 16 MB to 128 MB in Parquet-file-based ingestion.

#### Dependency Updates

- Updated the dependency on `snowflake-connector-python` to `>=3.17,<5.0.0`.
### Snowpark pandas API Updates

#### New Features

- Added support for the `dtype` parameter of `pd.get_dummies`.
- Added support for `nunique` in `df.pivot_table`, `df.agg`, and other places where aggregate functions can be used.
- Added support for `DataFrame.interpolate` and `Series.interpolate` with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL `INTERPOLATE_LINEAR`, `INTERPOLATE_FFILL`, and `INTERPOLATE_BFILL` functions (PuPr).
#### Improvements

- Improved performance of `Series.to_snowflake` and `pd.to_snowflake(series)` for large data by uploading data via a Parquet file. You can control the dataset size at which Snowpark pandas switches to Parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
- Enhanced autoswitching functionality from Snowflake to native pandas for methods with unsupported argument combinations:
  - `get_dummies()` with `dummy_na=True`, `drop_first=True`, or custom `dtype` parameters
  - `cumsum()`, `cummin()`, `cummax()` with `axis=1` (column-wise operations)
  - `skew()` with `axis=1` or `numeric_only=False` parameters
  - `round()` with the `decimals` parameter as a Series
  - `corr()` with `method != "pearson"`
- Set `cte_optimization_enabled` to `True` for all Snowpark pandas sessions.
- Added support for the following in faster pandas: `isin`, `isna`, `isnull`, `notna`, `notnull`, `str.contains`, `str.startswith`, `str.endswith`, `str.slice`, `dt.date`, `dt.time`, `dt.hour`, `dt.minute`, `dt.second`, `dt.microsecond`, `dt.nanosecond`, `dt.year`, `dt.month`, `dt.day`, `dt.quarter`, `dt.is_month_start`, `dt.is_month_end`, `dt.is_quarter_start`, `dt.is_quarter_end`, `dt.is_year_start`, `dt.is_year_end`, `dt.is_leap_year`, `dt.days_in_month`, `dt.daysinmonth`, `sort_values`, `loc` (setting columns), `to_datetime`, `rename`, `drop`, `invert`, `duplicated`, `iloc`, `head`, `columns` (e.g., `df.columns = ["A", "B"]`), `agg`, `min`, `max`, `count`, `sum`, `mean`, `median`, `std`, `var`, `groupby.agg`, `groupby.min`, `groupby.max`, `groupby.count`, `groupby.sum`, `groupby.mean`, `groupby.median`, `groupby.std`, `groupby.var`, `drop_duplicates`
- Reuse the row count from the relaxed query compiler in `get_axis_len`.
#### Bug Fixes

- Fixed a bug where the row count was not cached in the ordered dataframe each time `count_rows()` was called.
## 1.40.0 (2025-10-02)

### Snowpark Python API Updates

#### New Features
- Added a new module `snowflake.snowpark.secrets` that provides Python wrappers for accessing Snowflake Secrets within Python UDFs and stored procedures that execute inside Snowflake:
  - `get_generic_secret_string`
  - `get_oauth_access_token`
  - `get_secret_type`
  - `get_username_password`
  - `get_cloud_provider_token`
- Added support for the following scalar functions in `functions.py`:
  - Conditional expression functions: `booland`, `boolnot`, `boolor`, `boolxor`, `boolor_agg`, `decode`, `greatest_ignore_nulls`, `least_ignore_nulls`, `nullif`, `nvl2`, `regr_valx`
  - Semi-structured and structured data functions: `array_remove_at`, `as_boolean`, `map_delete`, `map_insert`, `map_pick`, `map_size`
  - String & binary functions: `chr`, `hex_decode_binary`
  - Numeric functions: `div0null`
  - Differential privacy functions: `dp_interval_high`, `dp_interval_low`
  - Context functions: `last_query_id`, `last_transaction`
  - Geospatial functions: `h3_cell_to_boundary`, `h3_cell_to_children`, `h3_cell_to_children_string`, `h3_cell_to_parent`, `h3_cell_to_point`, `h3_compact_cells`, `h3_compact_cells_strings`, `h3_coverage`, `h3_coverage_strings`, `h3_get_resolution`, `h3_grid_disk`, `h3_grid_distance`, `h3_int_to_string`, `h3_polygon_to_cells`, `h3_polygon_to_cells_strings`, `h3_string_to_int`, `h3_try_grid_path`, `h3_try_polygon_to_cells`, `h3_try_polygon_to_cells_strings`, `h3_uncompact_cells`, `h3_uncompact_cells_strings`, `haversine`, `h3_grid_path`, `h3_is_pentagon`, `h3_is_valid_cell`, `h3_latlng_to_cell`, `h3_latlng_to_cell_string`, `h3_point_to_cell`, `h3_point_to_cell_string`, `h3_try_coverage`, `h3_try_coverage_strings`, `h3_try_grid_distance`, `st_area`, `st_asewkb`, `st_asewkt`, `st_asgeojson`, `st_aswkb`, `st_aswkt`, `st_azimuth`, `st_buffer`, `st_centroid`, `st_collect`, `st_contains`, `st_coveredby`, `st_covers`, `st_difference`, `st_dimension`
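The secrets wrappers are typically called from inside a UDF or stored procedure handler. A minimal sketch; the handler name and the secret name `cred` are illustrative, and the secret must be granted to the function when it is registered:

```python
def login_handler() -> str:
    # snowflake.snowpark.secrets is resolvable inside Snowflake; the
    # import is deferred so the handler can be defined from client code.
    from snowflake.snowpark.secrets import get_username_password

    # "cred" is a hypothetical secret name mapped in the UDF definition.
    creds = get_username_password("cred")
    return creds.username
```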
#### Bug Fixes

- Fixed a bug where `DataFrame.limit()` failed if there was parameter binding in the executed SQL when used in a non-stored-procedure/UDxF environment.
- Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.
- Fixed multiple bugs in `DataFrameReader.dbapi` (PuPr):
  - Fixed UDTF ingestion failure with the `pyodbc` driver caused by unprocessed row data.
  - Fixed SQL Server query input failure due to incorrect select query generation.
  - Fixed UDTF ingestion not preserving column nullability in the output schema.
  - Fixed an issue that caused the program to hang during multithreaded Parquet-based ingestion when a data fetching error occurred.
  - Fixed a bug in schema parsing when custom schema strings used upper-cased data type names (NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).
- Fixed a bug in `Session.create_dataframe` where schema string parsing failed when using upper-cased data type names (e.g., NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).
#### Improvements

- Improved `DataFrameReader.dbapi` (PuPr) so that it does not retry on non-retryable errors, such as SQL syntax errors in the external data source query.
- Removed unnecessary warnings about local package version mismatch when using `session.read.option('rowTag', <tag_name>).xml(<stage_file_path>)` or `xpath` functions.
- Improved `DataFrameReader.dbapi` (PuPr) reading performance by setting the default `fetch_size` parameter value to 100000.
- Improved the error message for XSD validation failure when reading XML files using `session.read.option('rowValidationXSDPath', <xsd_path>).xml(<stage_file_path>)`.
### Snowpark pandas API Updates

#### Dependency Updates

- Updated the supported `modin` versions to >=0.36.0 and <0.38.0 (was previously >=0.35.0 and <0.37.0).

#### New Features

- Added support for `DataFrame.query` for dataframes with single-level indexes.
- Added support for `DataFrameGroupBy.__len__` and `SeriesGroupBy.__len__`.
#### Improvements

- Hybrid execution mode is now enabled by default. Certain operations on smaller data will now automatically execute in native pandas in memory. Use `from modin.config import AutoSwitchBackend; AutoSwitchBackend.disable()` to turn this off and force all execution to occur in Snowflake.
- Added a session parameter `pandas_hybrid_execution_enabled` to enable/disable hybrid execution as an alternative to using `AutoSwitchBackend`.
- Removed an unnecessary `SHOW OBJECTS` query issued from `read_snowflake` under certain conditions.
- When hybrid execution is enabled, `pd.merge`, `pd.concat`, `DataFrame.merge`, and `DataFrame.join` may now move arguments to backends other than those among the function arguments.
- Improved performance of `DataFrame.to_snowflake` and `pd.to_snowflake(dataframe)` for large data by uploading data via a Parquet file. You can control the dataset size at which Snowpark pandas switches to Parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
## 1.39.1 (2025-09-25)

### Snowpark Python API Updates

#### Bug Fixes

- Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.
## 1.39.0 (2025-09-17)

### Snowpark Python API Updates

#### New Features
- Added support for unstructured data engineering in Snowpark, powered by Snowflake AISQL and Cortex functions:
  - `DataFrame.ai.complete`: Generate per-row LLM completions from prompts built over columns and files.
  - `DataFrame.ai.filter`: Keep rows where an AI classifier returns TRUE for the given predicate.
  - `DataFrame.ai.agg`: Reduce a text column into one result using a natural-language task description.
  - `RelationalGroupedDataFrame.ai_agg`: Perform the same natural-language aggregation per group.
  - `DataFrame.ai.classify`: Assign single or multiple labels from given categories to text or images.
  - `DataFrame.ai.similarity`: Compute cosine-based similarity scores between two columns via embeddings.
  - `DataFrame.ai.sentiment`: Extract overall and aspect-level sentiment from text into JSON.
  - `DataFrame.ai.embed`: Generate VECTOR embeddings for text or images using configurable models.
  - `DataFrame.ai.summarize_agg`: Aggregate and produce a single comprehensive summary over many rows.
  - `DataFrame.ai.transcribe`: Transcribe audio files to text with optional timestamps and speaker labels.
  - `DataFrame.ai.parse_document`: OCR/layout-parse documents or images into structured JSON.
  - `DataFrame.ai.extract`: Pull structured fields from text or files using a response schema.
  - `DataFrame.ai.count_tokens`: Estimate token usage for a given model and input text per row.
  - `DataFrame.ai.split_text_markdown_header`: Split Markdown into hierarchical header-aware chunks.
  - `DataFrame.ai.split_text_recursive_character`: Split text into size-bounded chunks using recursive separators.
  - `DataFrameReader.file`: Create a DataFrame containing all files from a stage as FILE data type for downstream unstructured data processing.
- Added a new datatype `YearMonthIntervalType` that allows users to create intervals for datetime operations.
- Added a new function `interval_year_month_from_parts` that allows users to easily create `YearMonthIntervalType` without using SQL.
- Added a new datatype `DayTimeIntervalType` that allows users to create intervals for datetime operations.
- Added a new function `interval_day_time_from_parts` that allows users to easily create `DayTimeIntervalType` without using SQL.
- Added support for `FileOperation.list` to list files in a stage with metadata.
- Added support for `FileOperation.remove` to remove files in a stage.
- Added an option to specify `copy_grants` for the following `DataFrame` APIs:
  - `create_or_replace_view`
  - `create_or_replace_temp_view`
  - `create_or_replace_dynamic_table`
- Added a new function `snowflake.snowpark.functions.vectorized` that allows users to mark a function as a vectorized UDF.
- Added support for the parameter `use_vectorized_scanner` in the function `Session.write_pandas()`.
- Added support for the following scalar functions in `functions.py`: `getdate`, `getvariable`, `invoker_role`, `invoker_share`, `is_application_role_in_session`, `is_database_role_in_session`, `is_granted_to_invoker_role`, `is_role_in_session`, `localtime`, `systimestamp`
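The interval helpers avoid hand-written SQL `INTERVAL` literals in date arithmetic. A sketch assuming the `years`/`months` keyword form of `interval_year_month_from_parts` and a DATE column named `d`; running it requires a live Snowflake session:

```python
def shift_dates(df):
    """Return `df` with column "d" shifted forward by 1 year 3 months.

    `df` is assumed to be a Snowpark DataFrame; the column name and the
    keyword signature used here are illustrative.
    """
    from snowflake.snowpark.functions import (
        col,
        interval_year_month_from_parts,
    )

    return df.select(
        (col("d") + interval_year_month_from_parts(years=1, months=3)).alias(
            "d_shifted"
        )
    )
```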
#### Bug Fixes

#### Deprecations

- Deprecation warnings will be triggered when using snowpark-python with Python 3.9. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.

#### Dependency Updates

#### Improvements

- Unsupported types in `DataFrameReader.dbapi` (PuPr) are now ingested as `StringType`.
- Improved the error message to list available columns when a dataframe cannot resolve the given column name.
- Added a new option `cacheResult` to `DataFrameReader.xml` that allows users to cache the result of the XML reader to a temporary table after calling `xml`. This helps improve performance when subsequent operations are performed on the same DataFrame.
### Snowpark pandas API Updates

#### New Features

#### Improvements

- Downgraded to level `logging.DEBUG - 1` the log message saying that the `SnowparkDataFrame` reference of an internal `DataFrameReference` object has changed.
- Eliminated duplicate parameter check queries for casing status when retrieving the session.
- Retrieve dataframe row counts through object metadata to avoid a `COUNT(*)` query (performance).
- Added support for applying the Snowflake Cortex function `Complete`.
- Introduced faster pandas: improved performance by deferring row position computation.
  - The following operations are currently supported and can benefit from the optimization: `read_snowflake`, `repr`, `loc`, `reset_index`, `merge`, and binary operations.
  - If a lazy object (e.g., DataFrame or Series) depends on a mix of supported and unsupported operations, the optimization will not be used.
- Updated the error message for when Snowpark pandas is referenced within `apply`.
- Added a session parameter `dummy_row_pos_optimization_enabled` to enable/disable dummy row position optimization in faster pandas.
#### Dependency Updates

- Updated the supported `modin` versions to >=0.35.0 and <0.37.0 (was previously >=0.34.0 and <0.36.0).
#### Bug Fixes

- Fixed an issue with `drop_duplicates` where the same data source could be read multiple times in the same query but in a different order each time, resulting in missing rows in the final result. The fix ensures that the data source is read only once.
- Fixed a bug with hybrid execution mode where an `AssertionError` was unexpectedly raised by certain indexing operations.
### Snowpark Local Testing Updates

#### New Features

- Added support to allow patching `functions.ai_complete`.
## 1.38.0 (2025-09-04)

### Snowpark Python API Updates

#### New Features
- Added support for the following AI-powered functions in `functions.py`: `ai_extract`, `ai_parse_document`, `ai_transcribe`
- Added time travel support for querying historical data:
  - `Session.table()` now supports time travel parameters: `time_travel_mode`, `statement`, `offset`, `timestamp`, `timestamp_type`, and `stream`.
  - `DataFrameReader.table()` supports the same time travel parameters as direct arguments.
  - `DataFrameReader` supports time travel via option chaining (e.g., `session.read.option("time_travel_mode", "at").option("offset", -60).table("my_table")`).
- Added support for specifying the following parameters in `DataFrameWriter.copy_into_location` for validation and writing data to external locations: `validation_mode`, `storage_integration`, `credentials`, `encryption`
- Added support for `Session.directory` and `Session.read.directory` to retrieve the list of all files on a stage with metadata.
- Added support for `DataFrameReader.jdbc` (PrPr) that allows ingesting external data sources with a JDBC driver.
- Added support for `FileOperation.copy_files` to copy files from a source location to an output stage.
- Added support for the following scalar functions in `functions.py`: `all_user_names`, `bitand`, `bitand_agg`, `bitor`, `bitor_agg`, `bitxor`, `bitxor_agg`, `current_account_name`, `current_client`, `current_ip_address`, `current_role_type`, `current_organization_name`, `current_organization_user`, `current_secondary_roles`, `current_transaction`, `getbit`
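The option-chaining form of time travel can be wrapped in a small helper. A sketch; the helper and table name are illustrative, and a live session is required to actually execute the read:

```python
def table_as_of_offset(session, table_name, seconds_back):
    """Read `table_name` as it existed `seconds_back` seconds ago,
    using the new time travel options on DataFrameReader."""
    return (
        session.read
        .option("time_travel_mode", "at")
        .option("offset", -abs(seconds_back))  # offset is negative seconds
        .table(table_name)
    )
```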
#### Bug Fixes

- Fixed the repr of `TimestampType` to match the actual subtype it represents.
- Fixed a bug in `DataFrameReader.dbapi` where UDTF ingestion did not work in stored procedures.
- Fixed a bug in schema inference that caused incorrect stage prefixes to be used.
#### Improvements

- Enhanced error handling in `DataFrameReader.dbapi` thread-based ingestion to prevent unnecessary operations, which improves resource efficiency.
- Bumped the cloudpickle dependency to also support `cloudpickle==3.1.1` in addition to previous versions.
- Improved `DataFrameReader.dbapi` (PuPr) ingestion performance for PostgreSQL and MySQL by using a server-side cursor to fetch data.
### Snowpark pandas API Updates

#### New Features

- Completed support for `pd.read_snowflake()`, `pd.to_iceberg()`, `pd.to_pandas()`, `pd.to_snowpark()`, `pd.to_snowflake()`, `DataFrame.to_iceberg()`, `DataFrame.to_pandas()`, `DataFrame.to_snowpark()`, `DataFrame.to_snowflake()`, `Series.to_iceberg()`, `Series.to_pandas()`, `Series.to_snowpark()`, and `Series.to_snowflake()` on the "Pandas" and "Ray" backends. Previously, only some of these functions and methods were supported on the Pandas backend.
- Added support for `Index.get_level_values()`.
#### Improvements

- Set the default transfer limit in hybrid execution for data leaving Snowflake to 100k, which can be overridden with the `SnowflakePandasTransferThreshold` environment variable. This configuration is appropriate for scenarios with two available engines, "Pandas" and "Snowflake", on relational workloads.
- Improved the import error message by adding `--upgrade` to `pip install "snowflake-snowpark-python[modin]"` in the error message.
- Reduced the telemetry messages from the modin client by pre-aggregating into 5-second windows and only keeping a narrow band of metrics which are useful for tracking hybrid execution and native pandas performance.
- Set the initial row count only when hybrid execution is enabled. This reduces the number of queries issued for many workloads.
- Added a new test parameter for integration tests to enable hybrid execution.
#### Bug Fixes

- Raised `NotImplementedError` instead of `AttributeError` on attempting to call the Snowflake extension functions/methods `to_dynamic_table()`, `cache_result()`, `to_view()`, `create_or_replace_dynamic_table()`, and `create_or_replace_view()` on dataframes or series using the pandas or ray backends.
## 1.37.0 (2025-08-18)

### Snowpark Python API Updates

#### New Features
- Added support for the following `xpath` functions in `functions.py`: `xpath`, `xpath_string`, `xpath_boolean`, `xpath_int`, `xpath_float`, `xpath_double`, `xpath_long`, `xpath_short`
- Added support for the parameter `use_vectorized_scanner` in the function `Session.write_arrow()`.
- The dataframe profiler now adds the following information about each query: describe query time, execution time, and SQL query text. To view this information, call `session.dataframe_profiler.enable()` and call `get_execution_profile()` on a dataframe.
- Added support for `DataFrame.col_ilike`.
- Added support for non-blocking stored procedure calls that return `AsyncJob` objects:
  - Added a `block: bool = True` parameter to `Session.call()`. When `block=False`, it returns an `AsyncJob` instead of blocking until completion.
  - Added a `block: bool = True` parameter to `StoredProcedure.__call__()` for async support across both named and anonymous stored procedures.
  - Added `Session.call_nowait()`, which is equivalent to `Session.call(block=False)`.
#### Bug Fixes

- Fixed a bug in the CTE optimization stage where a `deepcopy` of internal plans would cause a memory spike when a dataframe was created locally using `session.create_dataframe()` with large input data.
- Fixed a bug in `DataFrameReader.parquet` where the `ignore_case` option in `infer_schema_options` was not respected.
- Fixed a bug where `to_pandas()` produced differently formatted column names depending on whether the query result format was set to 'JSON' or 'ARROW'.
#### Deprecations

- Deprecated `pkg_resources`.

#### Dependency Updates

- Added a dependency on `protobuf<6.32`.
### Snowpark pandas API Updates

#### New Features

- Added support for efficient transfer of data between Snowflake and Ray with the `DataFrame.set_backend` method. The installed version of `modin` must be at least 0.35.0, and `ray` must be installed.
#### Improvements

#### Dependency Updates

- Updated the supported `modin` versions to >=0.34.0 and <0.36.0 (was previously >=0.33.0 and <0.35.0).
- Added support for pandas 2.3 when the installed `modin` version is at least 0.35.0.
#### Bug Fixes

- Fixed an issue in hybrid execution mode (PrPr) where `pd.to_datetime` and `pd.to_timedelta` would unexpectedly raise `IndexError`.
- Fixed a bug where `pd.explain_switch` would raise `IndexError` or return `None` if called before any potential switch operations were performed.
## 1.36.0 (2025-08-05)

### Snowpark Python API Updates

#### New Features
- `Session.create_dataframe` now accepts keyword arguments that are forwarded to the internal call to `Session.write_pandas` or `Session.write_arrow` when creating a DataFrame from a pandas DataFrame or a pyarrow Table.
- Added new APIs for `AsyncJob`:
  - `AsyncJob.is_failed()` returns a `bool` indicating if a job has failed. It can be used in combination with `AsyncJob.is_done()` to determine if a job is finished and errored.
  - `AsyncJob.status()` returns a string representing the current query status (e.g., "RUNNING", "SUCCESS", "FAILED_WITH_ERROR") for detailed monitoring without calling `result()`.
- Added a dataframe profiler. To use it, call `get_execution_profile()` on your desired dataframe. This profiler reports the queries executed to evaluate a dataframe and statistics about each of the query operators. Currently an experimental feature.
- Added support for the following functions in `functions.py`: `ai_sentiment`
- Updated the interface for the experimental feature `context.configure_development_features`. All development features are disabled by default unless explicitly enabled by the user.
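The new `AsyncJob.is_failed()` and `AsyncJob.status()` APIs allow a job to be classified without fetching its result. A small illustrative helper (not part of the API):

```python
def job_summary(job):
    """Classify an AsyncJob using the new status APIs, without
    calling job.result()."""
    if not job.is_done():
        return ("running", job.status())
    if job.is_failed():
        return ("failed", job.status())
    return ("succeeded", job.status())
```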
### Snowpark pandas API Updates

#### New Features

#### Improvements

- Hybrid execution row estimate improvements and a reduction of eager calls.
- Added a new configuration variable to control transfer costs out of Snowflake when using hybrid execution.
- Added support for creating permanent and immutable UDFs/UDTFs with `DataFrame/Series/GroupBy.apply`, `map`, and `transform` by passing the `snowflake_udf_params` keyword argument. See the documentation for details.
#### Bug Fixes

- Fixed an issue where the Snowpark pandas plugin would unconditionally disable `AutoSwitchBackend` even when users had explicitly configured it via environment variables or programmatically.
## 1.35.0 (2025-07-24)

### Snowpark Python API Updates

#### New Features

- Added support for the following functions in `functions.py`: `ai_embed`, `try_parse_json`
#### Bug Fixes

- Fixed a bug in `DataFrameReader.dbapi` (PrPr) where `dbapi` failed in a Python stored procedure with process exit code 1.
- Fixed a bug in `DataFrameReader.dbapi` (PrPr) where `custom_schema` accepted an illegal schema.
- Fixed a bug in `DataFrameReader.dbapi` (PrPr) where `custom_schema` did not work when connecting to PostgreSQL and MySQL.
- Fixed a bug in schema inference that would cause it to fail for external stages.
#### Improvements

- Improved the `query` parameter in `DataFrameReader.dbapi` (PrPr) so that parentheses are not needed around the query.
- Improved the error experience in `DataFrameReader.dbapi` (PrPr) when an exception happens while inferring the schema of the target data source.
### Snowpark Local Testing Updates

#### New Features

- Added local testing support for reading files with `SnowflakeFile` using local file paths, the Snow URL semantic (`snow://...`), local testing framework stages, and Snowflake stages (`@stage/file_path`).
### Snowpark pandas API Updates

#### New Features

- Added support for `DataFrame.boxplot`.

#### Improvements

- Reduced the number of UDFs/UDTFs created by repeated calls to `apply` or `map` with the same arguments on Snowpark pandas objects.
#### Bug Fixes

- Added an upper bound to the row estimation when the cartesian product from an align or join results in a very large number. This mitigates a performance regression.
- Fixed a `pd.read_excel` bug when reading files inside a stage's inner directory.
## 1.34.0 (2025-07-15)

### Snowpark Python API Updates

#### New Features
- Added a new option `TRY_CAST` to `DataFrameReader`. When `TRY_CAST` is `True`, columns are wrapped in a `TRY_CAST` statement rather than a hard cast when loading data.
- Added a new option `USE_RELAXED_TYPES` to the `INFER_SCHEMA_OPTIONS` of `DataFrameReader`. When set to `True`, this option casts all strings to max-length strings and all numeric types to `DoubleType`.
- Added debuggability improvements to eagerly validate dataframe schema metadata. Enable it using `snowflake.snowpark.context.configure_development_features()`.
- Added a new function `snowflake.snowpark.dataframe.map_in_pandas` that allows users to map a function across a dataframe. The mapping function takes an iterator of pandas dataframes as input and provides one as output.
- Added a TTL cache for describe queries. Repeated queries in a 15-second interval will use the cached value rather than re-querying Snowflake.
- Added a parameter `fetch_with_process` to `DataFrameReader.dbapi` (PrPr) to enable multiprocessing for parallel data fetching in local ingestion. By default, local ingestion uses multithreading. Multiprocessing may improve performance for CPU-bound tasks like Parquet file generation.
- Added a new function `snowflake.snowpark.functions.model` that allows users to call methods of a model.
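The `map_in_pandas` contract (an iterator of pandas DataFrames in, an iterator of pandas DataFrames out) can be illustrated on its own; the column name is illustrative, and wiring the function into `map_in_pandas` requires a live session:

```python
import pandas as pd

def double_values(batches):
    """Mapping function for map_in_pandas: receives an iterator of
    pandas DataFrames and must yield pandas DataFrames. Here every
    batch's "v" column is doubled."""
    for pdf in batches:
        yield pdf.assign(v=pdf["v"] * 2)

# With a live session this would be applied to a Snowpark DataFrame,
# e.g. df.map_in_pandas(double_values, ...)  # exact signature assumed
```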
#### Improvements

- Added support for row validation using an XSD schema via the `rowValidationXSDPath` option when reading XML files with a row tag using the `rowTag` option.
- Improved SQL generation for `session.table().sample()` to generate a flat SQL statement.
- Added support for complex column expressions as input for `functions.explode`.
- Added debuggability improvements to show which Python lines an SQL compilation error corresponds to. Enable it using `snowflake.snowpark.context.configure_development_features()`. This feature also depends on AST collection being enabled in the session, which can be done using `session.ast_enabled = True`.
- Set `enforce_ordering=True` when calling `to_snowpark_pandas()` from a Snowpark dataframe containing DML/DDL queries, instead of throwing a `NotImplementedError`.
#### Bug Fixes

- Fixed a bug caused by redundant validation when creating an iceberg table.
- Fixed a bug in `DataFrameReader.dbapi` (PrPr) where closing the cursor or connection could unexpectedly raise an error and terminate the program.
- Fixed ambiguous column errors when using table functions in `DataFrame.select()` that have output columns matching the input DataFrame's columns. This improvement works when dataframe columns are provided as `Column` objects.
- Fixed a bug where having a NULL in a column with `DecimalType` would cast the column to `FloatType` instead and lead to precision loss.
### Snowpark Local Testing Updates

#### Bug Fixes

- Fixed a bug when processing windowed functions that led to incorrect indexing in results.
- When a scalar numeric is passed to `fillna`, non-numeric columns are now ignored instead of producing an error.
### Snowpark pandas API Updates

#### New Features

- Added support for `DataFrame.to_excel` and `Series.to_excel`.
- Added support for `pd.read_feather`, `pd.read_orc`, and `pd.read_stata`.
- Added support for `pd.explain_switch()` to return debugging information on hybrid execution decisions.
- Added support for `pd.read_snowflake` when the global modin backend is `Pandas`.
- Added support for `pd.to_dynamic_table`, `pd.to_iceberg`, and `pd.to_view`.
#### Improvements

- Added modin telemetry on API calls and hybrid engine switches.
- Show more helpful error messages to Snowflake Notebook users when the `modin` or `pandas` version does not match our requirements.
- Added a data type guard to the cost functions for hybrid execution mode (PrPr), which checks for data type compatibility.
- Added automatic switching to the pandas backend in hybrid execution mode (PrPr) for many methods that are not directly implemented in Snowpark pandas.
- Set the 'type' and other standard fields for Snowpark pandas telemetry.
#### Dependency Updates

- Added `tqdm` and `ipywidgets` as dependencies so that progress bars appear when switching between modin backends.
- Updated the supported `modin` versions to >=0.33.0 and <0.35.0 (was previously >=0.32.0 and <0.34.0).
#### Bug Fixes

- Fixed a bug in hybrid execution mode (PrPr) where certain Series operations would raise `TypeError: numpy.ndarray object is not callable`.
- Fixed a bug in hybrid execution mode (PrPr) where calling numpy operations like `np.where` on modin objects with the Pandas backend would raise an `AttributeError`. This fix requires `modin` version 0.34.0 or newer.
- Fixed an issue in `df.melt` where the resulting values had an additional suffix applied.