
Releases: snowflakedb/snowpark-python

Release

28 Oct 18:09


1.42.0 (2025-10-28)

Snowpark Python API Updates

New Features

  • Snowpark Python DB-API is now generally available. Access this feature with DataFrameReader.dbapi() to read data from a database table or query into a DataFrame using a DBAPI connection.
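A hedged sketch of the pattern: DataFrameReader.dbapi() takes a create_connection callable that returns a DBAPI 2.0 connection. The Snowpark call itself needs a live Snowflake session, so it is shown only in comments; the runnable part below uses an in-memory SQLite database as a stand-in source.

```python
import sqlite3

def create_connection():
    # Any DBAPI-2.0 connection works; SQLite stands in for a real source here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
    return conn

# With a configured snowflake.snowpark.Session, reading would look roughly like:
#     df = session.read.dbapi(create_connection, table="users")
# (keyword arguments beyond create_connection vary; check the API reference)

rows = create_connection().execute("SELECT name FROM users ORDER BY id").fetchall()
```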


1.41.0 (2025-10-23)

Snowpark Python API Updates

New Features

  • Added a new function service in snowflake.snowpark.functions that allows users to create a callable representing a Snowpark Container Services (SPCS) service.
  • Added connection_parameters parameter to DataFrameReader.dbapi() (PuPr) method to allow passing keyword arguments to the create_connection callable.
  • Added support for Session.begin_transaction, Session.commit and Session.rollback.
  • Added support for the following functions in functions.py:
    • Geospatial functions:
      • st_interpolate
      • st_intersection
      • st_intersection_agg
      • st_intersects
      • st_isvalid
      • st_length
      • st_makegeompoint
      • st_makeline
      • st_makepolygon
      • st_makepolygonoriented
      • st_disjoint
      • st_distance
      • st_dwithin
      • st_endpoint
      • st_envelope
      • st_geohash
      • st_geomfromgeohash
      • st_geompointfromgeohash
      • st_hausdorffdistance
      • st_makepoint
      • st_npoints
      • st_perimeter
      • st_pointn
      • st_setsrid
      • st_simplify
      • st_srid
      • st_startpoint
      • st_symdifference
      • st_transform
      • st_union
      • st_union_agg
      • st_within
      • st_x
      • st_xmax
      • st_xmin
      • st_y
      • st_ymax
      • st_ymin
      • st_geogfromgeohash
      • st_geogpointfromgeohash
      • st_geographyfromwkb
      • st_geographyfromwkt
      • st_geometryfromwkb
      • st_geometryfromwkt
      • try_to_geography
      • try_to_geometry
  • Added a parameter to enable and disable automatic column name aliasing for interval_day_time_from_parts and interval_year_month_from_parts functions.
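The new transaction methods follow the usual begin/commit/rollback pattern. A minimal sketch: the stand-in Session class below is hypothetical so the control flow can run without a Snowflake connection; on a real snowflake.snowpark.Session the method names are the same.

```python
class StubSession:
    """Hypothetical stand-in for snowflake.snowpark.Session."""
    def __init__(self):
        self.calls = []
    def begin_transaction(self):
        self.calls.append("begin")
    def commit(self):
        self.calls.append("commit")
    def rollback(self):
        self.calls.append("rollback")

def run_transactional(session, work):
    # Typical pattern: open a transaction, commit on success, roll back on error.
    session.begin_transaction()
    try:
        work()
        session.commit()
    except Exception:
        session.rollback()
        raise

ok = StubSession()
run_transactional(ok, lambda: None)            # begin, then commit

failed = StubSession()
try:
    run_transactional(failed, lambda: 1 / 0)   # begin, then rollback and re-raise
except ZeroDivisionError:
    pass
```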

Bug Fixes

  • Fixed a bug where DataFrameReader.xml failed to parse XML files with undeclared namespaces when ignoreNamespace was True.
  • Added a fix for floating point precision discrepancies in interval_day_time_from_parts.
  • Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with to_snowflake would raise KeyError.
  • Fixed a bug where DataFrameReader.dbapi (PuPr) was not compatible with oracledb 3.4.0.
  • Fixed a bug where modin would unintentionally be imported during session initialization in some scenarios.
  • Fixed a bug where session.udf|udtf|udaf|sproc.register failed when an extra session argument was passed. These methods do not expect a session argument; please remove it if provided.

Improvements

  • The default maximum length for inferred StringType columns during schema inference in DataFrameReader.dbapi has been increased from 16 MB to 128 MB for Parquet-file-based ingestion.

Dependency Updates

  • Updated the dependency to snowflake-connector-python>=3.17,<5.0.0.

Snowpark pandas API Updates

New Features

  • Added support for the dtypes parameter of pd.get_dummies.
  • Added support for nunique in df.pivot_table, df.agg and other places where aggregate functions can be used.
  • Added support for DataFrame.interpolate and Series.interpolate with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL INTERPOLATE_LINEAR, INTERPOLATE_FFILL, and INTERPOLATE_BFILL functions (PuPr).

Improvements

  • Improved performance of Series.to_snowflake and pd.to_snowflake(series) for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable modin.config.PandasToSnowflakeParquetThresholdBytes.
  • Enhanced autoswitching functionality from Snowflake to native Pandas for methods with unsupported argument combinations:
    • get_dummies() with dummy_na=True, drop_first=True, or custom dtype parameters
    • cumsum(), cummin(), cummax() with axis=1 (column-wise operations)
    • skew() with axis=1 or numeric_only=False parameters
    • round() with decimals parameter as a Series
    • corr() with a method parameter other than pearson
  • Set cte_optimization_enabled to True for all Snowpark pandas sessions.
  • Added support for the following in faster pandas:
    • isin
    • isna
    • isnull
    • notna
    • notnull
    • str.contains
    • str.startswith
    • str.endswith
    • str.slice
    • dt.date
    • dt.time
    • dt.hour
    • dt.minute
    • dt.second
    • dt.microsecond
    • dt.nanosecond
    • dt.year
    • dt.month
    • dt.day
    • dt.quarter
    • dt.is_month_start
    • dt.is_month_end
    • dt.is_quarter_start
    • dt.is_quarter_end
    • dt.is_year_start
    • dt.is_year_end
    • dt.is_leap_year
    • dt.days_in_month
    • dt.daysinmonth
    • sort_values
    • loc (setting columns)
    • to_datetime
    • rename
    • drop
    • invert
    • duplicated
    • iloc
    • head
    • columns (e.g., df.columns = ["A", "B"])
    • agg
    • min
    • max
    • count
    • sum
    • mean
    • median
    • std
    • var
    • groupby.agg
    • groupby.min
    • groupby.max
    • groupby.count
    • groupby.sum
    • groupby.mean
    • groupby.median
    • groupby.std
    • groupby.var
    • drop_duplicates
  • Reused the row count from the relaxed query compiler in get_axis_len.
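The Parquet-upload improvement above switches transfer strategies once the data crosses a size threshold. The sketch below is illustrative only: the function and its default value are hypothetical, and the real knob is the modin.config.PandasToSnowflakeParquetThresholdBytes variable mentioned in the note.

```python
def choose_upload_path(num_bytes, threshold_bytes=10_000_000):
    """Illustrative decision: small payloads go inline, large ones via a
    Parquet file. threshold_bytes here is an assumed default, standing in
    for modin.config.PandasToSnowflakeParquetThresholdBytes."""
    return "parquet" if num_bytes >= threshold_bytes else "inline"

small = choose_upload_path(1024)          # small series: inline upload
large = choose_upload_path(500_000_000)   # large series: parquet upload
```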

Bug Fixes

  • Fixed a bug where the row count was not getting cached in the ordered dataframe each time count_rows() is called.


1.40.0 (2025-10-02)

Snowpark Python API Updates

New Features

  • Added a new module snowflake.snowpark.secrets that provides Python wrappers for accessing Snowflake Secrets within Python UDFs and stored procedures that execute inside Snowflake.

    • get_generic_secret_string
    • get_oauth_access_token
    • get_secret_type
    • get_username_password
    • get_cloud_provider_token
  • Added support for the following scalar functions in functions.py:

    • Conditional expression functions:

      • booland
      • boolnot
      • boolor
      • boolxor
      • boolor_agg
      • decode
      • greatest_ignore_nulls
      • least_ignore_nulls
      • nullif
      • nvl2
      • regr_valx
    • Semi-structured and structured data functions:

      • array_remove_at
      • as_boolean
      • map_delete
      • map_insert
      • map_pick
      • map_size
    • String & binary functions:

      • chr
      • hex_decode_binary
    • Numeric functions:

      • div0null
    • Differential privacy functions:

      • dp_interval_high
      • dp_interval_low
    • Context functions:

      • last_query_id
      • last_transaction
    • Geospatial functions:

      • h3_cell_to_boundary
      • h3_cell_to_children
      • h3_cell_to_children_string
      • h3_cell_to_parent
      • h3_cell_to_point
      • h3_compact_cells
      • h3_compact_cells_strings
      • h3_coverage
      • h3_coverage_strings
      • h3_get_resolution
      • h3_grid_disk
      • h3_grid_distance
      • h3_int_to_string
      • h3_polygon_to_cells
      • h3_polygon_to_cells_strings
      • h3_string_to_int
      • h3_try_grid_path
      • h3_try_polygon_to_cells
      • h3_try_polygon_to_cells_strings
      • h3_uncompact_cells
      • h3_uncompact_cells_strings
      • haversine
      • h3_grid_path
      • h3_is_pentagon
      • h3_is_valid_cell
      • h3_latlng_to_cell
      • h3_latlng_to_cell_string
      • h3_point_to_cell
      • h3_point_to_cell_string
      • h3_try_coverage
      • h3_try_coverage_strings
      • h3_try_grid_distance
      • st_area
      • st_asewkb
      • st_asewkt
      • st_asgeojson
      • st_aswkb
      • st_aswkt
      • st_azimuth
      • st_buffer
      • st_centroid
      • st_collect
      • st_contains
      • st_coveredby
      • st_covers
      • st_difference
      • st_dimension
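The snowflake.snowpark.secrets wrappers resolve only inside Snowflake UDFs and stored procedures, so a minimal sketch is shown with a hypothetical stand-in resolver: the secret name "my_api_secret", the namedtuple, and the fake resolver are assumptions, not part of the library.

```python
from collections import namedtuple

# Assumed shape of the credentials object returned by get_username_password.
UsernamePassword = namedtuple("UsernamePassword", ["username", "password"])

def build_auth_header(get_username_password, secret_name):
    # Inside Snowflake you would import get_username_password from
    # snowflake.snowpark.secrets and call it the same way.
    creds = get_username_password(secret_name)
    return f"Basic {creds.username}:{creds.password}"

def fake_get_username_password(name):
    # Hypothetical stand-in for the real wrapper, for local illustration only.
    return UsernamePassword("svc_user", "s3cret")

header = build_auth_header(fake_get_username_password, "my_api_secret")
```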

Bug Fixes

  • Fixed a bug where DataFrame.limit() failed when there was parameter binding in the executed SQL and it was used outside a stored procedure/UDxF environment.
  • Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.
  • Fixed multiple bugs in DataFrameReader.dbapi (PuPr):
    • Fixed UDTF ingestion failure with pyodbc driver caused by unprocessed row data.
    • Fixed SQL Server query input failure due to incorrect select query generation.
    • Fixed UDTF ingestion not preserving column nullability in the output schema.
    • Fixed an issue that caused the program to hang during multithreaded Parquet based ingestion when a data fetching error occurred.
    • Fixed a bug in schema parsing when custom schema strings used upper-cased data type names (NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).
  • Fixed a bug in Session.create_dataframe where schema string parsing failed when using upper-cased data type names (e.g., NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).

Improvements

  • Improved DataFrameReader.dbapi (PuPr) so that it does not retry on non-retryable errors, such as SQL syntax errors, in external data source queries.
  • Removed unnecessary warnings about local package version mismatch when using session.read.option('rowTag', <tag_name>).xml(<stage_file_path>) or xpath functions.
  • Improved DataFrameReader.dbapi (PuPr) reading performance by setting the default fetch_size parameter value to 100000.
  • Improved error message for XSD validation failure when reading XML files using session.read.option('rowValidationXSDPath', <xsd_path>).xml(<stage_file_path>).

Snowpark pandas API Updates

Dependency Updates

  • Updated the supported modin versions to >=0.36.0 and <0.38.0 (was previously >= 0.35.0 and <0.37.0).

New Features

  • Added support for DataFrame.query for dataframes with single-level indexes.
  • Added support for DataFrameGroupby.__len__ and SeriesGroupBy.__len__.

Improvements

  • Hybrid execution mode is now enabled by default. Certain operations on smaller data will now automatically execute in native pandas in-memory. Use from modin.config import AutoSwitchBackend; AutoSwitchBackend.disable() to turn this off and force all execution to occur in Snowflake.
  • Added a session parameter pandas_hybrid_execution_enabled to enable/disable hybrid execution as an alternative to using AutoSwitchBackend.
  • Removed an unnecessary SHOW OBJECTS query issued from read_snowflake under certain conditions.
  • When hybrid execution is enabled, pd.merge, pd.concat, DataFrame.merge, and DataFrame.join may now move arguments to a backend other than those of the function arguments.
  • Improved performance of DataFrame.to_snowflake and pd.to_snowflake(dataframe) for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable modin.config.PandasToSnowflakeParquetThresholdBytes.


1.39.1 (2025-09-25)

Snowpark Python API Updates

Bug Fixes

  • Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.


1.39.0 (2025-09-17)

Snowpark Python API Updates

New Features

  • Added support for unstructured data engineering in Snowpark, powered by Snowflake AISQL and Cortex functions:
    • DataFrame.ai.complete: Generate per-row LLM completions from prompts built over columns and files.
    • DataFrame.ai.filter: Keep rows where an AI classifier returns TRUE for the given predicate.
    • DataFrame.ai.agg: Reduce a text column into one result using a natural-language task description.
    • RelationalGroupedDataFrame.ai_agg: Perform the same natural-language aggregation per group.
    • DataFrame.ai.classify: Assign single or multiple labels from given categories to text or images.
    • DataFrame.ai.similarity: Compute cosine-based similarity scores between two columns via embeddings.
    • DataFrame.ai.sentiment: Extract overall and aspect-level sentiment from text into JSON.
    • DataFrame.ai.embed: Generate VECTOR embeddings for text or images using configurable models.
    • DataFrame.ai.summarize_agg: Aggregate and produce a single comprehensive summary over many rows.
    • DataFrame.ai.transcribe: Transcribe audio files to text with optional timestamps and speaker labels.
    • DataFrame.ai.parse_document: OCR/layout-parse documents or images into structured JSON.
    • DataFrame.ai.extract: Pull structured fields from text or files using a response schema.
    • DataFrame.ai.count_tokens: Estimate token usage for a given model and input text per row.
    • DataFrame.ai.split_text_markdown_header: Split Markdown into hierarchical header-aware chunks.
    • DataFrame.ai.split_text_recursive_character: Split text into size-bounded chunks using recursive separators.
    • DataFrameReader.file: Create a DataFrame containing all files from a stage as FILE data type for downstream unstructured data processing.
  • Added a new datatype YearMonthIntervalType that allows users to create intervals for datetime operations.
  • Added a new function interval_year_month_from_parts that allows users to easily create YearMonthIntervalType without using SQL.
  • Added a new datatype DayTimeIntervalType that allows users to create intervals for datetime operations.
  • Added a new function interval_day_time_from_parts that allows users to easily create DayTimeIntervalType without using SQL.
  • Added support for FileOperation.list to list files in a stage with metadata.
  • Added support for FileOperation.remove to remove files in a stage.
  • Added an option to specify copy_grants for the following DataFrame APIs:
    • create_or_replace_view
    • create_or_replace_temp_view
    • create_or_replace_dynamic_table
  • Added a new function snowflake.snowpark.functions.vectorized that allows users to mark a function as vectorized UDF.
  • Added support for parameter use_vectorized_scanner in function Session.write_pandas().
  • Added support for the following scalar functions in functions.py:
    • getdate
    • getvariable
    • invoker_role
    • invoker_share
    • is_application_role_in_session
    • is_database_role_in_session
    • is_granted_to_invoker_role
    • is_role_in_session
    • localtime
    • systimestamp

Improvements

  • Unsupported types in DataFrameReader.dbapi (PuPr) are now ingested as StringType.
  • Improved the error message to list available columns when a dataframe cannot resolve a given column name.
  • Added a new option cacheResult to DataFrameReader.xml that caches the result of the XML reader to a temporary table after xml is called. This helps improve performance when subsequent operations are performed on the same DataFrame.

Snowpark pandas API Updates

Improvements

  • Downgraded the log message saying that the Snowpark DataFrame reference of an internal DataFrameReference object has changed to level logging.DEBUG - 1.
  • Eliminated duplicate parameter-check queries for casing status when retrieving the session.
  • Retrieved dataframe row counts through object metadata to avoid a COUNT(*) query, improving performance.
  • Added support for applying Snowflake Cortex function Complete.
  • Introduced faster pandas: improved performance by deferring row position computation.
    • The following operations are currently supported and can benefit from the optimization: read_snowflake, repr, loc, reset_index, merge, and binary operations.
    • If a lazy object (e.g., DataFrame or Series) depends on a mix of supported and unsupported operations, the optimization will not be used.
  • Updated the error message for when Snowpark pandas is referenced within apply.
  • Added a session parameter dummy_row_pos_optimization_enabled to enable/disable dummy row position optimization in faster pandas.

Dependency Updates

  • Updated the supported modin versions to >=0.35.0 and <0.37.0 (was previously >= 0.34.0 and <0.36.0).

Bug Fixes

  • Fixed an issue with drop_duplicates where the same data source could be read multiple times in the same query but in a different order each time, resulting in missing rows in the final result. The fix ensures that the data source is read only once.
  • Fixed a bug with hybrid execution mode where an AssertionError was unexpectedly raised by certain indexing operations.

Snowpark Local Testing Updates

New Features

  • Added support to allow patching functions.ai_complete.


1.38.0 (2025-09-04)

Snowpark Python API Updates

New Features

  • Added support for the following AI-powered functions in functions.py:
    • ai_extract
    • ai_parse_document
    • ai_transcribe
  • Added time travel support for querying historical data:
    • Session.table() now supports time travel parameters: time_travel_mode, statement, offset, timestamp, timestamp_type, and stream.
    • DataFrameReader.table() supports the same time travel parameters as direct arguments.
    • DataFrameReader supports time travel via option chaining (e.g., session.read.option("time_travel_mode", "at").option("offset", -60).table("my_table")).
  • Added support for specifying the following parameters to DataFrameWriter.copy_into_location for validation and writing data to external locations:
    • validation_mode
    • storage_integration
    • credentials
    • encryption
  • Added support for Session.directory and Session.read.directory to retrieve the list of all files on a stage with metadata.
  • Added support for DataFrameReader.jdbc(PrPr) that allows ingesting external data source with jdbc driver.
  • Added support for FileOperation.copy_files to copy files from a source location to an output stage.
  • Added support for the following scalar functions in functions.py:
    • all_user_names
    • bitand
    • bitand_agg
    • bitor
    • bitor_agg
    • bitxor
    • bitxor_agg
    • current_account_name
    • current_client
    • current_ip_address
    • current_role_type
    • current_organization_name
    • current_organization_user
    • current_secondary_roles
    • current_transaction
    • getbit
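The time-travel options above can be chained fluently, e.g. session.read.option("time_travel_mode", "at").option("offset", -60).table("my_table"). A minimal stand-in reader sketches how such chained options accumulate; it is not the real implementation, just the builder pattern the API exposes.

```python
class StubReader:
    """Hypothetical stand-in for DataFrameReader's option chaining."""
    def __init__(self):
        self.options = {}
    def option(self, key, value):
        self.options[key] = value
        return self                       # returning self enables chaining
    def table(self, name):
        # The real reader would build a time-travel query; here we just
        # return the accumulated plan for inspection.
        return {"table": name, **self.options}

plan = (StubReader()
        .option("time_travel_mode", "at")
        .option("offset", -60)
        .table("my_table"))
```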

Bug Fixes

  • Fixed the repr of TimestampType to match the actual subtype it represents.
  • Fixed a bug in DataFrameReader.dbapi where UDTF ingestion did not work in stored procedures.
  • Fixed a bug in schema inference that caused incorrect stage prefixes to be used.

Improvements

  • Enhanced error handling in DataFrameReader.dbapi thread-based ingestion to prevent unnecessary operations, which improves resource efficiency.
  • Bumped cloudpickle dependency to also support cloudpickle==3.1.1 in addition to previous versions.
  • Improved DataFrameReader.dbapi (PuPr) ingestion performance for PostgreSQL and MySQL by using server-side cursors to fetch data.

Snowpark pandas API Updates

New Features

  • Completed support for pd.read_snowflake(), pd.to_iceberg(),
    pd.to_pandas(), pd.to_snowpark(), pd.to_snowflake(),
    DataFrame.to_iceberg(), DataFrame.to_pandas(), DataFrame.to_snowpark(),
    DataFrame.to_snowflake(), Series.to_iceberg(), Series.to_pandas(),
    Series.to_snowpark(), and Series.to_snowflake() on the "Pandas" and "Ray"
    backends. Previously, only some of these functions and methods were supported
    on the Pandas backend.
  • Added support for Index.get_level_values().

Improvements

  • Set the default transfer limit in hybrid execution for data leaving Snowflake to 100k, which can be overridden with the SnowflakePandasTransferThreshold environment variable. This configuration is appropriate for scenarios with two available engines, "Pandas" and "Snowflake" on relational workloads.
  • Improved the import error message by adding --upgrade to the suggested pip install "snowflake-snowpark-python[modin]" command.
  • Reduced telemetry messages from the modin client by pre-aggregating them into 5-second windows and keeping only a narrow band of metrics useful for tracking hybrid execution and native pandas performance.
  • Set the initial row count only when hybrid execution is enabled. This reduces the number of queries issued for many workloads.
  • Added a new test parameter for integration tests to enable hybrid execution.

Bug Fixes

  • Raised NotImplementedError instead of AttributeError on attempting to call
    Snowflake extension functions/methods to_dynamic_table(), cache_result(),
    to_view(), create_or_replace_dynamic_table(), and
    create_or_replace_view() on dataframes or series using the pandas or ray
    backends.


1.37.0 (2025-08-18)

Snowpark Python API Updates

New Features

  • Added support for the following xpath functions in functions.py:
    • xpath
    • xpath_string
    • xpath_boolean
    • xpath_int
    • xpath_float
    • xpath_double
    • xpath_long
    • xpath_short
  • Added support for parameter use_vectorized_scanner in function Session.write_arrow().
  • The dataframe profiler now records the following information about each query: describe query time, execution time, and SQL query text. To view this information, call session.dataframe_profiler.enable() and then call get_execution_profile() on a dataframe.
  • Added support for DataFrame.col_ilike.
  • Added support for non-blocking stored procedure calls that return AsyncJob objects.
    • Added block: bool = True parameter to Session.call(). When block=False, returns an AsyncJob instead of blocking until completion.
    • Added block: bool = True parameter to StoredProcedure.__call__() for async support across both named and anonymous stored procedures.
    • Added Session.call_nowait() that is equivalent to Session.call(block=False).
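A hedged sketch of the blocking vs. non-blocking split: with block=False a job-like object is returned immediately and its result is collected later. The threading stand-in below is hypothetical, imitating the documented AsyncJob surface (is_done(), result()) without a Snowflake connection.

```python
import threading

class StubAsyncJob:
    """Hypothetical stand-in mimicking AsyncJob's is_done()/result()."""
    def __init__(self, fn):
        self._result = None
        self._thread = threading.Thread(target=self._run, args=(fn,))
        self._thread.start()
    def _run(self, fn):
        self._result = fn()
    def is_done(self):
        return not self._thread.is_alive()
    def result(self):
        self._thread.join()               # block until the work finishes
        return self._result

def call(fn, block=True):
    # block=True mirrors Session.call(); block=False mirrors
    # Session.call_nowait() / Session.call(block=False).
    job = StubAsyncJob(fn)
    return job.result() if block else job

blocking_result = call(lambda: 42)        # blocks, returns the value
job = call(lambda: 42, block=False)       # returns immediately with a job
async_result = job.result()               # collect the value later
```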

Bug Fixes

  • Fixed a bug in the CTE optimization stage where a deepcopy of internal plans would cause a memory spike when a dataframe was created locally with session.create_dataframe() on large input data.
  • Fixed a bug in DataFrameReader.parquet where the ignore_case option in the infer_schema_options was not respected.
  • Fixed a bug where to_pandas() produced differently formatted column names depending on whether the query result format was set to 'JSON' or 'ARROW'.

Deprecations

  • Deprecated pkg_resources.

Dependency Updates

  • Added a dependency on protobuf<6.32.

Snowpark pandas API Updates

New Features

  • Added support for efficient transfer of data between Snowflake and Ray with the DataFrame.set_backend method. The installed version of modin must be at least 0.35.0, and ray must be installed.

Dependency Updates

  • Updated the supported modin versions to >=0.34.0 and <0.36.0 (was previously >= 0.33.0 and <0.35.0).
  • Added support for pandas 2.3 when the installed modin version is at least 0.35.0.

Bug Fixes

  • Fixed an issue in hybrid execution mode (PrPr) where pd.to_datetime and pd.to_timedelta would unexpectedly raise IndexError.
  • Fixed a bug where pd.explain_switch would raise IndexError or return None if called before any potential switch operations were performed.


1.36.0 (2025-08-05)

Snowpark Python API Updates

New Features

  • Session.create_dataframe now accepts keyword arguments that are forwarded to the internal call to Session.write_pandas or Session.write_arrow when creating a DataFrame from a pandas DataFrame or a pyarrow Table.
  • Added new APIs for AsyncJob:
    • AsyncJob.is_failed() returns a bool indicating if a job has failed. Can be used in combination with AsyncJob.is_done() to determine if a job is finished and errored.
    • AsyncJob.status() returns a string representing the current query status (e.g., "RUNNING", "SUCCESS", "FAILED_WITH_ERROR") for detailed monitoring without calling result().
  • Added a dataframe profiler. To use it, call get_execution_profile() on a dataframe. The profiler reports the queries executed to evaluate the dataframe and statistics about each query operator. This is currently an experimental feature.
  • Added support for the following functions in functions.py:
    • ai_sentiment
  • Updated the interface for experimental feature context.configure_development_features. All development features are disabled by default unless explicitly enabled by the user.

Snowpark pandas API Updates

Improvements

  • Hybrid execution row estimate improvements and a reduction of eager calls.
  • Add a new configuration variable to control transfer costs out of Snowflake when using hybrid execution.
  • Added support for creating permanent and immutable UDFs/UDTFs with DataFrame/Series/GroupBy.apply, map, and transform by passing the snowflake_udf_params keyword argument. See documentation for details.

Bug Fixes

  • Fixed an issue where Snowpark pandas plugin would unconditionally disable AutoSwitchBackend even when users had explicitly configured it via environment variables or programmatically.


1.35.0 (2025-07-24)

Snowpark Python API Updates

New Features

  • Added support for the following functions in functions.py:
    • ai_embed
    • try_parse_json

Bug Fixes

  • Fixed a bug in DataFrameReader.dbapi (PrPr) where dbapi failed in Python stored procedures with the process exiting with code 1.
  • Fixed a bug in DataFrameReader.dbapi (PrPr) where custom_schema accepted illegal schemas.
  • Fixed a bug in DataFrameReader.dbapi (PrPr) where custom_schema did not work when connecting to PostgreSQL and MySQL.
  • Fixed a bug in schema inference that would cause it to fail for external stages.

Improvements

  • Improved the query parameter in DataFrameReader.dbapi (PrPr) so that parentheses are no longer needed around the query.
  • Improved the error experience in DataFrameReader.dbapi (PrPr) when exceptions occur while inferring the schema of the target data source.

Snowpark Local Testing Updates

New Features

  • Added local testing support for reading files with SnowflakeFile using local file paths, the Snow URL semantic (snow://...), local testing framework stages, and Snowflake stages (@stage/file_path).

Snowpark pandas API Updates

New Features

  • Added support for DataFrame.boxplot.

Improvements

  • Reduced the number of UDFs/UDTFs created by repeated calls to apply or map with the same arguments on Snowpark pandas objects.

Bug Fixes

  • Added an upper bound to the row estimation when the cartesian product from an align or join results in a very large number. This mitigates a performance regression.
  • Fixed a pd.read_excel bug when reading files inside a stage's inner directory.


1.34.0 (2025-07-15)

Snowpark Python API Updates

New Features

  • Added a new option TRY_CAST to DataFrameReader. When TRY_CAST is True, columns are wrapped in a TRY_CAST statement rather than a hard cast when loading data.
  • Added a new option USE_RELAXED_TYPES to the INFER_SCHEMA_OPTIONS of DataFrameReader. When set to True, this option casts all strings to max-length strings and all numeric types to DoubleType.
  • Added debuggability improvements to eagerly validate dataframe schema metadata. Enable it using snowflake.snowpark.context.configure_development_features().
  • Added a new function snowflake.snowpark.dataframe.map_in_pandas that allows users to map a function across a dataframe. The mapping function takes an iterator of pandas dataframes as input and provides one as output.
  • Added a TTL cache to describe queries. Repeated queries within a 15-second interval will use the cached value rather than re-query Snowflake.
  • Added a parameter fetch_with_process to DataFrameReader.dbapi (PrPr) to enable multiprocessing for parallel data fetching in local ingestion. By default, local ingestion uses multithreading; multiprocessing may improve performance for CPU-bound tasks like Parquet file generation.
  • Added a new function snowflake.snowpark.functions.model that allows users to call methods of a model.
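The describe-query TTL cache above reuses results for 15 seconds before re-querying. A minimal sketch of that mechanism, as a generic time-based cache rather than Snowpark's internal implementation (the class and its injectable clock are illustrative assumptions):

```python
import time

class TTLCache:
    """Generic time-based cache illustrating the described 15s reuse window."""
    def __init__(self, ttl_seconds=15.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                # injectable for deterministic tests
        self._store = {}
    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                 # still fresh: reuse cached value
        value = compute()                 # expired or missing: recompute
        self._store[key] = (now, value)
        return value

calls = []
def describe():
    calls.append(1)                       # stands in for a describe query
    return "schema"

fake_now = [0.0]
cache = TTLCache(ttl_seconds=15.0, clock=lambda: fake_now[0])
first = cache.get_or_compute("SELECT 1", describe)    # computes
second = cache.get_or_compute("SELECT 1", describe)   # served from cache
fake_now[0] = 16.0                                    # move past the TTL
third = cache.get_or_compute("SELECT 1", describe)    # recomputes
```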

Improvements

  • Added support for row validation with an XSD schema via the rowValidationXSDPath option when reading XML files with a row tag using the rowTag option.
  • Improved SQL generation for session.table().sample() to generate a flat SQL statement.
  • Added support for complex column expression as input for functions.explode.
  • Added debuggability improvements to show which Python lines an SQL compilation error corresponds to. Enable it using snowflake.snowpark.context.configure_development_features(). This feature also depends on AST collection being enabled in the session, which can be done with session.ast_enabled = True.
  • Set enforce_ordering=True when calling to_snowpark_pandas() from a snowpark dataframe containing DML/DDL queries instead of throwing a NotImplementedError.

Bug Fixes

  • Fixed a bug caused by redundant validation when creating an iceberg table.
  • Fixed a bug in DataFrameReader.dbapi (PrPr) where closing the cursor or connection could unexpectedly raise an error and terminate the program.
  • Fixed ambiguous column errors when using table functions in DataFrame.select() that have output columns matching the input DataFrame's columns. This improvement works when dataframe columns are provided as Column objects.
  • Fixed a bug where having a NULL in a column with DecimalTypes would cast the column to FloatTypes instead and lead to precision loss.

Snowpark Local Testing Updates

Bug Fixes

  • Fixed a bug when processing windowed functions that led to incorrect indexing in results.
  • When a scalar numeric is passed to fillna, non-numeric columns are now ignored instead of producing an error.

Snowpark pandas API Updates

New Features

  • Added support for DataFrame.to_excel and Series.to_excel.
  • Added support for pd.read_feather, pd.read_orc, and pd.read_stata.
  • Added support for pd.explain_switch() to return debugging information on hybrid execution decisions.
  • Added support for pd.read_snowflake when the global modin backend is Pandas.
  • Added support for pd.to_dynamic_table, pd.to_iceberg, and pd.to_view.

Improvements

  • Added modin telemetry on API calls and hybrid engine switches.
  • Show more helpful error messages to Snowflake Notebook users when the modin or pandas version does not match our requirements.
  • Added a data type guard to the cost functions for hybrid execution mode (PrPr) which checks for data type compatibility.
  • Added automatic switching to the pandas backend in hybrid execution mode (PrPr) for many methods that are not directly implemented in Snowpark pandas.
  • Set the 'type' and other standard fields for Snowpark pandas telemetry.

Dependency Updates

  • Added tqdm and ipywidgets as dependencies so that progress bars appear when switching between modin backends.
  • Updated the supported modin versions to >=0.33.0 and <0.35.0 (was previously >= 0.32.0 and <0.34.0).

Bug Fixes

  • Fixed a bug in hybrid execution mode (PrPr) where certain Series operations would raise TypeError: numpy.ndarray object is not callable.
  • Fixed a bug in hybrid execution mode (PrPr) where calling numpy operations like np.where on modin objects with the Pandas backend would raise an AttributeError. This fix requires modin version 0.34.0 or newer.
  • Fixed an issue in df.melt where the resulting values had an additional suffix applied.