Releases: snowflakedb/snowflake-ml-python
1.5.1
Bug Fixes
- Dataset: Fix snowflake.connector.errors.DataError: Query Result did not match expected number of rows when accessing DatasetVersion properties when case insensitive SHOW VERSIONS IN DATASET check matches multiple version names.
- Dataset: Fix bug in SnowFS bulk file read when used with DuckDB.
- Registry: Fixed a bug when loading old models.
- Lineage: Fix Dataset source lineage propagation through snowpark.DataFrame transformations.
Behavior Changes
- Feature Store: Convert clear() into a private function. It now deletes feature views and entities only.
- Feature Store: Use NULL as default value for timestamp tag value.
New Features
- Feature Store: Added new snowflake.ml.feature_store.setup_feature_store() API to assist Feature Store RBAC setup.
- Feature Store: Add output_type argument to FeatureStore.generate_dataset() to allow generating data snapshots as Datasets or Tables.
- Registry: log_model, get_model, delete_model now support fully qualified names.
- Modeling: Supports anonymous stored procedures during fit calls so that modeling does not require permissions to operate on the schema. To enable it, add import snowflake.ml.modeling.parameters.enable_anonymous_sproc # noqa: F401
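To make the "fully qualified name" wording concrete, here is a minimal sketch of how a name of the form database.schema.name could be split into its parts. It is not the library's parser: `QualifiedName` and `parse_qualified_name` are hypothetical names, and the sketch assumes a name is either a bare object name or exactly three dot-separated identifiers, where a double-quoted identifier may itself contain dots.

```python
import re
from typing import List, NamedTuple, Optional


class QualifiedName(NamedTuple):
    """Hypothetical container for the parts of a model name."""
    database: Optional[str]
    schema: Optional[str]
    name: str


# One identifier: a double-quoted string (with "" as an escaped quote),
# or an unquoted run of characters containing no dot and no quote.
_PART = re.compile(r'"(?:[^"]|"")+"|[^".]+')


def parse_qualified_name(text: str) -> QualifiedName:
    """Split 'DB.SCHEMA.NAME' or 'NAME' into parts (illustrative only)."""
    parts: List[str] = []
    pos = 0
    while True:
        match = _PART.match(text, pos)
        if match is None:
            raise ValueError(f"Invalid identifier in {text!r} at position {pos}")
        parts.append(match.group(0))
        pos = match.end()
        if pos == len(text):
            break
        if text[pos] != ".":
            raise ValueError(f"Expected '.' in {text!r} at position {pos}")
        pos += 1  # skip the separator dot
    if len(parts) == 1:
        return QualifiedName(None, None, parts[0])
    if len(parts) == 3:
        return QualifiedName(parts[0], parts[1], parts[2])
    # Assumption of this sketch: partially qualified names are rejected.
    raise ValueError(f"Expected 1 or 3 parts, got {len(parts)} in {text!r}")
```

With this sketch, `parse_qualified_name("MY_DB.MY_SCHEMA.MY_MODEL")` yields the three parts separately, while a bare `"MY_MODEL"` leaves database and schema unset so the caller can fall back to session defaults.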
1.5.0
Bug Fixes
- Registry: Fix invalid parameter 'SHOW_MODEL_DETAILS_IN_SHOW_VERSIONS_IN_MODEL' error.
Behavior Changes
- Model Development: The behavior of fit_transform for all estimators is changed. First, it now covers every estimator that provides this function. Second, the output type is the union of pandas DataFrame and Snowpark DataFrame.
Model Registry (PrPr)
snowflake.ml.registry.artifact and related snowflake.ml.model_registry.ModelRegistry APIs have been removed.
- Removed snowflake.ml.registry.artifact module.
- Removed ModelRegistry.log_artifact(), ModelRegistry.list_artifacts(), ModelRegistry.get_artifact()
- Removed artifacts argument from ModelRegistry.log_model()
Dataset (PrPr)
snowflake.ml.dataset.Dataset has been redesigned to be backed by Snowflake Dataset entities.
- New Datasets can be created with Dataset.create() and existing Datasets may be loaded with Dataset.load().
- Datasets now maintain an immutable selected_version state. The Dataset.create_version() and Dataset.load_version() APIs return new Dataset objects with the requested selected_version state.
- Added dataset.create_from_dataframe() and dataset.load_dataset() convenience APIs as a shortcut to creating and loading Datasets with a pre-selected version.
- Dataset.materialized_table and Dataset.snapshot_table no longer exist with Dataset.fully_qualified_name as the closest equivalent.
- Dataset.df no longer exists. Instead, use DatasetReader.read.to_snowpark_dataframe().
- Dataset.owner has been moved to Dataset.selected_version.owner
- Dataset.desc has been moved to DatasetVersion.selected_version.comment
- Dataset.timestamp_col, Dataset.label_cols, Dataset.feature_store_metadata, and Dataset.schema_version have been removed.
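The "immutable selected_version" design described above can be illustrated with a small, library-independent sketch. `DatasetHandle` and `select_version` are hypothetical names used only for illustration; they are not part of snowflake.ml.dataset. The point is the pattern: selecting a version returns a new object and leaves the original untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DatasetHandle:
    """Hypothetical stand-in for a dataset object with a fixed version."""
    name: str
    selected_version: Optional[str] = None

    def select_version(self, version: str) -> "DatasetHandle":
        # Return a copy with the requested version instead of mutating self.
        return replace(self, selected_version=version)


base = DatasetHandle("MY_DATASET")
v1 = base.select_version("v1")
# base is unchanged; v1 is a distinct object pinned to "v1".
```

Because the dataclass is frozen, assigning to `v1.selected_version` raises an error, which is one way to guarantee that a handle passed around a pipeline always refers to the same version.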
Feature Store (PrPr)
FeatureStore.generate_dataset argument list has been changed to match the new snowflake.ml.dataset.Dataset definition
- materialized_table has been removed and replaced with name and version.
- name moved to first positional argument
- save_mode has been removed as merge behavior is no longer supported. The new behavior is always errorifexists.
New Features
- Registry: Add export method to ModelVersion instance to export model files.
- Registry: Add load method to ModelVersion instance to load the underlying object from the model.
- Registry: Add Model.rename method to Model instance to rename or move a model.
Dataset (PrPr)
- Added Snowpark DataFrame integration using Dataset.read.to_snowpark_dataframe()
- Added Pandas DataFrame integration using Dataset.read.to_pandas()
- Added PyTorch and TensorFlow integrations using Dataset.read.to_torch_datapipe() and Dataset.read.to_tf_dataset() respectively.
- Added fsspec style file integration using Dataset.read.files() and Dataset.read.filesystem()
1.4.1 (2024-04-18)
New Features
- Registry: Add support for catboost model (catboost.CatBoostClassifier, catboost.CatBoostRegressor).
- Registry: Add support for lightgbm model (lightgbm.Booster, lightgbm.LightGBMClassifier, lightgbm.LightGBMRegressor).
Bug Fixes
- Registry: Fix a bug that caused the relax_version option to not work.
1.4.0
Bug Fixes
- Registry: Fix a bug where, when multiple models are called from the same query, models other than the first one return incorrect results. This fix only works for newly logged models.
- Modeling: When registering a model, only the method(s) mentioned in save_model are added to the model signature in SnowML models.
- Modeling: Fix a bug where, when n_jobs is not 1, the model cannot execute methods such as predict, predict_log_proba, and other batch inference methods. n_jobs is now automatically set to 1 because vectorized UDFs currently don't support the joblib parallel backend.
- Modeling: Fix a bug where batch inference methods cannot infer the datatype when the first row of data contains NULL.
- Modeling: Match Distributed HPO output column names with the Snowflake identifier.
- Modeling: Relax package versions for all Distributed HPO methods if the installed version is not available in the Snowflake conda channel.
- Modeling: Add sklearn as required dependency for LightGBM package.
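The NULL-in-first-row fix above concerns type inference that looked only at the first row. The notes do not describe how the library resolves it; the following is a generic sketch of one possible approach, scanning for the first non-null value per column. `infer_column_types` is a hypothetical helper, not a snowflake-ml-python function.

```python
from typing import Any, Dict, List, Optional


def infer_column_types(rows: List[Dict[str, Any]]) -> Dict[str, Optional[type]]:
    """Infer each column's Python type from its first non-null value.

    A column whose values are all None stays mapped to None.
    """
    inferred: Dict[str, Optional[type]] = {}
    for row in rows:
        for column, value in row.items():
            if inferred.get(column) is None and value is not None:
                # First non-null value seen for this column decides the type.
                inferred[column] = type(value)
            else:
                # Register the column without overwriting a known type.
                inferred.setdefault(column, None)
    return inferred


sample = [
    {"age": None, "score": 1.5},   # first row has a NULL in "age"
    {"age": 42, "score": 2.0},
]
types = infer_column_types(sample)
```

Here `types["age"]` resolves to `int` even though the first row holds `None`, which is the situation the bug fix describes.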
Behavior Changes
- Registry: The apply method is no longer logged by default when logging an xgboost model. If it is required, specify it manually when logging the model with log_model(..., options={"target_methods": ["apply", ...]}).
New Features
- Registry: Add support for sentence-transformers model (sentence_transformers.SentenceTransformer).
- Registry: A version name is no longer required when logging a model. If not provided, a random human readable ID will be generated.
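As a rough illustration of the "random human readable ID" fallback, the sketch below generates a name from small word lists when no version name is given. The word lists, the adjective_noun_number format, and both function names are assumptions made for this example; the notes do not specify the library's actual vocabulary or format.

```python
import random
from typing import Optional

# Hypothetical vocabulary; the real generator's word lists are not documented here.
ADJECTIVES = ["brave", "calm", "eager", "gentle", "lucky"]
NOUNS = ["otter", "falcon", "maple", "river", "comet"]


def random_version_name(rng: random.Random) -> str:
    """Build an ID such as 'calm_river_17' (format is an assumption)."""
    return f"{rng.choice(ADJECTIVES)}_{rng.choice(NOUNS)}_{rng.randint(1, 99)}"


def resolve_version_name(
    version_name: Optional[str], rng: Optional[random.Random] = None
) -> str:
    """Use the caller's version name if given, otherwise generate one."""
    if version_name:
        return version_name
    return random_version_name(rng or random.Random())


generated = resolve_version_name(None, random.Random(0))
```

Passing a seeded `random.Random` makes the generated name reproducible in tests, while production code would normally rely on the default unseeded generator.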
1.3.1
New Features
- FileSet: snowflake.ml.fileset.sfcfs.SFFileSystem can now be used in UDFs and stored procedures.
1.3.0
Bug Fixes
- Registry: Fix a bug that prevented modules in code_paths from being imported correctly when calling log_model.
- Registry: Fix incorrect error message when validating input Snowpark DataFrame with array feature.
- Model Registry: Fix an issue where some files do not have proper permissions when deploying a model to SPCS.
- Model Development: Relax package versions for all inference methods if the installed version is not available in the Snowflake conda channel.
Behavior Changes
- Registry: When running a method of a model, the value-range-based input validation that prevents input from overflowing is now optional rather than enforced. This should improve performance and should not cause problems for most kinds of models. To enable this check as before, specify strict_input_validation=True when calling run.
- Registry: By default relax_version=True when logging a model instead of using the specific local dependency versions. This improves dependency versioning by using versions available in Snowflake. To switch back to the previous behavior and use specific local dependency versions, specify relax_version=False when calling log_model.
- Model Development: The behavior of fit_predict for all estimators is changed. First, it now covers every estimator that provides this function. Second, the output type is the union of pandas DataFrame and Snowpark DataFrame.
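The trade-off behind strict_input_validation can be shown with a small, self-contained sketch: a range check costs a pass over the data, so it only runs when requested. The functions `check_value_range` and `run_method` and the integer range table are hypothetical and only mirror the idea; they are not the library's implementation.

```python
from typing import Iterable, List

# Value ranges for a few signed integer widths (two's complement).
INT_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int16": (-2**15, 2**15 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def check_value_range(values: Iterable[int], dtype: str) -> None:
    """Raise ValueError if any value would overflow the declared dtype."""
    low, high = INT_RANGES[dtype]
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} does not fit in {dtype} [{low}, {high}]")


def run_method(
    values: Iterable[int], dtype: str, strict_input_validation: bool = False
) -> List[int]:
    """Hypothetical entry point: validate only when explicitly asked to."""
    data = list(values)
    if strict_input_validation:
        check_value_range(data, dtype)
    return data
```

With the default `strict_input_validation=False`, `run_method([1, 300], "int8")` passes the data through unchecked; with `strict_input_validation=True` the same call raises `ValueError` because 300 exceeds the int8 range.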
New Features
- FileSet: snowflake.ml.fileset.sfcfs.SFFileSystem can now be serialized with pickle.
1.2.3
Bug Fixes
- Registry: Providing a Decimal type column to a DOUBLE or FLOAT feature no longer errors out; the column is auto-cast with warnings.
- Registry: Improve the error message when specifying currently unsupported pip_requirements argument.
- Model Development: Fix precision_recall_fscore_support incorrect results when average="samples".
- Model Registry: Fix an issue where description, metrics, or tags are not correctly returned in a newly created Model Registry (PrPr) due to Snowflake BCR 2024_01.
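The Decimal handling above follows a common "cast and warn" pattern instead of rejecting the input. A minimal standard-library sketch of that pattern follows; `cast_to_float_feature` is a hypothetical helper and the warning text is made up for the example.

```python
import warnings
from decimal import Decimal
from typing import List, Sequence, Union

Number = Union[int, float, Decimal]


def cast_to_float_feature(values: Sequence[Number]) -> List[float]:
    """Cast values to float, warning once if any Decimal had to be converted."""
    result: List[float] = []
    saw_decimal = False
    for value in values:
        if isinstance(value, Decimal):
            saw_decimal = True
        result.append(float(value))
    if saw_decimal:
        # Warn instead of raising: Decimal -> float may lose precision.
        warnings.warn(
            "Decimal values were cast to float; precision may be lost.",
            UserWarning,
        )
    return result
```

Calling `cast_to_float_feature([Decimal("1.5"), 2])` returns a list of floats and emits a single `UserWarning`, so callers still learn about the implicit conversion.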
Behavior Changes
- Feature Store: FeatureStore.suspend_feature_view and FeatureStore.resume_feature_view no longer mutate the input feature view argument. The updated status is only reflected in the returned feature view object.
New Features
- Model Development: Support score_samples method for all the classes, including Pipeline, GridSearchCV, RandomizedSearchCV, PCA, IsolationForest, ...
- Registry: Support deleting a version of a model.
1.2.2
Bug Fixes
Behavior Changes
New Features
- Model Registry: Support providing external access integrations when deploying a model to SPCS. This helps, and will be required, to make the deployment process work, since SPCS will deny all network connections by default. The following endpoints must be allowed to make deployment work: docker.com:80, docker.com:443, anaconda.com:80, anaconda.com:443, anaconda.org:80, anaconda.org:443, pypi.org:80, pypi.org:443. If you are using a snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel object, the following endpoints must be allowed: huggingface.com:80, huggingface.com:443, huggingface.co:80, huggingface.co:443.
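The endpoint list above maps naturally onto a Snowflake egress network rule, which an external access integration can then reference. The sketch below only builds the host:port list and a CREATE NETWORK RULE statement as a string; the rule name is a placeholder, the helper functions are hypothetical, and creating the integration itself is left out.

```python
from typing import List, Sequence

# Hosts taken from the release note; each is needed on ports 80 and 443.
BASE_HOSTS = ["docker.com", "anaconda.com", "anaconda.org", "pypi.org"]
HUGGINGFACE_HOSTS = ["huggingface.com", "huggingface.co"]
PORTS = [80, 443]


def endpoints(hosts: Sequence[str]) -> List[str]:
    """Expand hosts into 'host:port' strings for every required port."""
    return [f"{host}:{port}" for host in hosts for port in PORTS]


def network_rule_sql(rule_name: str, hosts: Sequence[str]) -> str:
    """Render an egress network rule covering the given hosts."""
    value_list = ", ".join(f"'{endpoint}'" for endpoint in endpoints(hosts))
    return (
        f"CREATE OR REPLACE NETWORK RULE {rule_name}\n"
        "  MODE = EGRESS\n"
        "  TYPE = HOST_PORT\n"
        f"  VALUE_LIST = ({value_list});"
    )


# 'spcs_deploy_rule' is a placeholder name chosen for this example.
sql = network_rule_sql("spcs_deploy_rule", BASE_HOSTS + HUGGINGFACE_HOSTS)
```

Dropping `HUGGINGFACE_HOSTS` from the call gives the smaller rule that suffices when no HuggingFacePipelineModel is involved.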
1.2.1
New Features
- Model Development: Infers output column data type for transformers when possible.
- Registry: relax_version option is available in the options argument when logging the model.
1.2.0
Bug Fixes
- Model Registry: Fix "XGBoost version not compiled with GPU support" error when running CPU inference against open-source XGBoost models deployed to SPCS.
- Model Registry: Fix model deployment to SPCS on Windows machines.
Behavior Changes
New Features
- Model Development: Introduced XGBoost external memory training feature. This feature enables training XGBoost models on large datasets that don't fit into memory.
- Registry: New Registry class named snowflake.ml.registry.Registry, providing APIs similar to the old one but working with the new MODEL object in Snowflake SQL. Also, we are providing snowflake.ml.model.Model and snowflake.ml.model.ModelVersion to represent a model and a specific version of a model.
- Model Development: Add support for fit_predict method in AgglomerativeClustering, DBSCAN, and OPTICS classes.
- Model Development: Add support for fit_transform method in MDS, SpectralEmbedding and TSNE classes.
Additional Notes
- Model Registry: The snowflake.ml.registry.model_registry.ModelRegistry has been deprecated starting from version 1.2.0. It will stay in the Private Preview phase. For future implementations, kindly utilize snowflake.ml.registry.Registry, except when specifically required. The old model registry will be removed once all its primary functionalities are fully integrated into the new registry.
1.1.2
Bug Fixes
- Generic: Fix the issue that stack trace is hidden by telemetry unexpectedly.
- Model Development: Execute model signature inference without materializing full dataframe in memory.
- Model Registry: Fix occasional 'snowflake-ml-python library does not exist' error when deploying to SPCS.
Behavior Changes
- Model Registry: When calling predict with a Snowpark DataFrame, both inferred and normalized column names are accepted.
- Model Registry: When logging a Snowpark ML Modeling Model, sample input data or a manually provided signature will be ignored since they are not necessary.
New Features
- Model Development: SQL implementation of binary precision_score metric.
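For reference, binary precision is TP / (TP + FP): the share of predicted positives that are truly positive. The release note refers to a SQL implementation; the plain-Python sketch below only shows the arithmetic such a query would aggregate. `binary_precision` is a hypothetical helper, and returning 0.0 when nothing is predicted positive is an assumption of this example.

```python
from typing import Hashable, Sequence


def binary_precision(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    pos_label: Hashable = 1,
) -> float:
    """Compute TP / (TP + FP) for the given positive label."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if p == pos_label and t == pos_label
    )
    false_positives = sum(
        1 for t, p in zip(y_true, y_pred) if p == pos_label and t != pos_label
    )
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # Assumption: report 0.0 rather than raising when no positives are predicted.
        return 0.0
    return true_positives / predicted_positives


# Three positives predicted, two of them correct.
precision = binary_precision([1, 0, 1, 1], [1, 1, 0, 1])
```

In SQL terms the same quantity comes from two conditional counts over the label and prediction columns, which is why the metric can be computed without pulling the data out of the warehouse.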