This repository was archived by the owner on Apr 15, 2022. It is now read-only.
* Dbaas 3689 (#52)
* DBAAS-3689: using potential IndexToString to try to get class labels for spark pipeline
* more specific check on model
* more specific check on model
* edge case
* escaping labels
* more escaping
* wrong escaping :/
* incorrect assumption of spark model
* code cleanup and pass logic to scala (#53)
* function cleanup and pass logic to scala
* python to scala list
* Dbaas 3804 (#55)
* initial sklearn deploy code
* more config for sklearn
* support pandas df
* typing
* syntax
* returning file_ext but shouldn't
* fixing model insert for sklearn
* signature object needs parameters
* predict_args not predict_params
* missing logic for prediction table
* sql formatting
* edge cases
* elif to if
* base case
* more work around pipelines
* more validation of sklearn_args
* set comparison
* sklearn_args cleanup
* fix for pipeline model type function
* need file ext
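The sklearn_args validation mentioned in the last group of commits ("more validation of sklearn_args", "set comparison") is not shown in this hunk. Going only by the option list in the `_deploy_db` docstring of this commit, a minimal sketch of such a check might look like the following. The helper name `resolve_sklearn_args` and the three constant sets are hypothetical, and treating an unsupported `predict_args` value as "ignored" by inspecting the method signature is an assumption about how the rule could be implemented, not the repository's actual code.

```python
import inspect
from typing import Any, Dict, Optional, Tuple

# Option values taken from the _deploy_db docstring; the constant names are hypothetical.
ALLOWED_KEYS = {"predict_call", "predict_args"}
ALLOWED_CALLS = {"predict", "predict_proba", "transform"}
ALLOWED_PREDICT_ARGS = {"return_std", "return_cov"}


def resolve_sklearn_args(model: Any,
                         sklearn_args: Optional[Dict[str, str]] = None
                         ) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (predict_call, predict_args) for a model."""
    args = dict(sklearn_args or {})  # copy so the caller's dict is never mutated

    # Set comparison: reject any key outside the documented options.
    unknown = set(args) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Unknown sklearn_args keys: {sorted(unknown)}")

    call = args.get("predict_call")
    if not call:
        # Blank: use predict, or transform if the model has no predict.
        call = "predict" if hasattr(model, "predict") else "transform"
    if call not in ALLOWED_CALLS:
        raise ValueError(f"predict_call must be one of {sorted(ALLOWED_CALLS)}")
    if not hasattr(model, call):
        raise ValueError(f"Model has no method named {call!r}")

    predict_arg = args.get("predict_args")
    if predict_arg is not None and predict_arg not in ALLOWED_PREDICT_ARGS:
        raise ValueError(f"predict_args must be one of {sorted(ALLOWED_PREDICT_ARGS)}")
    if predict_arg is not None:
        # Assumption: "ignored" means the chosen method does not accept the keyword.
        accepted = inspect.signature(getattr(model, call)).parameters
        if predict_arg not in accepted:
            predict_arg = None

    return call, predict_arg
```

The sketch takes `None` as the default instead of `{}` so that no dictionary is shared between calls; because `predict_args` holds a single string, the "only one can be specified" rule is satisfied by construction.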
-    Function to deploy a trained (Spark for now) model to the Database. This creates 2 tables: One with the features of the model, and one with the prediction and metadata.
+def _deploy_db(fittedModel,
+               df,
+               db_schema_name,
+               db_table_name,
+               primary_key,
+               run_id: str=None,
+               classes=None,
+               sklearn_args={},
+               verbose=False,
+               replace=False) -> None:
+    """
+    Function to deploy a trained (currently Spark, Sklearn or H2O) model to the Database.
+    This creates 2 tables: One with the features of the model, and one with the prediction and metadata.
     They are linked with a column called MOMENT_ID

-    :param fittedPipe: (spark pipeline or model) The fitted pipeline to deploy
+    :param fittedModel: (ML pipeline or model) The fitted pipeline to deploy
     :param df: (Spark DF) The dataframe used to train the model
         NOTE: this dataframe should NOT be transformed by the model. The columns in this df are the ones
         that will be used to create the table.
     :param db_schema_name: (str) the schema name to deploy to. If None, the currently set schema will be used.
     :param db_table_name: (str) the table name to deploy to. If none, the run_id will be used for the table name(s)
     :param primary_key: (List[Tuple[str, str]]) List of column + SQL datatype to use for the primary/composite key
     :param run_id: (str) The active run_id
-    :param classes: List[str] The classes (prediction values) for the model being deployed.
-        NOTE: If not supplied, the table will have column named c0,c1,c2 etc for each class
-    :param verbose: bool Whether or not to print out the queries being created. Helpful for debugging
+    :param classes: (List[str]) The classes (prediction labels) for the model being deployed.
+        NOTE: If not supplied, the table will have default column names for each class
+    :param sklearn_args: (dict{str: str}) Prediction options for sklearn models
+        Available key value options:
+        'predict_call': 'predict', 'predict_proba', or 'transform'
+            - Determines the function call for the model
+            If blank, predict will be used
+            (or transform if model doesn't have predict)
+        'predict_args': 'return_std' or 'return_cov' - For Bayesian and Gaussian models
+            Only one can be specified
+        If the model does not have the option specified, it will be ignored.
+    :param verbose: (bool) Whether or not to print out the queries being created. Helpful for debugging
+    :param replace: (bool) whether or not to replace a currently existing model. This param does not yet work

     This function creates the following:
     * Table (default called DATA_{run_id}) where run_id is the run_id of the mlflow run associated to that model. This will have a column for each feature in the feature vector as well as a MOMENT_ID as primary key
     * Table (default called DATA_{run_id}_PREDS) That will have the columns:
         USER which is the current user who made the request
         EVAL_TIME which is the CURRENT_TIMESTAMP
         MOMENT_ID same as the DATA table to link predictions to rows in the table
-        PREDICTION. The prediction of the model. If the :classes: param is not filled in, this will be c0,c1,c2 etc for classification models
+        PREDICTION. The prediction of the model. If the :classes: param is not filled in, this will be default values for classification models
         A column for each class of the predictor with the value being the probability/confidence of the model if applicable
     * A trigger that runs on (after) insertion to the data table that runs an INSERT into the prediction table,
         calling the PREDICT function, passing in the row of data as well as the schema of the dataset, and the run_id of the model to run
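
The table layout described at the end of the docstring can be summarized in a few lines of Python. This is only an illustrative sketch: `prediction_table_layout` is a hypothetical helper, not part of the repository. The docstring gives only the default table names, so appending `_PREDS` to a custom `db_table_name` is an assumption. When `classes` is omitted the sketch lists no per-class columns, because the docstring says only that default names are used and no longer states what they are.

```python
from typing import Dict, List, Optional, Union


def prediction_table_layout(run_id: str,
                            db_table_name: Optional[str] = None,
                            classes: Optional[List[str]] = None
                            ) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper: names of the two tables and the prediction-table columns."""
    # Default data table name from the docstring: DATA_{run_id}.
    data_table = db_table_name or f"DATA_{run_id}"
    # Assumption: the prediction table always appends _PREDS to the data table name.
    preds_table = f"{data_table}_PREDS"

    # Fixed columns listed in the docstring, in the order they are described.
    columns = ["USER", "EVAL_TIME", "MOMENT_ID", "PREDICTION"]
    # One probability/confidence column per class label, when labels are supplied.
    columns += list(classes or [])

    return {
        "data_table": data_table,
        "preds_table": preds_table,
        "preds_columns": columns,
    }
```

For example, `prediction_table_layout("abc123", classes=["cat", "dog"])` names the tables `DATA_abc123` and `DATA_abc123_PREDS`, with `MOMENT_ID` as the column that links a prediction row back to its feature row.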