- Downloading fraud model training datasets from Google Drive
- Original datasets are from https://www.kaggle.com/competitions/ieee-fraud-detection/data
- Delta Live Tables (DLT) with Auto Loader
- Show high-level solution architecture
- Feature engineering
- Registering features into multiple feature tables
- Model training and experiment auto-tracking with MLflow
- Logging the model together with its feature lookup logic
- Adding new features to an existing feature table
- Merge/upsert feature tables
- Publishing features to an online store (DynamoDB)
- Illustrate how CI/CD tooling (e.g. Jenkins) is integrated with the Model Registry
- Set up webhooks to trigger CI builds, e.g. model signature and bias testing
- Illustrate how a serverless model endpoint is integrated with the offline/online feature store for automatic feature lookup
- Deploy a model serving endpoint
- Score the deployed model via REST API request
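
As a rough illustration of the last step above, the sketch below builds (but does not send) a scoring request for a deployed endpoint. It assumes the endpoint accepts MLflow's `dataframe_split` JSON input and is reachable at `<workspace>/serving-endpoints/<name>/invocations`. The workspace URL, endpoint name, token, and the function `build_scoring_request` are all hypothetical placeholders, and the exact URL path may differ depending on the serving product version. Because features are looked up automatically, the request would normally carry only the lookup key plus any values not held in the feature store. The columns `TransactionID` and `TransactionAmt` are used here only as examples.

```python
import json
import urllib.request


def build_scoring_request(workspace_url, endpoint_name, token, records):
    """Build (without sending) a POST request for a model serving endpoint.

    `records` is a list of dicts that all share the same keys. The payload
    uses the `dataframe_split` layout: one list of column names and one
    list of rows.
    """
    if not records:
        raise ValueError("records must contain at least one row")
    columns = list(records[0].keys())
    payload = {
        "dataframe_split": {
            "columns": columns,
            "data": [[row.get(col) for col in columns] for row in records],
        }
    }
    # Assumed URL layout; adjust to match the endpoint shown in your workspace.
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_scoring_request(
    workspace_url="https://example-workspace.cloud.databricks.com/",
    endpoint_name="fraud-model",
    token="REPLACE_WITH_TOKEN",
    records=[{"TransactionID": 2987000, "TransactionAmt": 68.5}],
)

# To actually score, send the request, e.g.:
#   with urllib.request.urlopen(request) as response:
#       predictions = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any call reaches the endpoint.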
Feature store docs
Online feature stores
CI Build / model testing and evaluation
- https://docs.databricks.com/mlflow/model-registry-webhooks.html
- http://knowledge-repo-1712701941.us-east-2.elb.amazonaws.com/post/ML/05_ops_validation.kp
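
A CI job triggered by a registry webhook usually wants to confirm that the request really came from the registry before running tests. The sketch below shows one way to do that, assuming the webhook was created with a shared secret and that the sender attaches an HMAC-SHA256 hex digest of the raw request body in a signature header (check the webhooks documentation linked above for the exact header name). The function `is_trusted_webhook`, the secret, and the sample payload are hypothetical.

```python
import hashlib
import hmac
import json


def is_trusted_webhook(body: bytes, signature: str, shared_secret: str) -> bool:
    """Return True if `signature` matches the HMAC-SHA256 of `body`."""
    expected = hmac.new(
        shared_secret.encode("utf-8"), body, hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking timing information about the match.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Hypothetical event payload and secret, for illustration only.
secret = "example-shared-secret"
raw_body = json.dumps(
    {"event": "MODEL_VERSION_TRANSITIONED_STAGE", "model_name": "fraud-model", "version": "3"}
).encode("utf-8")

# Simulate the signature the sender would attach to the request.
sent_signature = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()

accepted = is_trusted_webhook(raw_body, sent_signature, secret)
rejected = is_trusted_webhook(raw_body + b"tampered", sent_signature, secret)
```

Only after this check passes would the CI job go on to load the model version named in the payload and run its signature and bias tests.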
Data and model monitoring
- https://www.databricks.com/p/webinar/2021-10-20-hands-on-workshop-unified-ml-monitoring-on-databricks
- https://www.databricks.com/session_na21/drifting_away-testing-ml-models-in-production
- https://www.databricks.com/blog/2019/09/18/productionizing-machine-learning-from-deployment-to-drift-detection.html