README.md
# Databricks MLOps Stacks

> **_NOTE:_** This feature is in [public preview](https://docs.databricks.com/release-notes/release-types.html).

This repo provides a customizable stack for starting new ML projects
on Databricks that follow production best practices out of the box.
Your organization can use the default stack as is or customize it as needed, e.g.
adapt individual components to fit your organization's best practices. See the
[stack customization guide](stack-customization.md) for more details.

Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML service state
management, with an easy transition to production. You can also use MLOps Stacks as a building block
in automation for creating new data science projects with production-grade CI/CD pre-configured.

[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) v0.208.1 contains [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for the purpose of project creation.

Please follow [the instructions](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#install-the-cli) to install and set up the Databricks CLI. Releases of the Databricks CLI can be found in the [releases section](https://github.com/databricks/cli/releases) of the databricks/cli repository.
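Once the CLI is installed, project creation is kicked off with the CLI's bundle `init` command. The sketch below is an assumption, not taken from this document: the `mlops-stacks` template name may differ for your CLI version (check `databricks bundle init --help`), and the guard simply skips the call when the CLI is absent:

```shell
# Hypothetical session: initialize a new project from the MLOps Stacks template.
# The template name "mlops-stacks" is an assumption; verify against your CLI version.
if command -v databricks >/dev/null 2>&1; then
  databricks bundle init mlops-stacks
else
  echo "Databricks CLI not found; install it first (see the instructions above)"
fi
```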
This will prompt for parameters for project initialization. Some of these parameters are required to get started:

* ``input_project_name``: name of the current project

See the generated ``README.md`` for next steps!

## FAQ

### Do I need separate dev/staging/prod workspaces to use MLOps Stacks?

We recommend using separate dev/staging/prod Databricks workspaces for stronger
isolation between environments. For example, Databricks REST API rate limits
are applied per-workspace, so if using [Databricks Model Serving](https://docs.databricks.com/applications/mlflow/model-serving.html),
using separate workspaces can help prevent high load in staging from DOSing your
production model serving endpoints.

However, you can create a single-workspace stack by supplying the same workspace URL for
`input_databricks_staging_workspace_host` and `input_databricks_prod_workspace_host`. If you go this route, we
recommend using different service principals to manage staging vs prod resources,
to ensure that CI workloads run in staging cannot interfere with production resources.
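To illustrate the single-workspace route, a file of template parameter values might simply repeat one host for both parameters. This is a hypothetical sketch, not content from this document: the hostname is a placeholder, and only the two host parameters named above are shown.

```json
{
  "input_project_name": "my-mlops-project",
  "input_databricks_staging_workspace_host": "https://adb-1111111111111111.11.azuredatabricks.net",
  "input_databricks_prod_workspace_host": "https://adb-1111111111111111.11.azuredatabricks.net"
}
```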

### I have an existing ML project. Can I productionize it using MLOps Stacks?

Yes. Currently, you can instantiate a new project and copy relevant components
into your existing project to productionize it. MLOps Stacks is modularized, so
you can e.g. copy just the GitHub Actions workflows under `.github` or ML resource configs
under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml`` into your existing project.

### Can I adopt individual components of MLOps Stacks?

For this use case, we recommend instantiating via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
and copying the relevant subdirectories. For example, all ML resource configs
are defined under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``, while CI/CD is defined e.g. under `.github`
if using GitHub Actions, or under `.azure` if using Azure DevOps.
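Copying those pieces into an existing repo is just a few file copies. The sketch below uses placeholder directory names (`my-mlops-project` standing in for a generated project, `existing-repo` for the project being productionized) and creates a demo layout first so the copies are self-contained; in practice these directories already exist.

```shell
# Demo layout with placeholder names; in a real setup these already exist.
mkdir -p my-mlops-project/.github my-mlops-project/my_mlops_project/resources existing-repo
touch my-mlops-project/my_mlops_project/databricks.yml

# Copy CI/CD workflows and ML resource configs into the existing project.
cp -r my-mlops-project/.github existing-repo/
cp -r my-mlops-project/my_mlops_project/resources existing-repo/
cp my-mlops-project/my_mlops_project/databricks.yml existing-repo/
```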

### Can I customize my MLOps Stack?

Yes. We provide the default stack in this repo as a production-friendly starting point for MLOps.
However, in many cases you may need to customize the stack to match your organization's
best practices. See [the stack customization guide](stack-customization.md)
for details on how to do this.

### Does MLOps Stacks cover data (ETL) pipelines?

Since MLOps Stacks is based on [Databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
it's not limited to ML workflows and assets - it works for assets across the Databricks Lakehouse. For instance, while the existing ML
code samples contain feature engineering, training, model validation, deployment and batch inference workflows,
you can use it for Delta Live Tables pipelines as well.
Please provide feedback (bug reports, feature requests, etc.) via GitHub issues.

We welcome community contributions. For substantial changes, we ask that you first file a GitHub issue to facilitate
discussion, before opening a pull request.

MLOps Stacks is implemented as a [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
that generates new projects given user-supplied parameters. Parametrized project code can be found under
the `{{.input_root_dir}}` directory.
Run integration tests only:
```
pytest tests --large-only
```

### Previewing changes

When making changes to MLOps Stacks, it can be convenient to see how those changes affect
a generated new ML project. To do this, you can create an example
project from your local checkout of the repo, and inspect its contents/run tests within
the project.

We provide example project configs for Azure (using both GitHub and Azure DevOps) and AWS (using GitHub) under `tests/example-project-configs`.
To create an example Azure project, using Azure DevOps as the CI/CD platform, run the following from the desired parent directory
of the example project:

```
# Note: update MLOPS_STACK_PATH to the path to your local checkout of the stack
```
databricks_template_schema.json
    "order": 1,
    "type": "string",
    "default": "my-mlops-project",
    "description": "Welcome to MLOps Stacks. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stacks/blob/main/README.md. \n\nProject Name"
  },
  "input_root_dir": {
    "order": 2,
  "input_schema_name": {
    "order": 11,
    "type": "string",
    "description": "\nName of schema to use when registering a model in Unity Catalog. \nNote that this schema must already exist, and we recommend keeping the name the same as the project name. Default",
    "default": "my-mlops-project"
  },
  "input_unity_catalog_read_user_group": {
    "order": 12,
    "description": "\nWhether to include MLflow Recipes. \nChoose from no, yes",
    "default": "no"
  },
  "success_message" : "\n✨ Your MLOps Stack has been created in the '{{.input_project_name}}' directory!\n\nPlease refer to the README.md of your project for further instructions on getting started."
stack-customization.md

# MLOps Stacks Customization Guide

We provide the default MLOps Stack in this repo as a production-friendly starting point for MLOps.

For generic enhancements not specific to your organization
(e.g. add support for a new CI/CD provider), we encourage you to consider contributing the
change back to the MLOps Stacks repo, so that the community can help maintain and enhance it.

However, in many cases you may need to customize the stack, for example if:
* You have different Databricks workspace environments (e.g. a "test" workspace for CI, in addition to dev/staging/prod)
default stack. Before getting started, we encourage you to read
the [contributor guide](README.md#contributing) to learn how to
make, preview, and test changes to your custom stack.

### Fork the MLOps Stacks repo

Fork the MLOps Stacks repo. You may want to create a private fork if you're tailoring
the stack to the specific needs of your organization, or a public fork if you're creating
a generic new stack.

### (optional) Set up CI

Tests for MLOps Stacks are defined under the `tests/` directory and are
executed in CI by GitHub Actions workflows defined under `.github/`. We encourage you to configure
CI in your own MLOps Stacks repo to ensure it continues to work as you make changes.
If you use GitHub Actions for CI, the provided workflows should work out of the box.
Otherwise, you'll need to translate the workflows under `.github/` to the CI provider of your
choice.

### Update MLOps Stacks parameters

Update parameters in your fork as needed in `databricks_template_schema.json` and update the corresponding template variables in `library/template_variables.tmpl`. Pruning the set of
parameters makes it easier for data scientists to start new projects, at the cost of reduced flexibility.
For example, you may have a fixed set of staging & prod Databricks workspaces (o
also run all of your ML pipelines on a single cloud, in which case the `input_cloud` parameter is unnecessary.

The easiest way to prune parameters and replace them with hardcoded values is to follow
the [contributor guide](README.md#previewing-changes) to generate an example project with
parameters substituted-in, and then copy the generated project contents back into your MLOps Stacks repo.
## Customize individual components

### Example ML code

MLOps Stacks provides example ML code.
You may want to customize the example code, e.g. further prune it down into a skeleton for data scientists
to fill out.

If you customize this component, you can still use the CI/CD and ML resource components to build production ML pipelines, as long as you provide ML
notebooks with the expected interface. For example, model training under ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/training/notebooks/`` and inference under
You may also want to update developer-facing docs under `template/{{.input_root_
or `template/{{.input_root_dir}}/docs/ml-developer-guide-fs.md`, which will be read by users of your stack.

### CI/CD workflows

MLOps Stacks currently has the following sub-components for CI/CD:
* CI/CD workflow logic defined under `template/{{.input_root_dir}}/.github/` for testing and deploying ML code and models
* Logic to trigger model deployment through REST API calls to your CD system, when model training completes.
  This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/ModelDeployment.py``

### ML resource configs

The root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``.
It defines the ML config resources to be included and workspace host for each deployment target.
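As an illustrative sketch of that file's shape (the bundle name, resource glob, and workspace hosts below are placeholders, not taken from this document), a bundle root config pulls in resource definitions and maps each deployment target to a workspace:

```yaml
# Placeholder sketch of a bundle root config; names and hosts are illustrative.
bundle:
  name: my-mlops-project

include:
  - resources/*.yml

targets:
  staging:
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net
  prod:
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net
```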

ML resource configs (Databricks CLI bundle code definitions of ML jobs, experiments, models, etc.) can be found under
When updating this component, you may want to update developer-facing docs in