Commit 5d940d5

Rename Stack to Stacks and Fix Bugs (#107)
* Rename Stack to Stacks and Fix Bugs
* Update version to 0.208.1
* Changed private preview notice to public preview
* Add success_message
* Fix bug
* Fix doc
* Fix deployment bug
* Fix doc bug
1 parent ea885c8 commit 5d940d5

30 files changed, +134 -147 lines changed

Diff for: .gitignore

+1-1
Original file line number | Diff line number | Diff line change
@@ -11,4 +11,4 @@
1111
__pycache__/
1212
.cache
1313
*.pyc
14-
mlops-stack.iml
14+
mlops-stacks.iml

Diff for: Pipeline.md

+1-1
@@ -1,5 +1,5 @@
11
# ML Pipeline Structure and Devloop
2-
The default stack contains an ML pipeline with CI/CD workflows to test and deploy
2+
MLOps Stacks contains an ML pipeline with CI/CD workflows to test and deploy
33
automated model training and batch inference jobs across your dev, staging, and prod Databricks
44
workspaces.
55

Diff for: README.md

+33-35
@@ -1,7 +1,6 @@
1-
# Databricks MLOps Stack
1+
# Databricks MLOps Stacks
22

3-
> **_NOTE:_** This feature is in [private preview](https://docs.databricks.com/release-notes/release-types.html). The interface/APIs may change and no formal support is available during the preview. However, you can still create new production-grade ML projects using the stack.
4-
If interested in trying it out, please fill out this [form](https://docs.google.com/forms/d/e/1FAIpQLSfHXCmkbsEURjQQvtUGObgh2D5q1eD4YRHnUxZ0M4Hu0W63WA/viewform), and you’ll be contacted by a Databricks representative.
3+
> **_NOTE:_** This feature is in [public preview](https://docs.databricks.com/release-notes/release-types.html).
54
65
This repo provides a customizable stack for starting new ML projects
76
on Databricks that follow production best-practices out of the box.
@@ -19,25 +18,25 @@ Your organization can use the default stack as is or customize it as needed, e.g
1918
adapt individual components to fit your organization's best practices. See the
2019
[stack customization guide](stack-customization.md) for more details.
2120

22-
Using Databricks MLOps stack, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML service state
23-
management, with an easy transition to production. You can also use MLOps stack as a building block
21+
Using Databricks MLOps Stacks, data scientists can quickly get started iterating on ML code for new projects while ops engineers set up CI/CD and ML service state
22+
management, with an easy transition to production. You can also use MLOps Stacks as a building block
2423
in automation for creating new data science projects with production-grade CI/CD pre-configured.
2524

26-
![MLOps Stack diagram](doc-images/mlops-stack.png)
25+
![MLOps Stacks diagram](doc-images/mlops-stacks.png)
2726

2827
See the [FAQ](#FAQ) for questions on common use cases.
2928

3029
## ML pipeline structure and devloop
3130
[See this page](Pipeline.md) for detailed description and diagrams of the ML pipeline
3231
structure defined in the default stack.
3332

34-
## Using this stack
33+
## Using MLOps Stacks
3534

3635
### Prerequisites
3736
- Python 3.8+
38-
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) >= v0.204.0
37+
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) >= v0.208.1
3938

40-
[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) v0.204.0 contains [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for the purpose of project creation.
39+
[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/databricks-cli.html) v0.208.1 contains [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html) for the purpose of project creation.
4140

4241
Please follow [the instruction](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#install-the-cli) to install and set up databricks CLI. Releases of databricks CLI can be found in the [releases section](https://github.com/databricks/cli/releases) of databricks/cli repository.
4342

@@ -47,7 +46,7 @@ Please follow [the instruction](https://docs.databricks.com/en/dev-tools/cli/dat
4746

4847
To create a new project, run:
4948

50-
databricks bundle init https://github.com/databricks/mlops-stack
49+
databricks bundle init mlops-stacks
5150

5251
This will prompt for parameters for project initialization. Some of these parameters are required to get started:
5352
* ``input_project_name``: name of the current project
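The two invocation forms that appear in this README diff (`databricks bundle init mlops-stacks`, and `databricks bundle init <path> --config-file <file>` from the contributor section) can be assembled programmatically. The following is a minimal sketch, not part of the commit; the helper name `build_init_command` is hypothetical, and only the subcommand and the `--config-file` flag are taken from the README text.

```python
import shlex


def build_init_command(template, config_file=None):
    """Return the argv list for `databricks bundle init`.

    `template` is either the template name used in the README
    ("mlops-stacks") or a path to a local checkout of the repo.
    `config_file` is an optional path passed via --config-file.
    This helper is hypothetical; it only assembles the command.
    """
    argv = ["databricks", "bundle", "init", template]
    if config_file is not None:
        argv += ["--config-file", str(config_file)]
    return argv


if __name__ == "__main__":
    # Interactive form: the CLI prompts for the project parameters.
    print(shlex.join(build_init_command("mlops-stacks")))
```

The returned list could be passed to `subprocess.run` on a machine where the Databricks CLI is installed; the sketch itself does not execute the CLI.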
@@ -78,42 +77,41 @@ See the generated ``README.md`` for next steps!
7877

7978
## FAQ
8079

81-
### Do I need separate dev/staging/prod workspaces to use this stack?
80+
### Do I need separate dev/staging/prod workspaces to use MLOps Stacks?
8281
We recommend using separate dev/staging/prod Databricks workspaces for stronger
8382
isolation between environments. For example, Databricks REST API rate limits
8483
are applied per-workspace, so if using [Databricks Model Serving](https://docs.databricks.com/applications/mlflow/model-serving.html),
8584
using separate workspaces can help prevent high load in staging from DOSing your
8685
production model serving endpoints.
8786

88-
However, you can run the stack against just a single workspace, against a dev and
89-
staging/prod workspace, etc. Just supply the same workspace URL for
87+
However, you can create a single workspace stack, by supplying the same workspace URL for
9088
`input_databricks_staging_workspace_host` and `input_databricks_prod_workspace_host`. If you go this route, we
9189
recommend using different service principals to manage staging vs prod resources,
9290
to ensure that CI workloads run in staging cannot interfere with production resources.
9391

94-
### I have an existing ML project. Can I productionize it using this stack?
95-
Yes. Currently, you can instantiate a new project from the stack and copy relevant components
96-
into your existing project to productionize it. The stack is modularized, so
92+
### I have an existing ML project. Can I productionize it using MLOps Stacks?
93+
Yes. Currently, you can instantiate a new project and copy relevant components
94+
into your existing project to productionize it. MLOps Stacks is modularized, so
9795
you can e.g. copy just the GitHub Actions workflows under `.github` or ML resource configs
9896
under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
99-
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml`` into your existing project.
97+
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml`` into your existing project.
10098

101-
### Can I adopt individual components of the stack?
102-
For this use case, we recommend instantiating the full stack via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
103-
and copying the relevant stack subdirectories. For example, all ML resource configs
99+
### Can I adopt individual components of MLOps Stacks?
100+
For this use case, we recommend instantiating via [Databricks asset bundle templates](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
101+
and copying the relevant subdirectories. For example, all ML resource configs
104102
are defined under ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources``
105-
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml``, while CI/CD is defined e.g. under `.github`
103+
and ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``, while CI/CD is defined e.g. under `.github`
106104
if using GitHub Actions, or under `.azure` if using Azure DevOps.
107105

108-
### Can I customize this stack?
106+
### Can I customize my MLOps Stack?
109107
Yes. We provide the default stack in this repo as a production-friendly starting point for MLOps.
110108
However, in many cases you may need to customize the stack to match your organization's
111109
best practices. See [the stack customization guide](stack-customization.md)
112110
for details on how to do this.
113111

114-
### Does the MLOps stack cover data (ETL) pipelines?
112+
### Does the MLOps Stacks cover data (ETL) pipelines?
115113

116-
Since MLOps Stack is based on [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
114+
Since MLOps Stacks is based on [databricks CLI bundles](https://docs.databricks.com/dev-tools/cli/bundle-commands.html),
117115
it's not limited only to ML workflows and assets - it works for assets across the Databricks Lakehouse. For instance, while the existing ML
118116
code samples contain feature engineering, training, model validation, deployment and batch inference workflows,
119117
you can use it for Delta Live Tables pipelines as well.
@@ -127,7 +125,7 @@ Please provide feedback (bug reports, feature requests, etc) via GitHub issues.
127125
We welcome community contributions. For substantial changes, we ask that you first file a GitHub issue to facilitate
128126
discussion, before opening a pull request.
129127

130-
This stack is implemented as a [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
128+
MLOps Stacks is implemented as a [Databricks asset bundle template](https://docs.databricks.com/en/dev-tools/bundles/templates.html)
131129
that generates new projects given user-supplied parameters. Parametrized project code can be found under
132130
the `{{.input_root_dir}}` directory.
133131

@@ -164,25 +162,25 @@ Run integration tests only:
164162
pytest tests --large-only
165163
```
166164

167-
### Previewing stack changes
168-
When making changes to the stack, it can be convenient to see how those changes affect
169-
an actual new ML project created from the stack. To do this, you can create an example
170-
project from your local checkout of the stack, and inspect its contents/run tests within
165+
### Previewing changes
166+
When making changes to MLOps Stacks, it can be convenient to see how those changes affect
167+
a generated new ML project. To do this, you can create an example
168+
project from your local checkout of the repo, and inspect its contents/run tests within
171169
the project.
172170

173171
We provide example project configs for Azure (using both GitHub and Azure DevOps) and AWS (using GitHub) under `tests/example-project-configs`.
174172
To create an example Azure project, using Azure DevOps as the CI/CD platform, run the following from the desired parent directory
175173
of the example project:
176174

177175
```
178-
# Note: update MLOPS_STACK_PATH to the path to your local checkout of the stack
179-
MLOPS_STACK_PATH=~/mlops-stack
180-
databricks bundle init "$MLOPS_STACK_PATH" --config-file "$MLOPS_STACK_PATH/tests/example-project-configs/azure/azure-devops.json"
176+
# Note: update MLOPS_STACKS_PATH to the path to your local checkout of the MLOps Stacks repo
177+
MLOPS_STACKS_PATH=~/mlops-stacks
178+
databricks bundle init "$MLOPS_STACKS_PATH" --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/azure/azure-devops.json"
181179
```
182180

183181
To create an example AWS project, using GitHub Actions for CI/CD, run:
184182
```
185-
# Note: update MLOPS_STACK_PATH to the path to your local checkout of the stack
186-
MLOPS_STACK_PATH=~/mlops-stack
187-
databricks bundle init "$MLOPS_STACK_PATH" --config-file "$MLOPS_STACK_PATH/tests/example-project-configs/aws/aws-github.json"
183+
# Note: update MLOPS_STACKS_PATH to the path to your local checkout of the MLOps Stacks repo
184+
MLOPS_STACKS_PATH=~/mlops-stacks
185+
databricks bundle init "$MLOPS_STACKS_PATH" --config-file "$MLOPS_STACKS_PATH/tests/example-project-configs/aws/aws-github.json"
188186
```

Diff for: databricks_template_schema.json

+5-4
@@ -4,7 +4,7 @@
44
"order": 1,
55
"type": "string",
66
"default": "my-mlops-project",
7-
"description": "Welcome to MLOps Stack. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stack/blob/main/README.md. \n\nProject Name"
7+
"description": "Welcome to MLOps Stacks. For detailed information on project generation, see the README at https://github.com/databricks/mlops-stacks/blob/main/README.md. \n\nProject Name"
88
},
99
"input_root_dir": {
1010
"order": 2,
@@ -63,8 +63,8 @@
6363
"input_schema_name": {
6464
"order": 11,
6565
"type": "string",
66-
"description": "\nName of schema to use when registering a model in Unity Catalog. \nNote that this schema must already exist. Default",
67-
"default": "schema_name"
66+
"description": "\nName of schema to use when registering a model in Unity Catalog. \nNote that this schema must already exist, and we recommend keeping the name the same as the project name. Default",
67+
"default": "my-mlops-project"
6868
},
6969
"input_unity_catalog_read_user_group": {
7070
"order": 12,
@@ -84,5 +84,6 @@
8484
"description": "\nWhether to include MLflow Recipes. \nChoose from no, yes",
8585
"default": "no"
8686
}
87-
}
87+
},
88+
"success_message" : "\n✨ Your MLOps Stack has been created in the '{{.input_project_name}}' directory!\n\nPlease refer to the README.md of your project for further instructions on getting started."
8889
}
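Since this hunk changes a default and adds a top-level `success_message`, a small consistency check over the schema file may be useful when editing it. The sketch below is an assumption-laden illustration rather than anything in the commit: it assumes the parameters live under a top-level `"properties"` key (not visible in the hunk), the function name `check_schema` is hypothetical, and the inline JSON is a shortened stand-in for the real file.

```python
import json

# Shortened stand-in for databricks_template_schema.json.
# The "properties" wrapper is an assumption; it is not shown in the diff.
SCHEMA_TEXT = """
{
  "properties": {
    "input_project_name": {
      "order": 1,
      "type": "string",
      "default": "my-mlops-project",
      "description": "Project Name"
    },
    "input_schema_name": {
      "order": 11,
      "type": "string",
      "description": "Name of schema to use when registering a model in Unity Catalog.",
      "default": "my-mlops-project"
    }
  },
  "success_message": "Your MLOps Stack has been created in the '{{.input_project_name}}' directory!"
}
"""


def check_schema(schema):
    """Return a list of human-readable problems found in the schema dict."""
    problems = []
    seen_orders = {}
    for name, spec in schema.get("properties", {}).items():
        # Every parameter in the hunk carries these three keys.
        for key in ("order", "type", "description"):
            if key not in spec:
                problems.append(f"{name}: missing {key!r}")
        order = spec.get("order")
        if order in seen_orders:
            problems.append(f"{name}: order {order} already used by {seen_orders[order]}")
        elif order is not None:
            seen_orders[order] = name
    if not isinstance(schema.get("success_message"), str):
        problems.append("success_message: missing or not a string")
    return problems


if __name__ == "__main__":
    print(check_schema(json.loads(SCHEMA_TEXT)))
```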

Diff for: doc-images/mlops-stack.png

-117 KB
Binary file not shown.

Diff for: doc-images/mlops-stacks.png

1.04 MB

Diff for: stack-customization.md

+20-22
@@ -1,9 +1,9 @@
1-
# Stack Customization Guide
2-
We provide the default stack in this repo as a production-friendly starting point for MLOps.
1+
# MLOps Stacks Customization Guide
2+
We provide the default MLOps Stack in this repo as a production-friendly starting point for MLOps.
33

44
For generic enhancements not specific to your organization
55
(e.g. add support for a new CI/CD provider), we encourage you to consider contributing the
6-
change back to the default stack, so that the community can help maintain and enhance it.
6+
change back to the MLOps Stacks repo, so that the community can help maintain and enhance it.
77

88
However, in many cases you may need to customize the stack, for example if:
99
* You have different Databricks workspace environments (e.g. a "test" workspace for CI, in addition to dev/staging/prod)
@@ -19,20 +19,20 @@ default stack. Before getting started, we encourage you to read
1919
the [contributor guide](README.md#contributing) to learn how to
2020
make, preview, and test changes to your custom stack.
2121

22-
### Fork the default stack repo
23-
Fork the default stack repo. You may want to create a private fork if you're tailoring
22+
### Fork the MLOps Stacks repo
23+
Fork the MLOps Stacks repo. You may want to create a private fork if you're tailoring
2424
the stack to the specific needs of your organization, or a public fork if you're creating
2525
a generic new stack.
2626

27-
### (optional) Set up CI for your new stack
28-
Tests for the default stack are defined under the `tests/` directory and are
27+
### (optional) Set up CI
28+
Tests for MLOps Stacks are defined under the `tests/` directory and are
2929
executed in CI by Github Actions workflows defined under `.github/`. We encourage you to configure
30-
CI in your own stack repo to ensure the stack continues to work as you make changes.
30+
CI in your own MLOps Stacks repo to ensure it continues to work as you make changes.
3131
If you use GitHub Actions for CI, the provided workflows should work out of the box.
3232
Otherwise, you'll need to translate the workflows under `.github/` to the CI provider of your
3333
choice.
3434

35-
### Update stack parameters
35+
### Update MLOps Stacks parameters
3636
Update parameters in your fork as needed in `databricks_template_schema.json` and update corresponding template variable in `library/template_variables.tmpl`. Pruning the set of
3737
parameters makes it easier for data scientists to start new projects, at the cost of reduced flexibility.
3838

@@ -41,16 +41,15 @@ For example, you may have a fixed set of staging & prod Databricks workspaces (o
4141
also run all of your ML pipelines on a single cloud, in which case the `input_cloud` parameter is unnecessary.
4242

4343
The easiest way to prune parameters and replace them with hardcoded values is to follow
44-
the [contributor guide](README.md#previewing-stack-changes) to generate an example project with
45-
parameters substituted-in, and then copy the generated project contents back into your stack.
44+
the [contributor guide](README.md#previewing-changes) to generate an example project with
45+
parameters substituted-in, and then copy the generated project contents back into your MLOps Stacks repo.
4646

4747
## Customize individual components
4848

4949
### Example ML code
50-
The default stack provides example ML code using [MLflow recipes](https://mlflow.org/docs/latest/recipes.html#).
50+
MLOps Stacks provides example ML code.
5151
You may want to customize the example code, e.g. further prune it down into a skeleton for data scientists
52-
to fill out, or remove and replace the use of MLflow Recipes if you expect data scientists to work on problem
53-
types that are currently unsupported by MLflow Recipes.
52+
to fill out.
5453

5554
If you customize this component, you can still use the CI/CD and ML resource components to build production ML pipelines, as long as you provide ML
5655
notebooks with the expected interface. For example, model training under ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/training/notebooks/`` and inference under
@@ -60,14 +59,13 @@ You may also want to update developer-facing docs under `template/{{.input_root_
6059
or `template/{{.input_root_dir}}/docs/ml-developer-guide-fs.md`, which will be read by users of your stack.
6160

6261
### CI/CD workflows
63-
The default stack currently has the following sub-components for CI/CD:
62+
MLOps Stacks currently has the following sub-components for CI/CD:
6463
* CI/CD workflow logic defined under `template/{{.input_root_dir}}/.github/` for testing and deploying ML code and models
65-
* Automated scripts and docs for setting up CI/CD under `template/{{.input_root_dir}}/.mlops-setup-scripts/`
6664
* Logic to trigger model deployment through REST API calls to your CD system, when model training completes.
67-
This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/TriggerModelDeploy.py``
65+
This logic is currently captured in ``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/deployment/model_deployment/notebooks/ModelDeployment.py``
6866

6967
### ML resource configs
70-
Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/bundle.yml``.
68+
Root ML resource config file can be found as ``{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/databricks.yml``.
7169
It defines the ML config resources to be included and workspace host for each deployment target.
7270

7371
ML resource configs (databricks CLI bundles code definitions of ML jobs, experiments, models etc) can be found under
@@ -80,7 +78,7 @@ When updating this component, you may want to update developer-facing docs in
8078
``template/{{.input_root_dir}}/{{template `project_name_alphanumeric_underscore` .}}/resources/README.md``.
8179

8280
### Docs
83-
After making stack customizations, make any changes needed to
84-
the stack docs under `template/{{.input_root_dir}}/docs` and in the main README
85-
(`template/{{.input_root_dir}}/README.md`) to reflect any updates you've made to the stack.
86-
For example, you may want to include a link to your custom stack in `template/{{.input_root_dir}}/README.md`.
81+
After making customizations, make any changes needed to
82+
the docs under `template/{{.input_root_dir}}/docs` and in the main README
83+
(`template/{{.input_root_dir}}/README.md`) to reflect any updates you've made to the MLOps Stacks repo.
84+
For example, you may want to include a link to your custom MLOps Stacks repo in `template/{{.input_root_dir}}/README.md`.

Diff for: template/update_layout.tmpl

+5
@@ -61,6 +61,11 @@
6161
{{ skip (printf `%s/%s` $root_dir `docs/ml-developer-guide-fs.md`) }}
6262
{{ end }}
6363

64+
# Remove utils if using Models in Unity Catalog
65+
{{ if (eq .input_include_models_in_unity_catalog `yes`) }}
66+
{{ skip (printf `%s/%s/%s` $root_dir $project_name_alphanumeric_underscore `utils.py`) }}
67+
{{ end }}
68+
6469
# Remove template files
6570
{{ skip `update_layout` }}
6671
{{ skip `run_validations` }}
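The conditional `skip` added in this hunk can be read as follows: when `input_include_models_in_unity_catalog` is `yes`, `utils.py` under the project directory is left out of the generated project. The Python sketch below mirrors only the lines visible in this hunk, for illustration; the function name and its arguments are hypothetical, and the real logic is the Go template above, not this code.

```python
def skipped_paths(root_dir, project_dir, include_models_in_unity_catalog):
    """Mirror the skip rules visible in this hunk of update_layout.tmpl.

    root_dir and project_dir stand in for $root_dir and
    $project_name_alphanumeric_underscore in the template.
    """
    skipped = []
    # Remove utils if using Models in Unity Catalog.
    if include_models_in_unity_catalog == "yes":
        skipped.append(f"{root_dir}/{project_dir}/utils.py")
    # Template helper files are always skipped.
    skipped += ["update_layout", "run_validations"]
    return skipped


if __name__ == "__main__":
    print(skipped_paths("my-mlops-project", "my_mlops_project", "yes"))
```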

Diff for: template/{{.input_root_dir}}/.azure/devops-pipelines/{{.input_project_name}}-bundle-cicd.yml.tmpl

+2-2
@@ -1,6 +1,6 @@
11
# This Azure Pipeline validates and deploys bundle config (ML resource config and more)
2-
# defined under {{template `project_name_alphanumeric_underscore` .}}/databricks-resource/*
3-
# and {{template `project_name_alphanumeric_underscore` .}}/bundle.yml.
2+
# defined under {{template `project_name_alphanumeric_underscore` .}}/resources/*
3+
# and {{template `project_name_alphanumeric_underscore` .}}/databricks.yml.
44
# The bundle is validated (CI) upon making a PR against the {{template `default_branch` .}} branch.
55
# Bundle resources defined for staging are deployed when a PR is merged into the {{template `default_branch` .}} branch.
66
# Bundle resources defined for prod are deployed when a PR is merged into the {{template `release_branch` .}} branch.
